Today, in the latest instalment of Apple’s Machine Learning Journal, the Siri team writes about some of the processes that make voice-activated ‘Hey Siri’ work with the user’s own voice. Apple has previously discussed part of the process of making voice-activated Siri work in general. The wake word, “Hey Siri”, was chosen primarily because some users were already saying the phrase naturally when they activated Siri via the home button.
The phrase “Hey Siri” was originally chosen to be as natural as possible; in fact, it was so natural that even before this feature was introduced, users would invoke Siri using the home button and inadvertently prepend their requests with the words, “Hey Siri.”
When working on the ‘Hey Siri’ voice command, Apple ran into a few issues: the main user saying a phrase similar to ‘Hey Siri’, another person saying ‘Hey Siri’, or another person saying a similar phrase. This is why Apple chose to have ‘Hey Siri’ respond only to the main user’s voice.
We measure the performance of a speaker recognition system as a combination of an Imposter Accept (IA) rate and a False Reject (FR) rate. It is important, however, to distinguish (and equate) these values from those used to measure the quality of a key-phrase trigger system.
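To make the two metrics concrete, here is a minimal Python sketch. It assumes each test utterance has already been given a speaker-similarity score and that the system accepts an utterance when its score meets a fixed threshold. The function name `ia_fr_rates`, the threshold values and the scores are hypothetical illustrations, not Apple’s implementation or data. An Imposter Accept is counted when an utterance from someone other than the enrolled user is accepted, and a False Reject when the enrolled user’s own utterance is rejected.

```python
from typing import Sequence, Tuple


def ia_fr_rates(target_scores: Sequence[float],
                imposter_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (imposter_accept_rate, false_reject_rate) at a given threshold.

    Hypothetical helper: a trial is treated as accepted when its score is
    greater than or equal to the threshold.
    """
    if not target_scores or not imposter_scores:
        raise ValueError("need at least one target and one imposter trial")
    # Imposter Accept: an utterance from someone other than the enrolled user is accepted.
    imposter_accepts = sum(1 for s in imposter_scores if s >= threshold)
    # False Reject: an utterance from the enrolled user is rejected.
    false_rejects = sum(1 for s in target_scores if s < threshold)
    return (imposter_accepts / len(imposter_scores),
            false_rejects / len(target_scores))


# Hypothetical similarity scores, invented purely for illustration.
target = [0.91, 0.85, 0.78, 0.62, 0.95]    # enrolled user saying "Hey Siri"
imposter = [0.30, 0.55, 0.71, 0.12, 0.40]  # other people saying "Hey Siri"

ia, fr = ia_fr_rates(target, imposter, threshold=0.7)
print(f"IA = {ia:.0%}, FR = {fr:.0%}")
```

With a threshold of 0.7 this toy example gives 20% for both rates. Raising the threshold to 0.8 drives IA down to 0% but pushes FR up to 40%, which illustrates the trade-off such a system has to tune between letting imposters in and locking the real user out.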
Also in the Journal, Apple talked about the challenges of using ‘Hey Siri’ in a large room or a noisy environment.
One of our current research efforts is focused on understanding and quantifying the degradation in these difficult conditions in which the environment of an incoming test utterance is a severe mismatch from the existing utterances in a user’s speaker profile.
Apple’s Machine Learning Journal offers great insight, especially for developers, who get to see how Apple builds some of its software; in this case, its machine learning systems.