I was reading through Apple’s newest Machine Learning Journal* entry and found this to be very interesting: users said “Hey Siri” to interact with Siri long before “Hey Siri” voice activation was even a thing. According to Apple, users would start off their Siri interactions with “Hey Siri” even when the only way to access the service was by using the iPhone’s Home button.
From the Machine Learning Journal entry:
The phrase ‘Hey Siri’ was originally chosen to be as natural as possible; in fact, it was so natural that even before this feature was introduced, users would invoke Siri using the home button and inadvertently prepend their requests with the words, ‘Hey Siri.’
That wasn’t the point of this Apple post, but it speaks to our innate desire to interact with voice assistants as if they are real people. “Hey Siri” as a voice invocation hadn’t even been introduced, and yet that’s how many of us chose to start our conversations with her.
The real point of the AI team’s blog post (i.e., journal entry) is to talk about Speaker Recognition, the field of AI that deals with recognizing who is speaking. As Apple put it:
The overall goal of speaker recognition (SR) is to ascertain the identity of a person using his or her voice. We are interested in ‘who is speaking,’ as opposed to the problem of speech recognition, which aims to ascertain ‘what was spoken.’
This is an area where Apple’s voice assistant competitors (Amazon and Google, to name two) have an edge on Apple, and this post offers insight into how Apple is working on the problem.
The blog post explains how Apple identified False Accepts (Siri responding when no one asked for her), False Rejects (Siri not responding when invoked), and Imposter Accepts (Siri responding when someone other than the device owner said “Hey Siri”). Each of these error types poses its own challenges, and addressing any one of them affects performance in the other two areas.
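To make that trade-off a little more concrete, here is a toy sketch in Python. It is not Apple’s method. It assumes every audio event gets a single made-up confidence score and that Siri responds whenever the score clears one threshold. The `Event` class, the `error_counts` function, and all of the scores are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # "owner", "imposter", or "background" (hypothetical labels)
    score: float  # invented confidence score between 0 and 1


def error_counts(events, threshold):
    """Count each error type when Siri responds to scores >= threshold."""
    counts = {"false_accept": 0, "false_reject": 0, "imposter_accept": 0}
    for event in events:
        accepted = event.score >= threshold
        if event.kind == "owner" and not accepted:
            # The owner said "Hey Siri" but got no response.
            counts["false_reject"] += 1
        elif event.kind == "imposter" and accepted:
            # Someone other than the owner triggered Siri.
            counts["imposter_accept"] += 1
        elif event.kind == "background" and accepted:
            # Nobody said "Hey Siri" but Siri responded anyway.
            counts["false_accept"] += 1
    return counts


# Invented sample data, not measurements from Apple's post.
EVENTS = [
    Event("owner", 0.9), Event("owner", 0.7), Event("owner", 0.55),
    Event("imposter", 0.6), Event("imposter", 0.4),
    Event("background", 0.5), Event("background", 0.2),
]

for threshold in (0.3, 0.65):
    print(threshold, error_counts(EVENTS, threshold))
```

With this invented data, the lenient threshold of 0.3 never ignores the owner but produces one False Accept and two Imposter Accepts. Raising the threshold to 0.65 clears both of those up, at the cost of one False Reject.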
Techity Gobbledy Blah
There’s a whole bunch of detail on how Apple is working on Speaker Recognition, most of which is aimed at academics in the field. The company also said, however, that it’s working on improving Hey Siri performance in challenging environments such as large rooms, wind, and moving vehicles.
*As you do.