New Apple Speech Patents May Increase Security, Ease of Use

The United States Patent and Trademark Office (USPTO) recently issued Apple two patents relating to speech. Apple filed a patent application for a "Combined dual spectral and temporal alignment method for user authentication by voice" on September 29, 2000, which was issued by the USPTO as US Patent No. 6,697,779 on February 24, 2004. The abstract describes the invention as:

A method and system for training a user authentication by voice signal... If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.

The background nicely lays out some of the issues regarding voice authentication:

Speaker authentication methods may be divided into text-dependent and text-independent methods. Text-dependent methods require the speaker to say key phrases having the same text for both training and recognition trials, whereas text-independent methods do not rely on a specific text to be spoken. Text-dependent systems offer the possibility of verifying the spoken key phrase (assuming it is kept secret) in addition to the speaker identity, thus resulting in an additional layer of security. This is referred to as the dual verification of speaker and verbal content, which is predicated on the user maintaining the confidentiality of his or her pass-phrase.

On the other hand, text-independent systems offer the possibility of prompting each speaker with a new key phrase every time the system is used. This provides essentially the same level of security as a secret pass-phrase without burdening the user with the responsibility to safeguarding and remembering the pass-phrase. This is because prospective impostors cannot know in advance what random sentence will be requested and therefore cannot (easily) play back some illegally pre-recorded voice samples from a legitimate user.
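The prompting idea described in the background can be illustrated with a minimal Python sketch. It is not taken from the patent: the word list, the function names, and the use of a text transcript to stand in for the content check are all assumptions made for illustration.

```python
import secrets

# Hypothetical vocabulary; a real system would draw on far more material.
VOCABULARY = ["amber", "castle", "river", "seven",
              "orange", "window", "silver", "garden"]

def new_challenge(n_words=4):
    """Return a fresh random phrase for the user to read aloud."""
    return " ".join(secrets.choice(VOCABULARY) for _ in range(n_words))

def content_matches(challenge, transcript):
    """Check that what was heard is what was asked for.

    The transcript is an assumption of this sketch; it stands in for
    whatever content comparison a real system performs.
    """
    return challenge.lower().split() == transcript.lower().split()
```

Because the phrase changes on every attempt, a recording made earlier will almost never contain the words requested this time, which is the point the background makes about pre-recorded samples.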

Apple seems to have come up with some very nice voice authentication technology here. It's designed to thwart those who would simply tape someone's voice, and it is also language independent:

[S]ophisticated impostors, who might be very skilled at mimicking spectral content (using, for example, illegally recorded material from the speaker they want to impersonate) [can cause problems].

To address this... verbal content verification is employed to provide an additional layer of security... Again, this obviates the need for a phoneme set, which means verbal content verification may also be done on a language-independent basis.

One can imagine Apple's system letting users read a randomly generated pass phrase aloud to unlock their encrypted user account and Keychain. It would be tied to each user's unique voice print and obviate the need for passwords. Hallelujah. It cannot come soon enough. Should it be integrated into OS X 10.4, it would enhance both security and ease of use for Mac users.
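The abstract's "both thresholds" decision can be pictured roughly as follows. This is only a sketch under stated assumptions: the patent excerpts quoted here do not say how the comparisons are scored, so the distance values, the threshold numbers, and every name below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    speaker: float  # largest acceptable distance for the voice-print comparison
    content: float  # largest acceptable distance for the verbal-content comparison

def authenticate(speaker_distance, content_distance, limits):
    """Accept the user only if both distances fall within their thresholds."""
    return (speaker_distance <= limits.speaker
            and content_distance <= limits.content)

# Hypothetical numbers, chosen only to show the two outcomes.
limits = Thresholds(speaker=0.30, content=0.25)
right_voice_right_phrase = authenticate(0.12, 0.10, limits)  # True
right_voice_wrong_phrase = authenticate(0.12, 0.60, limits)  # False
```

Requiring both checks to pass is what would let such a system reject a good recording of the right voice saying the wrong words.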

Next, Apple filed a patent application for "Assigning meanings to utterances in a speech recognition system" on October 12, 2001, which was issued by the USPTO as US Patent No. 6,704,710 on March 9, 2004. The summary notes that the invention:

[Provides] a means for associating meanings with spoken utterances in a speech recognition system... [and provides] an improved method for associating expressions (e.g. actions and variable values) to speech rules in a speech recognition system.

What is interesting about this invention is that it does more than merely attempt to convert speech to text, a daunting task in itself; it actually tries to derive meaning from speech. The invention seems to parse out which words might be commands and which might be variables for those commands. The idea is to discern commands and execute them for the user, i.e., to provide a spoken user interface. The summary further states:

Upon the detection of speech in the speech recognition system, a current language model is generated from each language model in the speech rules for use by a recognizer... Each expression associated with the language model in each of the set of speech rules is evaluated, and actions performed in the system according to the expressions associated with each language model in the set of speech rules... Thus, actions such as variable assignments and commands may be performed according to these speech rules.
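A loose Python sketch may help picture rules that pair a phrase pattern with an action. It is not the patented method: it works on already-recognized text rather than building a language model for a recognizer, and the rule format, function names, and example commands are all assumptions.

```python
import re

class SpeechRule:
    """Pairs a phrase pattern with the action to run when it matches."""
    def __init__(self, pattern, action):
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.action = action

def interpret(utterance, rules, state):
    """Run the first rule whose pattern matches the whole utterance."""
    for rule in rules:
        match = rule.pattern.fullmatch(utterance.strip())
        if match:
            # Named groups act as the variables of the command.
            rule.action(state, **match.groupdict())
            return True
    return False

def set_volume(state, level):
    state["volume"] = int(level)    # a variable assignment

def open_app(state, name):
    state["opened"].append(name)    # a command

# Hypothetical rules, for illustration only.
RULES = [
    SpeechRule(r"set volume to (?P<level>\d+)", set_volume),
    SpeechRule(r"open (?P<name>\w+)", open_app),
]
```

Given a state such as `{"volume": 0, "opened": []}`, the phrase "set volume to 7" would assign a variable, "open Mail" would run a command, and an unrecognized phrase would be ignored.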

Although the invention seems impressive at first blush, it's likely based on old, stale technology: this patent is a continuation of a patent application that was filed on December 31, 1992.