The Thinking Behind a Female Voice for Siri

Apple's Siri is a strong contender

Siri started out with a female voice exclusively, but now it can be changed to male. Alexa uses only a female voice. Cortana’s voice, for now, is strictly female. Why is that? Is it sexism? Is it for better intelligibility? Is there a compelling psychology behind the female voice for all users? I looked into it.

Siri started out with a female voice only. The competition is still there.

I got very curious about this choice of AI gender when I read Joanna Stern’s article at the Wall Street Journal: “Alexa, Siri, Cortana: The Problem With All-Female Digital Assistants.”

My first-blush reaction was that it’s an intelligibility issue. I recalled studies, from when I was younger, about cockpit warning systems in military and commercial aircraft. The earliest, apocryphal reports were that pilots were able to better hear, against background noise, warnings in a female voice due to the frequency range. This also applies, I have heard, to radio dispatchers for the police. You can read more about the evolution of thinking and the current usage in the “Bitching Betty” article at Wikipedia. Notable, however, is this:

Arrabito in 2009, however, at Defence Research and Development Canada in Toronto, found that with simulated cockpit background radio traffic, a male voice rather than a female voice, in a monotone or urgent annunciation style, resulted in the largest proportion of correct and fastest identification response times to verbal warnings, regardless of the gender of the listener.

I kept researching. Over the years, more complete studies have shown that the female voice isn’t necessarily easier to understand. A good summary of the technical situation is provided by Sarah Zhang at Gizmodo: “No, Women’s Voices Are Not Easier to Understand Than Men’s Voices.” Of note:

More recently, though, a 1998 study at the Wright-Patterson Air Force Base in Ohio found the opposite: It’s actually female voices that are less intelligible against the noise inside cockpits, though the difference was tiny and only statistically significant at the highest levels of noise.

So What’s the Reason?

So, if it isn’t an intelligibility issue, why have Amazon, Apple and Microsoft made the choice for a female voice? Back to author Stern’s article.

First, the question is asked, “Why Do Robots Need a Gender?” It turns out that a genderless voice is very hard to achieve. So then, author Stern asks, “Why Female Then?” It turns out that market research, by Amazon and Microsoft, showed that both men and women had a stronger preference for an AI using a female voice.

The above notwithstanding, the author argues that we, as users, should have more control over the sex of the AI voice, as Apple has provided. Not doing so, despite market analysis, just perpetuates stereotypes.

My own thinking here is that AIs, as they become more and more powerful and persuasive, will take on an air of authority that will be hard to resist and could lead consumers into unintended actions. Or actions that favor the vendor over the customer. Wrapped in the aura of the male voice, there could be equally damaging social consequences, such as mass, hysterical rejection of technology.

For now, I think the instincts of the above companies are generally good. If nothing else, social pressure on the developers of these AI agents will force them to maintain the supportive and warmer voice of the female in preference to the commanding voice of a male for potentially self-serving purposes.

This is a difficult subject. But at least we know that neither intelligibility concerns nor blatant geek, corporate sexism is driving the decision making. In the end, however, author Stern is right. Forget market research that leads to stereotyping. Like Apple, give us a choice.

Choice is good.


8 thoughts on “The Thinking Behind a Female Voice for Siri”

  • Back in the early 90’s I worked with researchers doing both voice recognition and voice generation. They maintained that female voices were easier to generate and male voices were easier to recognize. It was something to do with dynamic range and modulation, i.e. how a voice varied in loudness and frequency. As these were people actually trying to generate and recognize voices, I assume they knew what they were talking about.

    That being said, that was quite a while ago, and new techniques come along fairly frequently. Still, simple ease of generation explains why the voices are female.

  • John:

    I just commented on AI on Jeff Gamet’s Amazon/Echo article from 24 February, as I had on your PD article from last week, and more forcefully articulated the challenge that Apple faces on enhancing AI performance whilst protecting user privacy, and have no doubt that Apple are aggressively working on meeting this challenge not simply from an engineering perspective but by hiring some of the best minds in the academic community, which also means embracing academic standards of peer-review. I think you provide a great service to the community in continuing to draw attention to this discussion.

    That said, I am currently reading Yuval Noah Harari’s Homo Deus, his treatise on a Post Human World, which you cite above. It’s always a risky proposition to comment on an argument before having fully read it through, but just a couple of global impressions, given the timing of your piece.

    As in any organised argument, Harari’s thesis begins with an hypothesis and a set of assumptions, including that for millennia, human beings had three things they faced in daily life: famine, pestilence (epidemics and pandemics) and war, and that these three forces shaped our valuation of the individual (the most robust societies required a healthy population pool to address these crises) and now that these forces are in decline, governments may place less value on the individual and their well-being. One of the apparent underlying assumptions in this is that a core determinant of individuality is not a rational soul or mind, however defined, but a set of biological, indeed cellular, algorithms that drive our actions and that we interpret as individual identity, including aspirations and desires, which we are now in process of ceding to computer algorithms in the form of AI that will eventually decide for us, and that in time, human individuality and identity as we know it will pass away.

    Not to rain on this dystopic parade, but we have heard this argument before, in fact countless times throughout the ages in the treatises of philosophers, religionists, politicians, scientists, and madmen howling naked through the streets, each from a different reference frame to be sure, but the crux of each argument was the surrendering of human volition to something fundamentally non-human, at least in terms of its capacity to empathise with the individual and protect individuality, morality, and the collective welfare of the species. Don’t get me wrong, I am enjoying the read and his argument, but to accept it in toto requires accepting his starting assumptions as truth, as do most philosophies (read Karl Marx’s opening lines in Das Kapital about dispensing with distractions, after which everything else he argues will be proved true, as an example). However, starting assumptions are always important, and sometimes require an act of faith, and not reason.

    Leaving aside that, given the modest human lifespans over the past three millennia, most individual humans would not have encountered any of these three forces, let alone all three of them, let me focus my objections (not enough space here for a rebuttal) to Harari’s argument on their relevance to Apple tech, specifically AI and smart devices.

    Briefly, it undervalues and underestimates the power of individual identity and the human volition that has propelled the species from wretched hominid to unlocking the secrets of nature and subserving them to human betterment in civilisation (energy, transportation, food security to name just three). This volition, and its companion, aspiration for a better future, is a powerful force at work in the human species, operating at both the individual and collective level, such that ceding it to any entity, be it totalitarian or technological, flies in the face of every major advance that we have made and the slow but relentless trajectory we have made as a species to expanding rights that protect and enhance the freedom to exercise that volition in the pursuit of a better future. Thus far, creative developers have found ways to exploit tech in ways that decrease drudgery and expand productivity and well-being, however defined. His argument that using GPS cedes control to AI over decisions ignores that, by assigning the menial task of which exit to take to our technology, we can focus on the more important task of alert, safe driving, which in time will be supplanted by letting the tech do the driving while we pursue more uniquely human tasks, like preparing for a meeting en route or getting some needed sleep – either making us more productive.

    In brief, the idea that our technology, notably AI, will somehow, in conspiracy with our governments, supplant human volition and aspiration, core determinants of individuality, and displace what it means to be human, to the detriment of the human race, rests on an assumption of human passivity in its relation to both government and technology that is belied by the discriminate, turbulent and disruptive relationship that we have displayed with both of these forces throughout recorded history.

    If nothing else, humans are less likely than more to adopt any technology or suffer long any government, that does not serve their interests, rather than subordinate their interests to either. Just look at the Apple community’s ongoing critique of Apple’s products, and be assured that the future of humanity is unquestionably, and boisterously, human.

  • Gender exists, and it’s real. Like it or not, women have specialized body parts and chemistry for child rearing, and part of nurturing very young children is maternal tenderness. This is not to say men can’t be tender. They absolutely can. And there are women who abuse their kids. But generally speaking, there is a maternal aspect to women that is real. Likewise, while there are a few exceptions that prove the rule (such as Amazon women), generally speaking the highest authority in a country has been a king, and the head of the family has been the father. This does not deny equality. Even the manager of a McDonald’s is equal to the cashier. Equal as a person. Equal under the law. But society depends on hierarchy. The hierarchy as it relates to gender is more mutable, and women can certainly be the head of a family or a battleship captain.

    NOW, all of this being said, think of butlers in our popular culture, such as “Mr. Belvedere” and “Alfred” in Batman. Compare them to Alice from The Brady Bunch or Florence from The Jeffersons. In all cases, the servants provide wisdom to their employers, and they are at times cheeky or contradictory. Alice and Florence are funnier and cheekier, whereas Mr. Belvedere and Alfred carry out their jobs in a sort of noble state.

    Relating all this to Siri, I would say that Siri could be a man or woman. The choice of the default woman’s voice was probably tested and the male voice was probably associated with a few things. By the way, part of Siri’s job is to direct the user to do things. And it could be that a woman’s voice was found to be less threatening and to induce more cooperation in users.

    All of this is absolute and complete speculation, but I think some of the observations above are interesting points to consider.

  • I work as an airline pilot, and all the voice annunciations, including those for the most serious warnings (collision with another aircraft, collision with terrain), are male voices. FWIW.

  • After talking devices become more popular I’m sure someone will start selling celebrity voice packs. With apps like Adobe’s Voco they can get the computer to talk like anyone from Gracie Allen to Jessica Rabbit. https://www.youtube.com/watch?v=XfcqBElF0ZI

    In this case it would be nice to be able to distinguish your smart appliances by their voices. Have Stephen Fry as Jeeves be the voice for your smart home things, Julia Child for your kitchen stuff, and Brian Blessed for your car. https://www.youtube.com/watch?v=-JpKuYbJQK4

    Voice actors would find gainful employment in this new field.

  • I was quite offended when 9to5Mac used the headline “Sexism rules in voice assistant genders, show studies, but Siri stands out.” Having a preference for one voice over another for an inanimate object is not sexism. It reminds me of the professor I read about when I was in college who proclaimed that any music school that taught Classical music was racist because the composers were all white male Europeans. In this case whether I use a female or a male voice is really a matter of personal taste. Anyone who would proclaim otherwise is just grinding an ever-shrinking political axe and is not worth wasting time over.

    The rising power of Facebook certainly has me worried. They’ve already shown that they can’t be trusted with personal data. It’s not however that I mind the idea of more and broader democracy. I fear a Facebook democracy where people think clicking on an online poll makes a difference. A world where people think putting a sad emoticon on a story about a famine means anything. A worldwide Facebook political movement would be the triumph of the shallow.
