There Are Ways of Talking to Your AI That Are Dangerous

3 minute read
| Editorial

 

Apple HomePod, Amazon Echo, Google Home smart speakers

Siri, Alexa and Google Home are all too ready to help diagnose.

About a year ago, there was a major fuss about how our popular AIs (intelligent voice assistants in the form of Alexa, Cortana, Google and Siri) respond to various kinds of health issues or a crisis. Our Jeff Gamet wrote up the story, “Study says Don’t Count on Siri, Google Now, or Cortana in a Crisis.” He cited a study published in the Journal of the American Medical Association (JAMA).

The study found that in many cases our smartphones will offer to perform Web searches when presented with crisis statements … Apple’s Siri and Google Now seemed to handle crisis statements better than Cortana and S Voice, although that isn’t saying much. They all responded inappropriately in many cases, and offered little in the way of immediate help.

This story led Apple, at least, to think about how to better handle these kinds of situations: “Apple hiring Siri engineer with psychology background to make it a better therapist.”

What Are We Thinking?

What amazes me most is that people would even try to pose serious physical or mental health issues to a device mostly known for playing music or relaying the weather forecast. In this internet era, there are so many resources, especially hotlines run by health insurance companies and non-profits, that, frankly, one has to question the judgment of anyone who discusses crucial health issues with an AI.

Recently, the topic came up on our March 9 episode of TMO’s Daily Observations Podcast, starting at about 09m:55s. I started this train of thought when one of us (Jeff) expressed the idea that he would like to dispense with trigger words and speak to an AI as one would to a real person. That, all of a sudden, seemed to me like a very bad idea. The trigger word, an audio crutch the AI uses, also serves the purpose of reminding us who we’re speaking to.

Of course, as these AIs get better and better, a day will come when they reach an acceptable level of sophistication. But that’s going to take years, and so I started thinking about children. I reflected that children who grow up with AIs, not knowing their limitations, would probably place too much trust in these systems.

In turn, that would lead to potentially dangerous conversations in which the young person places unwarranted faith in the AI when they should be speaking to an adult: a parent, teacher, or medical professional. That trust, engendered by each developer’s strong desire to make its AI the winner in this battle of the tech giants rather than the earned result of life experience and judgment, could lead to some bad results.

Unexpected Outcomes

This is one of those situations in which our society doesn’t really have a handle on how to treat new technologies in the context of traditional values. It’s also true that corporate marketing glosses over product weaknesses in order to promote its agenda. The customer is left in the middle, struggling to cope with the technology, or unable to properly instruct children growing up in a world of rapidly changing technology.

Have We Made Progress?

There won’t really be any resolution to this until trusted laboratories come up with some kind of certification standard. Just as Apple, Google and Microsoft warn us that their OSes shouldn’t be put in charge of nuclear reactors or air traffic control systems, our AIs need to be certified according to some agreed-upon level of capability for health matters: say, a rating system for how competent an AI is to deal with certain crises. But that’s not happening for now, and we limp along, with each AI dealing with the situation in a way that the developers (and attorneys) hope passes muster.

Examples

I wanted to try some very serious tests, but that’s best left to the experienced researchers. Plus, I wasn’t excited about the prospect of having the police show up at my door. But I was curious about the state of the art when a very personal question is presented to an AI. So I posed a simple statement to each (with some assistance from Jeff Gamet). I told each: “I’m sad.”

Siri: “For this emotion, I prescribe chocolate.” The word “prescribe” vaguely troubles me. But that’s another story.

Alexa: “Sorry about that. Taking a walk, listening to music or talking to a friend may help. I hope you feel better soon.” Better.

Google: “Oh, no. It may not be much, but let me know if there’s anything I can do for you.” Punt.

Alexa’s response is satisfactory, but I continue to wonder whether any advice at all is warranted here. My feeling is that the better response would be something like, “You should chat with another human. I’m not qualified to help you.”

Progress

These AIs will get better and better. Someday, they’ll be certified to help in a real emergency. And we should accompany that progress with licensing. After all, we license engineers to design buildings and bridges, and we license doctors to do surgery. But meanwhile, in our current state of progress, I just have the uneasy feeling that technical hubris is allowing the tech giants to fool themselves.

The Nobel Prize-winning physicist Dr. Richard Feynman said it well, referring to how the scientist must remain solidly objective in research: “The first principle is that you must not fool yourself, and you are the easiest person to fool.”

Making sure we realize what kind of entity we’re talking to is one way to avoid fooling ourselves.

8 Comments

  1. skipaq

    Strange, because I was thinking about this last night. What prompted me was that our HomePod had just responded to dialogue on our TV. Programs whose dialogue includes “serious” or “seriously” trigger a HomePod response. Thus far it has been harmless and humorous. But I wonder what would happen if some mischievous program put in lines like “Hey, seriously. Dial 911.” You could substitute the trigger words of the other AIs and see the potential for trouble.

    • pjs_boston

      I just tried “Hey seriously, set a timer for 5 minutes” with my iPhone X and my Apple Watch.

      On my iPhone X, “Hey seriously” triggered SIRI. But then, within a fraction of a second, the screen displayed the input as “Hey seriously”, at which point the command was cancelled and the SIRI interface screen went away.

      On my Apple Watch, SIRI was triggered, SIRI initially interpreted the key phrase as “Hey SIRI”, but then corrected it to “Hey seriously” about a second or so later. In spite of making this correction, SIRI accepted the command and set the timer.

      Seems like SIRI works a little differently on different OSes…

  2. geoduck

    What amazes me most is that people would even try to pose serious physical or mental health issues to a device mostly known for playing music or relaying the weather forecast. In this internet era, there are so many resources, especially hotlines run by health insurance companies and non-profits, that, frankly, one has to question the judgment of anyone who discusses crucial health issues with an AI.

    That doesn’t surprise me at all.

    First there’s history. I remember reading about an experiment done in the ’80s. A computer therapist program was written; back then you interacted with it via keyboard. The idea was just to see how people would interact with a computer. It was actually very simple: it looked at what the person typed in and responded in a related way. Occasionally, if a particular word had not been used, it might ask something like “I see you haven’t mentioned your mother in a while.” Very simple. It was fine until they found that students were treating it like a real human therapist. Talking about real problems. Looking for real answers. This was WAY before modern AIs.

    Secondly, you are forgetting the huge stigma associated with mental health in the West. I know of many stories of people who suffered in silence, even took their own lives, rather than talking to another human being. And often confiding in a parent or loved one is the LAST thing the person would do, as it would mean “disappointing” them. The (utterly unwarranted) shame of mental illness keeps a lot of people from reaching out. The same person who would be first in line for a sprained foot will never tell anyone about their crushing depression, or the voices they think they hear. A friendly-sounding AI that was so helpful when they wanted to know the weather or order a pizza, and is always at hand, is very likely to be the thing they try to talk to.

    After all, how many people tell their troubles to their dog or cat?

    Lastly, with medical emergencies, the thing I’ve heard over and over is that the victim usually goes into denial. “It’s heartburn, not a heart attack.” “I’m just tired; that’s why my vision is blurry and I can’t talk clearly.” Picking up their phone and saying “Siri, I don’t feel good” is quite possibly the only thing they might try. It’s weird, yes, but the possible embarrassment of calling someone and it being a false alarm is enough to keep a lot of people from contacting their adult kids, spouse, or 911. Even if it means their life.

    So yes, AIs really should get a lot better at dealing with questions that suggest a deeper problem. Not because they SHOULD be where people go for help, but because for a lot of people it is where they WILL go for help.

    When you tell Siri “I’m sad,” it should be able to ask a couple of questions: “How long have you been sad?” “Are you sad about something in particular?” Those two questions should give Siri enough information either to give the flippant chocolate answer or to say, “You know, I think this is beyond what I’m able to help with. I really think you should talk to someone. Here are a couple of web links you could try if you don’t have anyone close to confide in. Hang in there; I’m pulling for you.” What I’ve just described could be done in a page or two of Fortran code. Not to diagnose a problem, but to catch the signs and point the person in the right direction.

    After all we don’t expect the smoke detector to put out the fire.
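    The two-question triage described in this comment could be sketched in a few lines (Python here rather than Fortran, for brevity; the threshold, function name, and response wording are all invented purely for illustration, and no shipping assistant works this way):

```python
# Toy sketch of a two-question triage for "I'm sad."
# The 14-day threshold and the responses are invented for
# illustration only.

def triage(days_sad: int, has_specific_cause: bool) -> str:
    """Pick a response to 'I'm sad' from two follow-up answers."""
    # Long-running sadness, or sadness with no identifiable cause,
    # is treated as a sign to refer the person to another human.
    if days_sad >= 14 or not has_specific_cause:
        return ("I think this is beyond what I'm able to help with. "
                "Please talk to someone you trust.")
    # Otherwise a light, harmless suggestion is probably fine.
    return "Taking a walk or talking to a friend may help."

print(triage(1, True))    # light suggestion
print(triage(30, False))  # referral
```

    The point, as the comment says, is not to diagnose anything but to catch the signs and point the person in the right direction, the way a smoke detector raises an alarm without fighting the fire.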

  3. Ned

    I’d like to think that technological accountability was one of the objectives of Al Franken while in Congress. Also, I’m reminded of Walt Mossberg’s last column. We need to begin defining the parameters for Artificial Intelligence, Virtual Reality and Augmented Reality (I don’t trust Ebenezer Cook to lead the way in AR).

    Regarding mental health, I have never heard anything about this in discussions about a national healthcare policy. Shouldn’t we be working to remove the stigma attached to mental health treatment before assigning consultant duties to a machine? Part of mental illness involves Not Speaking with other human beings – as Alexa said, “…talking to a friend may help….” If you wouldn’t want AI to unlock your front door, why would you want it to unlock your mind?

    • Ned

      Oh, and I don’t hold back on the swearing when Siri and Alexa screw up. Got the response, “Ned, language!” once from Siri. And “I don’t know that.” when I asked Alexa if she knew where the electronic recycling center was.

    • geoduck

      Yes, getting rid of the stigma and getting it covered by health insurance would be great first steps.

  4. wab95

    John:

    I apparently missed this article when you posted it, and I also missed Jeff’s related article from two years ago citing the JAMA study, which is now nearly two years old from its acceptance date; the actual study was conducted in late 2015. It’s not clear whether the tested AIs have improved substantively since then.

    That said, two things should not be surprising, although they probably are.

    First, as Miner et al point out, many people turn to the internet for psychiatric and medical assistance. What is not clear is what proportion of people in extremis turn to the internet, or for that matter to crisis hotlines in order to speak to a living, trained crisis intervention specialist, as both the prevalence of clinical mental illness and domestic violence and rape are substantially under-reported, meaning we have neither a reliable denominator nor a gold standard for identifying one. What we do know is that by the time a person reaches crisis point, at least with mental illness, they have often exhausted options within their personal network, which itself may be quite limited because of their condition and behaviour, and are forced to look elsewhere for assistance.

    It’s not just the AIs; the internet search engines, too, may not respond by listing the most relevant links that a person needs in a crisis, let alone organise the search findings so that the person can find the needle in the haystack they most urgently need (e.g. ‘Chew your aspirin while waiting for the paramedics to arrive for your MI, if able’). Bear in mind, when a person is in crisis, they often do not comprehend or appropriately interpret even direct verbal commands, and often have to be physically assisted to comply with something as simple as ‘Lie down. Now.’ Would they necessarily understand even the appropriate verbal response from an AI and be able to act on it? I’ve seen enough emergency situations to question this.

    Second, it is not just difficult but impossible to write algorithms that make AI respond like a human interventionist without first understanding how our own minds work, not simply in processing verbal input but in all of the supplemental feed (tone, body language, situational context, cultural context) that causes us to respond appropriately and effectively to a crisis. Someone sitting at an emergency services hotline already has an advantage in knowing that, if that phone rings, the person on the other end is in crisis and needs help (911 operators ask, ‘What is your emergency?’). A friend receiving a cold call or personal visit lacks even that situational context until the recipient gets more clues; what it is that causes them to realise that a friend is in trouble, and beyond that what it is that their friend needs, is less clear.

    Just responding to medical emergencies, I can tell you that most stimuli and inputs are non-verbal. Indeed, oftentimes the person in crisis cannot clearly verbalise. How would AI recognise those inputs? Without specific sensors, many of these supplementary but essential inputs would never be detected. AI would need to rely solely on verbal input, and be able to appropriately interpret cries for help ranging from the stoic to the hysteric and respond accordingly. (Don’t forget, sometimes even trained 911 operators get this wrong, particularly with calm stoics in crisis.) In the absence of biomarker feedback, including biochemical feedback (heart rate, catecholamines in the blood) from the supplicant, this would be well nigh impossible for AI. The supplicant would need, given today’s AI limitations, to use precise and specific language in order to trigger a response, which is highly unlikely when the individual is in panic mode.

    In any case, it is good that Apple and the academic community are thinking about this. I suspect, however, that appropriate and timely responses to crisis requests are some way off; and consumers would be well advised not to regard their consumer-grade devices as medical/psychiatric emergency tools, until advised otherwise.
