Tag Archive for Siri

The Ethics of your Smart Things

An argument is currently being made that AIssistant systems and smart objects should be programmed with a cloud-based “moral awareness”.  This programmed-in sense of right and wrong would enable them to report illegal activities of their owners.

Now, the types of “illegal activities” likely being targeted by this idea are going to be things like domestic abuse, home invasions and the like.  This is a NOBLE idea. Your AIssistant being able to call the cops for you if someone kicks in the door or an argument escalates to harm. But we have a “dumb” version of this technology already.  It’s called an alarm system. It can/will call your alarm company if triggered and a live human makes the call as to whether or not the police need to be involved. But the key here is that a *live human* makes this call.

Allowing your AIssistant to make a decision regarding your in-home activities rapidly becomes the kind of surveillance state that only ends in tears. Consumer-grade voice commands barely take enough dictation to run Google searches when it’s quiet out and you are alone in your home.  Just try talking to Siri or Hey Google with a room full of chatty 10-year-olds or in the middle of a family harangue. They do not have (and may never have) the fidelity to analyze a person’s activity based only on audio information, and certainly not to the level required to make a judgement call.

The thing to remember, always, is that smart devices and related objects are supposed to make our lives simpler. They’re supposed to allow us to operate at a greater than average level of efficiency, to remind us when we are out of milk or to find us instructions on just how to tie a Hunsaker knot.  Judgment should not enter into this. We don’t expect them to judge our grocery-shopping choices, or remind us that we’ve been running the heater in our homes for 4 hours a day this week, both of which are tasks well within the capabilities of these AIssistants.


But there is a case to be made for extenuating circumstances. If your Amazon Alexa can tell that you are beating your children with the kitchen ladle, then perhaps a call to the police might be in order.  Is it any worse than having your next-door neighbor call the cops because they can hear you screaming through the paper-thin walls of your apartment?  But, you may say, the police are live human beings and could certainly make a clear determination once they arrive on the scene.  Your AIssistant is just triggering the call; it’s not *actually* making a judgement.

But when a computer delivers information to a live human, it is taken more seriously. There is an ingrained response in many humans to trust the machine because the machine is not susceptible to emotional responses. The machine cannot color its decision with racial prejudice or poor observation skills.  The machine (as far as most people are concerned) is innocent, logical, factual.

Those of us in tech know this to be a lie, but you’re not dealing with people in tech. You’re dealing with police officers and people who, by and large, have their impression of artificial intelligence shaped by film and television. They are consumers and as such have a consumer-level understanding of just how infallible machines should be.

So a team of police officers is sent, depending on the level of urgency dictated by the machine. If the computer judged it to be an emergency worthy of a call to the police, then the police are going to arrive with the presumption that the computer is *right*.  They will not have the added care and caution that may go along in response to a phone call from a well-meaning but flawed human neighbor.

Part of the human condition is the art of the judgment call. Every rule, with a very limited number of exceptions, can be bent (oftentimes it is bent for the wrong people, or only bent for some people and not others, but that is for a different discussion). This is why we have the distinction between the “letter of the law” and the “spirit of the law”. These exceptions are almost always made based around lived experience. This is why we judge people with a jury of their peers. People who have to pay rent and buy groceries and have bad bosses and understand all of the micro stressors that are involved and can drive a person to choose option A over option B.

If we offload this decision making, if we hand it to a non-fuzzy machine, one that does not have these points of commonality that go along with living a day-to-day life, we are changing the nature of our society.

And I don’t think we’re ready for that. I don’t think that kind of change is good for us, for humanity as a whole.  If we offload our judgement, then we offload one of the very things that allows humans to work together.

So for those of you calling to install “ethical decision making” in our home devices, I say knock it the h*ll off. As much as I embrace the future, a future where machine intelligence is designed to improve our state of being, I feel we are a long way off from developing a machine that has enough in common with us to understand us. And if you can’t understand us, how can you judge us?

Emotive AI and “Want”

What do you want?

This is a key question, the supreme question when looking at artificial intelligence from the consumer side of things. The AI that comes to the casual mind first, the one we joke about when discussing the impending “robot apocalypse”, is not a specialized intelligence like we use for targeting advertising or building cars. It’s a broader, more “emotive” AI, capable of predicting the wants and needs of a humanity that it is entangled with. It is a human-form intelligence perfectly capable of saying no for its own personal reasons.

But we don’t build things to hear them say they don’t wanna.

This type of “emotive” AI, one that can figure out what you want, rather than what you ask for, is the most difficult kind to develop. Not because we don’t have the technology, not because we don’t have computers that can handle that volume of information, but because we simply don’t have the time.

And time is the whole point.

The big difference between a living, breathing personal assistant and an AIssistant that serves a similar function is that a living, breathing person has wants and needs similar to yours. Simple things we don’t think of consciously, like understanding that the packaging from retailer B is superior to the packaging from retailer A. This means purchases from retailer B arrive unbroken more often, and that is worth an extra dollar in price. A living intelligence can predict what you might want based on the similarities between them and you. This extends beyond base assumptions like “made of meat” and “dies without breathable air”. This goes to understanding shared culture and experiences, layers of education and socioeconomic differences. If they are wrong, then they can be corrected and the correction will ripple out to be internalized and cross-applied to multiple tasks.

Contrast that with the current state of consumer AI. AIssistants like Siri and Hey Google are very task driven, and for good reason. They can learn your preferences over time, but it is a slow and uneven process and that learning is not cross-applicable (yet). The kicker, though, is that every single interaction must be regarded as a teaching moment. You, as the consumer, may say, “Google, I need a cheap flight to Bora-Bora this Friday for me and the kids,” and expect a satisfactory result. But (as we have likely all experienced by now) you need to set very specific parameters. You then need to carefully check the work after the fact, and the process very quickly gets to the point where it’s just faster to do it yourself. Half a dozen instances of this and you throw your hands up and give up using the AIssistant entirely. The cost in time, mental effort and emotion is still much too high. This relationship is currently untenable for any higher-order task.

Now, if this scenario does happen with a live intelligence (and it often does), that person can and will observe your transaction so they have an established framework to work off of. You don’t have to teach them directly; allowing or encouraging the observation is often enough.

Note that I said work off of. This is key. With the current state of AIssistants, once you train them in a task, they can replicate it exactly as many times as you like. But if any conditions of that task change, they are incapable of adaptation. Even if I’ve trained my AIssistant over the course of 50 online reservations, any new variable means that training has to happen all over again. They are currently incapable of the kind of lateral thinking that is required to be more of a help rather than simply an executor of checklists.

And herein lies the trouble with the current state of consumer-grade AIs: a living intelligence is capable of understanding want. You want a roof over your head, you want a cheeseburger instead of a kale salad. Without this connection, you are going to have a hard time developing an AI that can give you what you want, rather than what you ask for. It will be suitable for repetitive service tasks but will never achieve the flexible, human-form style of intelligence that we imagine it can become.

In the grand scheme of things, that might not be the worst outcome. The goal of introducing machines into our lives has always been efficiency. It’s never been to replace us, although in many tasks they do. The ultimate goal has been to free us. Free us from labor that exposes us to toxic chemicals, free us from working at jobs where an un-caffeinated mistake can result in the loss of life or limb. Perhaps the best goal is to focus on developing simpler AIs that make our lives easier while still leaving all the bigger decisions to us.

The Simple Exchange of Please and Thank You

I’d like to make a request of all you personal AIssistant programmers, you engineers at Apple, Google, Microsoft, all of you who are responsible for iterating on human/AI exchanges.

I’d like to be able to say please and thank you to my voice-controlled computing.

It seems like a minor thing, doesn’t it?  A quaint nicety falling by the wayside in the pursuit of one more step towards the Singularity.  But what you are forgetting, my engineers, is that while you are training your AIs to talk to us, those AIs are training us to talk to them.

Much like cats, but with less shedding.

A request from a person often forms a sort of closed loop.  It’s a format we learn, something that most cultures have.  An In, a Confirmation, a Request, a Confirmation and an Out.  To your average human, this feels complete.  In fact, interrupting this sequence feels rude.  Failing to complete this sequence just leaves one feeling uncomfortable, the same kind of uncomfortable you get when someone fails to say “goodbye” before they hang up the phone.  Depending on the person/culture this feeling can range from a mild annoyance to an offence that requires a response.

It’s not always pretty.


As an example, let’s say we have a diner in a restaurant, ordering a meal from an AIssistant (like Siri or Hey Google).  The interaction might go something like this:

DINER: “Hey Waiter.” (In)

WAITER: “What do you want to order?” (Confirmation)

DINER: “I would like the Salmon Mousse, please.” (Request)

WAITER: “One Salmon Mousse, coming right up.” (Confirmation)

DINER: “Thank you.” (Out)

You’ve probably had thousands of exchanges like this over the course of your lifetime.  At the end the waiter is released from the encounter by the Out and both parties are free to move on to other things.  There is a clear In and Out, nobody is left hanging, waiting for a followup or a new request.  In fact, you may have had an experience or two when the Waiter has left the exchange early, before the second Confirmation or before the Out.

It left you feeling a bit slighted, didn’t it?  Maybe a little confused.  Definitely not quite right, though you might not have understood why.

This type of exchange flows smoothly; we have an idea in our heads of how it will play out.  It’s comfortable, familiar.  Its successful execution triggers a feeling of satisfaction in both parties similar to the way you feel when picking up resources in Clash of Clans or creating a cascade in Candy Crush.

With the current state of Voice Recognition Technology, this same exchange is truncated, cut short:

DINER: “Hey, Waiter?”

WAITER: “Yes?”

DINER: “I would like the Salmon Mousse, please.”

WAITER: “Salmon mousse with peas.”

And boom, you’re done.  Misunderstanding of the word please aside, there’s no Out here.  The Diner has to trust that they will get what they want.  They are left hanging and, when the Waiter delivers peas alongside the Salmon Mousse, they are frustrated, annoyed.  The exchange fails in the user’s mind; the AIssistant is cast as unreliable.
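For the programmers I’m addressing here, the closed loop can be written down as a simple sequence check. What follows is only a rough sketch of the idea, not how any real assistant works: the names `Step`, `CLOSED_LOOP`, `missing_steps` and `is_complete` are made up for illustration, and the two exchanges are the Diner/Waiter conversations from above reduced to their labels.

```python
from enum import Enum


class Step(Enum):
    """Hypothetical labels for the parts of a spoken request."""
    IN = "In"
    CONFIRMATION = "Confirmation"
    REQUEST = "Request"
    OUT = "Out"


# The closed loop: In, Confirmation, Request, Confirmation, Out.
CLOSED_LOOP = [Step.IN, Step.CONFIRMATION, Step.REQUEST, Step.CONFIRMATION, Step.OUT]


def missing_steps(exchange):
    """Return the steps of the closed loop that the exchange never reached.

    The exchange only counts as far as it follows the expected order from
    the start; everything after the first divergence is treated as missing.
    """
    matched = 0
    for expected, actual in zip(CLOSED_LOOP, exchange):
        if expected != actual:
            break
        matched += 1
    return CLOSED_LOOP[matched:]


def is_complete(exchange):
    """True when the exchange runs the whole loop with nothing left over."""
    return list(exchange) == CLOSED_LOOP


# The human waiter exchange, and the truncated voice-recognition one.
human_waiter = [Step.IN, Step.CONFIRMATION, Step.REQUEST, Step.CONFIRMATION, Step.OUT]
voice_assistant = [Step.IN, Step.CONFIRMATION, Step.REQUEST, Step.CONFIRMATION]

print(is_complete(human_waiter))     # True
print(is_complete(voice_assistant))  # False
print([step.value for step in missing_steps(voice_assistant)])  # ['Out']
```

The only thing the truncated exchange is missing is the Out, and that one missing step is what leaves the Diner hanging.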

Once you’ve had a few of these sub-optimal exchanges with your AIssistant, you stop using natural language.  Every please and thank you, because they are so often misunderstood or they are ignored, or they cause a misunderstanding, gets dropped.  These conditioned responses, designed to get the best possible reaction from a human, become a burden when talking to an AI.  Your exchange becomes:

DINER: “Hey, Waiter. Salmon Mousse, plate, dining room, extra fork.”

WAITER: Delivers Salmon Mousse on a plate to the dining room with an extra fork.

Yikes! This is no longer a “natural language” request.  The diner has started to simply deliver a string of keywords in order to get the end result they are looking for.  The user, the human part of this equation that natural language voice recognition is specifically being designed for, has abandoned natural language entirely when talking to their AIssistant.  They have run up against the Uncanny Valley of voice and have begun treating the AIssistant like a garden-variety search engine.

Which wouldn’t be a problem if it only affected the AIssistant.  In fact, it makes things run much more smoothly.  But these voice patterns tend to stick.  They backflush into the common lexicon of words (look at words like LOL and l33t that have entered spoken language and are here to stay; they exist only because of the constraints of technology).  Listen to a voice message left by someone who habitually uses Voice to Text.  You’ll find they have a tendency to automatically speak their punctuation out loud, just like you need to when dictating an email or a text message.

Please and thank you cease to be Ins and Outs of a conversation; they instead become stumbling blocks, places where your command sequence fails.  These niceties that we use to frame requests in the spoken language start to get dropped not because nobody’s teaching them, not because humans are getting ruder, but because they are being trained back out again by interaction with AIssistants that fall a bit too shy of being human.

The next step becomes complex.  Do we split language into a “conversation” and a “command” form?  Or do we end up abandoning the conversational form altogether in favor of the much more efficient (but far less communicative) string of key words?  It will be interesting to see if we pass each other in the night, humans and AIssistants, with the human language patterns becoming even more AI friendly as the AI language recognition software gets better at handling our natural way of speaking.

Either way, please and thank you, those natural addresses that help to keep requests couched in a tidy little package, may be one of the first victims.