Talking to our virtual assistants is almost like science fiction. We just love the idea of literally telling them what to do. No keyboard, no touch screen, just order them around like our electronic servants. The technology isn’t new - we’ve had simple voice interfaces for decades, allowing us to give basic commands or dictate documents. But in the last few years, voice has taken a huge leap forward. Google Home, Alexa, Siri, and Cortana have started to make the transition towards true AI, where they don’t just execute set commands, but actually understand what they’re being told. We’re reaching the era of “do what I mean, not what I say.”
But it’s still early days, and the tech still has a long way to go. We interviewed users in the UK and the US who had tried voice shopping, and weren’t entirely surprised by the range of reactions. We heard everything from “it was a disaster” and “it’s horrible, I turned it off” to “I love it” and “we use it all the time.” The most common reaction was simply, “I don’t see the point.”
So why does it work well for some people and not for others? It wasn’t that the tech geeks loved it and the rest didn’t. Almost everyone who has Alexa is an early adopter, so they’re all geeks to some extent. And the most vehement negative reactions often came from the most tech-savvy users. The Americans definitely loved it more than the British, but that still doesn’t explain the range of reactions we saw. What it came down to was two key factors: what people were trying to buy, and the amount of time they were prepared to invest in training both themselves and the system.
To understand this, we need to look a little deeper into how voice shopping works. It’s a massively complicated problem to solve, requiring many different layers of technology and AI.
First, it has to understand speech. It has to take the sounds it hears and convert them into words. It needs a huge vocabulary for every language, and must use context to distinguish between homophones such as “time” and “thyme”. It also needs to understand different accents and dialects, male and female voices, and both children and the elderly. Given the difficulty that humans sometimes have understanding each other even when speaking the same language, it’s a major challenge for a computer.
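To make the “time” versus “thyme” point concrete, here is a toy sketch of context-based disambiguation. It is not how any vendor’s speech recogniser actually works; the context-clue words below are invented for illustration, and real systems use statistical language models rather than hand-written word lists.

```python
# Toy homophone disambiguation: pick the spelling whose typical
# context words overlap most with the rest of the utterance.
# All clue words here are invented examples, not real model data.
CONTEXT_CLUES = {
    "thyme": {"buy", "fresh", "recipe", "teaspoon", "rosemary"},
    "time": {"what", "clock", "meeting", "late"},
}

def disambiguate(candidates, context_words):
    """Return the candidate with the most context-word overlap."""
    context = set(context_words)
    scores = {c: len(CONTEXT_CLUES.get(c, set()) & context) for c in candidates}
    return max(scores, key=scores.get)

# "buy some fresh ..." points to the herb, not the clock.
print(disambiguate(["time", "thyme"], ["buy", "some", "fresh"]))  # thyme
```

A real recogniser does essentially this at scale, scoring whole word sequences against a learned language model instead of counting overlaps.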
As if that wasn’t enough, an effective voice-controlled AI needs to be able to operate in a noisy environment. There may be music playing, a TV show in the background, a washing machine running, dogs barking, and traffic rumbling outside. It has to be able to tune all that out and pick up the relevant sounds that make up speech.
Then, like any good butler, it needs to learn discretion. It’s always hovering in the background, listening to every word you say, but it needs to be aware of when it’s being addressed and when it should ignore you. In a shared house, it needs to know that it’s not okay for your children or your guests to order whatever they want. And, if you’ve ordered a gift for your partner, it needs to understand that it should be charged to your account, not to theirs.
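The “discretion” logic above can be sketched as a simple gate: act only when addressed by the wake word, and let only recognised, authorised voices place orders. This is a minimal illustration under invented assumptions (the wake word, speaker IDs, and the idea that speaker identification has already happened upstream), not any vendor’s actual policy engine.

```python
# A minimal sketch of assistant "discretion": ignore speech that
# isn't addressed to the device, and refuse purchases from voices
# that aren't authorised to spend money on the account.
WAKE_WORD = "assistant"                    # invented wake word
AUTHORISED_BUYERS = {"parent_1", "parent_2"}  # invented speaker IDs

def handle_utterance(speaker_id, text):
    words = text.lower().split()
    if not words or words[0] != WAKE_WORD:
        return "ignored"           # not being addressed
    if "buy" in words and speaker_id not in AUTHORISED_BUYERS:
        return "purchase refused"  # children and guests can't order
    return "handled"

print(handle_utterance("child_1", "assistant buy a teddy bear"))  # purchase refused
print(handle_utterance("parent_1", "assistant buy coffee"))       # handled
print(handle_utterance("parent_1", "what time is it"))            # ignored
```

Real assistants do the equivalent with always-on wake-word detection and voice profiles, but the decision structure is the same: am I being addressed, and is this speaker allowed to do this?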
Finally, we get to the actual shopping part of the problem. This requires the AI to know what I want almost as well as I do. If my wife asks me to pick up some coffee on my way home, we don’t need to have a long conversation about what kind of coffee, what size pack, or how finely to grind it. (Interestingly, almost everyone we spoke to used coffee to illustrate their first voice shopping experience.) The system doesn’t work effectively until it has a lot of data about your shopping habits and preferences - until it gets the information it needs, it has to ask you a lot of questions, which is considerably slower and more frustrating than shopping via web or mobile.
The learning curve goes two ways. The more you order something like cat food, the more the AI learns about what kind of cat food you like. And you learn that if you say “Alexa, buy wet cat food ocean whitefish” instead of just “buy cat food,” you’ll get what you want a lot quicker. The UX sucks at first, but if you’re prepared to invest the time to train both the system and yourself, it’s ideal for simple repeat purchases, just like Dash buttons. And for hands-free situations, such as while you’re cooking, doing housework, or driving, it’s much easier than having to get to a computer or mobile device.
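That two-way learning curve can be illustrated with a toy purchase history: the more often a vague request like “buy cat food” resolves to a specific product, the more confidently the system can default to it, and until there is any history it has to fall back to asking questions. The product names below are invented examples.

```python
# Toy preference learning: resolve a vague request to the most
# frequently purchased matching product, or ask when there's no history.
from collections import Counter

purchase_history = Counter()

def record_purchase(product):
    purchase_history[product] += 1

def resolve(request):
    """Best-guess product for a vague request, else a clarifying question."""
    matches = [p for p in purchase_history if request in p]
    if not matches:
        return "Which product would you like?"
    return max(matches, key=purchase_history.get)

record_purchase("cat food ocean whitefish")
record_purchase("cat food ocean whitefish")
record_purchase("cat food chicken pate")
print(resolve("cat food"))  # cat food ocean whitefish
print(resolve("coffee"))    # Which product would you like?
```

This is why the experience is frustrating at first and smooth later: the clarifying-question branch dominates until the history fills in.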
Where voice hits a real limitation is that it’s not good for browsing or comparison shopping. For one-off or complex purchases, the audio interface is inadequate. You wouldn’t want to buy something like a sofa, a camera, clothes, or even a teddy bear without being able to see it, read reviews, and check the specifications. For that, you really need a screen. That’ll doubtless happen in the next evolution: the AI will fire up the nearest screen or projection display and show you suggested items. On the other hand, even if it’s not great at helping you buy these kinds of items, it should be able to help you with issues like delivery tracking, regardless of what device you used to make the purchase. “Where are my new shoes?” is a perfectly reasonable question.
However, here’s the bottom line. AIs are like humans in one vitally important respect. They learn by failing. They listen to billions of words a day and learn to understand us: look at how much voice input has improved since it became a standard feature on mobile devices. AIs undertake millions of shopping transactions and learn about what we really want. And, like kids, they observe everything we do so that they can fit in with us better. Like kids, they do their best, they make mistakes, and we teach them to do better. Voice shopping may be limited and clunky now, but it won’t take long before it’s just another way to do business.
There’s one final issue that voice shopping developers need to address. A good shopping AI shouldn’t be linked to just one vendor. It needs to be focused around you, not the store. When you want coffee, it should search across multiple stores and see what’s available from all of them. It should retain your preferences even when you’re visiting a store for the first time. To gain widespread acceptance in the home, AIs will have to work for the benefit of the customers, not just be an agent for the retailer.
Andreas Hassellöf, founder of Ombori
Image Credit: Georgejmclittle / Shutterstock