Marco Arment recently provoked debate in Apple circles by pointing to Google and Amazon's growing lead over Apple in voice. He fears that “if the landscape shifts to prioritise [voice and other] big-data AI services, Apple will find itself in a similar position as BlackBerry did almost a decade ago.”
Commenting on this in BetaNews, Joe Wilcox thought a better analogy was Nokia. As Google Now and Amazon’s Alexa continue to make strides, the market is moving “from touch to touchless interaction.” And just as Nokia’s efforts with touch were “hobbled” by its attachment to keyboards, Apple’s “finger first design philosophy” tethers it to an older “design ethic” destined for obsolescence.
Wilcox and Arment raise important concerns, but they’re wrong. Voice interaction won’t become the new paradigm of mobile computing. A glance at the recent history of tech makes clear why.
Backing away from touch
The unspoken premise of the argument for voice as the interface of tomorrow is that voice stands to phones and PCs as windows and the mouse stood to DOS and the command line interface. If and when voice AI becomes good enough, the vast majority of people will back away from their touch screens, just as everyone left DOS.
But the analogy breaks down on closer inspection. We chose windows over DOS as an OS interface, not a computer interface: we still clung to our keyboards for the vast majority of our use. More to the point, the most common things we do on both phones and PCs are still the same as they were in 1984: reading and typing out text. In thirty years, no matter how good voice recognition, AI, and bots become, we’re likely to spend most of our time with computers the same way, for a few reasons that aren’t likely to change.
There are a host of things we commonly do that are easier to do with our hands and eyes than with our voices and ears. It’s easier to send a quick text and to read it than it is to call (and thus, as a recent Nielsen survey indicates, Americans on average exchange twice as many texts as calls). It’s easier to use a finger to snap a picture, triage our email inbox, or navigate through Spotify or iTunes. It’s much easier to write longer documents with our hands and edit them with our eyes than it is to do either by voice. Consider the limited uptake of the near-perfect Dragon Dictation.
To speak or not to speak
More importantly, because it’s easier and quicker to navigate a page or screen (Google results, a document, a game, a map) visually rather than aurally, we know we’ll be stuck reading screens for a long time to come. And so long as we do, it will always be easier to reach out and touch the screen, whether to click a link, insert the cursor, move the map, or play the game, than it will be to speak.
The possibilities of voice are exciting. But they’re not going to effect a fundamental shift in the way we use our devices most of the time, as Windows or the iPhone did. We may do more with voice than we do now. But we won’t keep our hands away. We know this because, after three decades, it still just feels right.
Robert Francis, Telecommunications, MSI UK