Finding your voice in the VUI conversation

(Image credit: Shutterstock/polkadot_photo)

The popularity of voice-controlled technologies has begun to snowball over recent years. One in five UK adults are already using a voice-driven virtual assistant on their smartphones [1], and last year saw the UK dominate the European smart speaker market with 3.94 million Amazon Echo and Google Home devices sold [2]. Naturally, developers are seizing this opportunity and are increasingly creating Voice User Interface (VUI) apps to enable brands to engage with consumers. To empower rapid adoption, both Amazon and Google have created free ‘skill’ building platforms for their personal voice assistants.

But, as accessible and exciting as this technology is, it is still in the early stages of adoption and presents a steep learning curve for all concerned – developers, brands and customers. If you don’t know much about it, building a VUI skill might look complicated. If you know a little about it, then you’re probably thinking it’s super easy. But if you’re trying to deliver an excellent user experience - venturing beyond simply tweaking one of the available templates - you’ll almost certainly encounter some pitfalls.

There are some important considerations developers should take on board to overcome the unique challenges posed by delivering a positive brand experience in a voice environment - from development practicalities, voice and language design to the all-important user experience.

VUI as the solution for a genuine customer problem

It goes without saying, but to add genuine value and avoid creating vanity apps, it’s imperative that businesses put the needs of their customers first. The vast majority of successful voice services are swift, transactional and complement other channels. Users have a goal they wish to achieve – tomorrow’s weather, the latest podcast episode, something to cook tonight – and providing the easiest, most direct route to that content will create the best experience. If you are trying to use VUI for any reason other than helping your customer solve a problem you know they have, quit while you’re ahead and save your development budget. As with all technology, trying to find a problem to fit a solution won’t help your customer, and could easily damage your brand reputation.

Developing for a real-life conversation

Developers will find many aspects of voice interfaces are quite different to graphical user interfaces (GUI). If you are building for the web or apps, there are full stacks available. You have sketching, design, diagramming and prototyping tools, which allow you to test the solution in real environments before you build it.

For VUI, there are many software development kits (SDKs) and templates that will help you get started very quickly. You can write dialogues, create decision trees and undertake some user testing. But there are not yet enough mature testing tools available to enable brands to enrich the user experience. A good experience can only be created through thorough testing within the skill. However, this involves developing and redeveloping your skill multiple times until the dialogue is perfect. And although developers may perceive this dialogue as a mere nuance, it is actually the critical element that shapes the user’s perception.

Building an authentic VUI testing environment 

Many teams use the Wizard of Oz method – pretending to be a VUI device – to improve some of the dialogue early on and avoid re-working the majority of the skill during development. There are a number of creative ways people pretend to be a VUI device during testing. (We even considered putting someone in a box, because users behave differently in their own home than with a human observer.) As well as not creating an authentic experience, this method doesn’t remove the need to build and re-develop the actual solution over and over during testing with real users. So, we found a less arduous and more efficient method.

One way to avoid building a full skill multiple times is to trick the natural language processor (NLP) of the VUI device into not having specialised commands for the initial prototyping phase of the project. As standard, Alexa’s NLP tries to convert natural language into functions and parameters, such as “Alexa, what’s the weather going to be like in {London} {tomorrow}”, which becomes a function like GetWeather(where, when). However, making assumptions about the actual functions required before finalising your dialogue would still require you to build the solution multiple times.
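To make the utterance-to-function idea concrete, here is a minimal sketch in Python. It is a toy regular-expression parser, not how Alexa's NLP actually works, and the names `Intent`, `parse`, `dispatch` and `get_weather` are hypothetical, chosen only to mirror the GetWeather(where, when) example above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    name: str
    slots: dict


# Hypothetical sample utterance with two slots: {where} and {when}.
WEATHER_PATTERN = re.compile(
    r"what's the weather going to be like in "
    r"(?P<where>[\w ]+?) (?P<when>today|tonight|tomorrow)$",
    re.IGNORECASE,
)


def parse(utterance: str) -> Optional[Intent]:
    """Turn a spoken phrase into an intent with slots, or None if no match."""
    text = utterance.replace("’", "'").strip().rstrip("?.! ")
    match = WEATHER_PATTERN.search(text)
    if match is None:
        return None
    return Intent("GetWeather", match.groupdict())


def get_weather(where: str, when: str) -> str:
    # Placeholder: a real skill would call a weather service here.
    return f"Looking up the weather for {where} {when}."


def dispatch(intent: Intent) -> str:
    """Call the function that corresponds to the recognised intent."""
    handlers = {"GetWeather": get_weather}
    return handlers[intent.name](**intent.slots)
```

The point of the sketch is the coupling: every new intent needs a pattern and a handler, so changing the dialogue means changing the code.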

Instead, the best way to trick the NLP is to add a lot of random words to the training data to prevent the AI from choosing particular commands. This will cause all of the dialogue to be sent to your code, instead of a particular function, allowing you to log exactly how users interact within their own homes and to remotely simulate a possible response on the skill.
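One way this trick might be set up is sketched below in Python: a single catch-all intent whose only slot is trained on random phrases, so nothing the user says is routed to a specialised command. The dictionary layout and the names `CatchAllIntent`, `FREE_TEXT` and `anything` are illustrative assumptions, not the platform's actual schema, which should be checked against the vendor documentation.

```python
import random


def random_phrases(vocabulary, count, max_words, seed=0):
    """Build `count` phrases of 1..max_words words drawn at random."""
    rng = random.Random(seed)  # seeded so the prototype is reproducible
    phrases = []
    for _ in range(count):
        length = rng.randint(1, max_words)
        phrases.append(" ".join(rng.choice(vocabulary) for _ in range(length)))
    return phrases


def build_catch_all_model(vocabulary, count=50, max_words=6):
    """Return a hypothetical language model with one catch-all intent."""
    return {
        "intents": [
            {
                "name": "CatchAllIntent",
                "slots": [{"name": "anything", "type": "FREE_TEXT"}],
                # The whole utterance is captured by the single slot.
                "samples": ["{anything}"],
            }
        ],
        "types": [
            {
                "name": "FREE_TEXT",
                "values": random_phrases(vocabulary, count, max_words),
            }
        ],
    }
```

Because there is only one intent, the skill code receives the raw text every time and can simply log it and forward it to a human operator.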

To accurately simulate the user experience, we created a website that captured the dialogue and allowed operators to remotely run response dialogue on Alexa devices. As our testing solution doesn’t have specialised commands, all of the dialogue is sent to the operator using this website and a log of the conversation appears on the screen. Dialogues are prepared in advance as a decision tree, with variable responses depending on the input, or written in real time to respond to an unexpected interaction.
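The core of such an operator tool could look roughly like the Python sketch below. It is not the website described above, only an assumption-laden illustration: `Node` and `Session` are hypothetical names, branches are matched on simple keywords, and the operator can override the scripted reply at any turn.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One step of the pre-written dialogue: a prompt plus keyword branches."""
    prompt: str
    branches: dict = field(default_factory=dict)  # keyword -> Node


@dataclass
class Session:
    current: Node
    log: list = field(default_factory=list)  # (speaker, text) pairs

    def handle(self, utterance: str, operator_reply: Optional[str] = None) -> str:
        """Log the user's words and return the reply the device should speak."""
        self.log.append(("user", utterance))
        if operator_reply is not None:
            reply = operator_reply  # written in real time by the operator
        else:
            reply = self._from_tree(utterance)
        self.log.append(("device", reply))
        return reply

    def _from_tree(self, utterance: str) -> str:
        words = utterance.lower().split()
        for keyword, node in self.current.branches.items():
            if keyword in words:
                self.current = node
                return node.prompt
        # Nothing matched: repeat the current prompt.
        return self.current.prompt
```

In use, the operator would watch `session.log` fill up and pass `operator_reply` only when the user says something the tree did not anticipate.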

This prototyping tool allows the operator to respond very quickly in dialogue or audio so that the user doesn’t notice the human operating ‘behind the curtain’ – or in this case remotely. This makes it possible to test multiple experiences in succession with real users in the context of their homes, and allows developers to perfect the dialogue before having to write a line of code for the actual skill.

User-led interactions

Another challenge for developers is to improve the effectiveness of content discovery. It’s normal to display a list of options in web design, but replicating this on VUI presents a whole new challenge. The VUI user is heavily reliant on their short-term memory, so options have to be limited and delivered in a way that doesn’t overwhelm the listener.
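As a small illustration of limiting options, the Python sketch below reads out only the first few choices and offers to continue. The function name `spoken_options`, the wording of the prompt and the limit of three are all assumptions made for the example, not figures from any platform guideline.

```python
def spoken_options(options, limit=3):
    """Build a short spoken prompt that lists at most `limit` options."""
    shown = options[:limit]
    if not shown:
        return "Sorry, I don't have any options right now."
    if len(shown) == 1:
        listed = shown[0]
    else:
        listed = ", ".join(shown[:-1]) + " or " + shown[-1]
    prompt = f"You can choose {listed}."
    if len(options) > limit:
        # Leave the rest for a follow-up turn instead of reading it all out.
        prompt += " Or say 'more' to hear other options."
    return prompt
```

For example, `spoken_options(["pasta", "curry", "salad", "soup", "stew"])` mentions only the first three dishes and leaves the rest for a follow-up turn.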

Users don’t like to be given set options. They want the machine to be very smart and context aware. So, while understanding the command is relatively easy, recommending the solution can be very complicated. For example, imagine you are asking for a particular song in a universe of millions of songs – you expect the system to give you the right song and not to be given lots of options.

This is a challenge that appears time and time again when you want interactions that are short and to the point and the target data is not in the dialogue. Even though VUI comes with AI in the form of natural language processing, the developer might have to use their own machine learning algorithms for recommendations. Luckily, machine learning is becoming more easily available on the cloud.

Ultimately, the VUI user has complete freedom, introducing unpredictability (and error handling) that’s hard for developers to deal with. Even if you present the user with a finite set of options, they can still say whatever they like. If you present a customer with two options – meat or vegetarian – and they say “soup”, you could repeat the options, emphasising that there are only two choices; but that could be patronising. You could also try interpreting the user’s response as best you can, but be careful to avoid an error spiral. Instead, developers need to dedicate significant time to considering the best way to handle errors.
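One possible shape for this kind of error handling is sketched below in Python: reprompt once with the options restated, then exit gracefully instead of looping. The function names, the prompt wording and the limit of two failed attempts are assumptions for illustration, not a recommended standard.

```python
import re


def resolve_choice(utterance, options):
    """Return the option the user named, or None if none was mentioned."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for option in options:
        if option in words:
            return option
    return None


def next_prompt(utterance, options, failed_attempts, max_attempts=2):
    """Return (choice, reply, failed_attempts) for one dialogue turn."""
    choice = resolve_choice(utterance, options)
    if choice is not None:
        return choice, f"Great, {choice} it is.", 0
    failed_attempts += 1
    if failed_attempts >= max_attempts:
        # Stop here rather than repeating the question indefinitely.
        reply = "Sorry, I'm having trouble with that. Let's try again another time."
        return None, reply, failed_attempts
    listed = " or ".join(options)
    return None, f"I can offer {listed} today. Which would you like?", failed_attempts
```

Capping the number of reprompts is what keeps a single misunderstanding from turning into a spiral; the exact wording of each reprompt is still something to test with real users.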

Delivering extraordinary brand experience

Building a VUI skill is easy. Building an excellent experience in a VUI skill is less so. To embrace voice, ensure you understand its intricacies - play with competitor services, research the platform, use a voice-specific design stack, and engage a multidisciplinary team in which user experience, copy and development people tackle the challenges together.

Most importantly, experiment and test with real users. It is easy to forget that your brand will be speaking in a customer’s home, where patience is short. To bring an extraordinary brand experience into your customer’s home, remember that less is more. Focus on the key benefits you know customers want and champion clarity and purpose above all else.

Sandro Petterle, Technology Director at Rufus Leonard.
