
How to handle the unexpected in conversational AI

(Image credit: Geralt / Pixabay)

One of the biggest challenges for developers of natural language systems is accounting for the many and varied ways people express themselves. There is a reason many technology companies would rather we all spoke in simple terms: it makes humans easier to understand and narrows down the chances of machines getting it wrong.

But it’s hardly the engaging conversational experience that people expect of AI.

Language has evolved over many centuries. As nations colonised and traded with one another, language, whatever your native tongue, changed with them. And thanks to radio, TV, and the internet, it’s continuing to expand every day.

Among the hundreds of new words and meanings added to the Merriam-Webster dictionary in 2019 were vacay, a shortening of vacation; haircut, which gained a new sense meaning “a reduction in the value of an asset”; and dad joke, a corny pun normally told by fathers.

In a conversation, we as humans would probably be able to deduce what someone meant, even if we’d never heard a word or expression before. Machines? Not so much. Or at least, not if they are reliant solely on machine learning for their natural language understanding.

Adding domain specialism such as a product name or industry terminology to an application helps a machine recognise some specific words, but understanding all of the general everyday phrases people use in between those words is where the real challenge lies.

Most commercial natural language development tools today don’t offer the intelligent, humanlike experience that customers expect in automated conversations. One reason is that they rely on pattern matching words using machine learning.

Although humans - at a basic level - pattern match words too, our brains add a much higher level of reasoning that lets us do a better job of interpreting what the person meant. We consider the words used, their order, synonyms and more, and we understand when a word such as book is being used as a verb or a noun. One might say we add our own more flexible form of linguistic modelling.

Machine learning isn’t great at precision

As humans, we can zoom in on the vocabulary that is relevant to the current discussion. So, when someone asks a question using a phrasing we’ve not heard before, we can extrapolate from what we do know, to understand what is meant. Even if we’ve never heard a particular word before, we can guess with a high degree of accuracy what it means.

But when it comes to machines, most statisticians will tell you that accuracy isn’t a great metric. It’s too easily skewed by the data it’s based on. Instead of accuracy, they use precision and recall. In simple terms, precision is about quality: of all the times the system predicted a particular answer, how often it was actually correct. Recall is about quantity: of all the cases where that answer was the right one, how many the system found.
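To make the difference concrete, here is a minimal Python sketch of how the three metrics can be computed for a single intent. The function name, the intent labels and the ten utterances are all made up for illustration; they are not taken from any real system or dataset.

```python
def precision_recall_accuracy(actual, predicted, positive):
    """Return (precision, recall, accuracy) for one label of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # found and right
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # predicted, but wrong
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # should have been found
    correct = sum(1 for a, p in pairs if a == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# Hypothetical data: 10 utterances, only 2 of which are really "cancel_order".
actual = ["other"] * 8 + ["cancel_order"] * 2

# A lazy model that never predicts "cancel_order" at all.
always_other = ["other"] * 10

# A model that finds one of the two, and raises one false alarm.
mixed = ["other"] * 7 + ["cancel_order", "cancel_order", "other"]

print(precision_recall_accuracy(actual, always_other, "cancel_order"))  # (0.0, 0.0, 0.8)
print(precision_recall_accuracy(actual, mixed, "cancel_order"))         # (0.5, 0.5, 0.8)
```

Both models score 80 percent on accuracy, yet the first never recognises the intent at all. Precision and recall expose the difference that accuracy hides when one class dominates the data.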

The vast majority of conversational AI development tools available today rely purely on machine learning. However, machine learning isn’t great at precision unless it has massive amounts of data on which to build its model. The end result is that the developer has to code in each and every way someone might ask a question. Not a task for the faint-hearted when you consider there are at least 22 ways to say yes in the English language.

Some development tools rely on linguistic modelling, which is great at precision, because it understands sentence constructs and the common ways a particular type of question is phrased, but often doesn’t stack up to machine learning’s recall ability. This is because linguistic modelling is based on binary rules. They either match or they don’t, which means inputs with minor deviations such as word ordering or spelling mistakes will be missed.
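The all-or-nothing behaviour of a rule can be sketched in a few lines of Python. The regular expression below is a deliberately simple, hypothetical stand-in for a linguistic rule; real linguistic modelling tools are far richer, but the match-or-miss outcome is the same.

```python
import re

# Hypothetical rule for a "book a flight" intent: the verb "book",
# then an article or possessive, then the noun "flight".
BOOK_FLIGHT_RULE = re.compile(r"\bbook\s+(a|the|my)\s+flight\b", re.IGNORECASE)


def rule_matches(utterance: str) -> bool:
    """A binary decision: the rule either fires or it does not."""
    return BOOK_FLIGHT_RULE.search(utterance) is not None


print(rule_matches("Please book a flight to Oslo"))      # True
print(rule_matches("Please book a fligth to Oslo"))      # False: spelling mistake
print(rule_matches("A flight is what I want to book"))   # False: different word order
```

When the rule fires, it is almost certainly right, which is where the precision comes from. The two misses show the cost in recall.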

Optimising further

Machine learning, on the other hand, estimates how likely it is that the input matches the training data for a particular intent class, and is therefore less sensitive to minor variations. Used alone, neither system is conducive to delivering a highly engaging conversation.

However, by taking a hybrid approach to conversational AI development, enterprises can benefit from the best of both worlds. Rules increase the precision of understanding, while machine learning delivers greater recall by recovering the data missed by the rules.
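One way to picture the hybrid approach is a classifier that tries the rules first and only falls back to a statistical score when no rule fires. The Python sketch below is an illustration under stated assumptions, not any vendor's implementation: the intents, the example phrases and the 0.6 threshold are invented, and plain string similarity from the standard library's difflib stands in for a trained machine learning model.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rules: precise, but binary.
RULES = {
    "book_flight": re.compile(r"\bbook\s+(a|the|my)\s+flight\b", re.IGNORECASE),
}

# Hypothetical training examples for the statistical fallback.
EXAMPLES = {
    "book_flight": ["book a flight", "i want to book a flight"],
    "cancel_order": ["cancel my order", "i want to cancel my order"],
}


def similarity(utterance: str, intent: str) -> float:
    """Best string similarity (0.0 to 1.0) against the intent's examples.

    A stand-in for a model's confidence score, not real machine learning.
    """
    text = utterance.lower()
    return max(SequenceMatcher(None, text, ex).ratio() for ex in EXAMPLES[intent])


def classify(utterance: str, threshold: float = 0.6):
    """Return (intent, score, source), trying rules before the fallback."""
    for intent, rule in RULES.items():
        if rule.search(utterance):
            return intent, 1.0, "rule"
    best = max(EXAMPLES, key=lambda intent: similarity(utterance, intent))
    score = similarity(utterance, best)
    if score >= threshold:
        return best, score, "statistical"
    return None, score, "fallback"  # e.g. hand over to a live chat agent


print(classify("Book a flight"))   # handled by the rule
print(classify("book a fligth"))   # rule misses, statistical score recovers it
print(classify("zzzz"))            # nothing matches, goes to the safety net
```

The rule handles the inputs it was written for, the score recovers near misses such as a spelling mistake, and only inputs that resemble nothing in the examples reach the safety net.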

Not only does this significantly speed up the development process, it also allows the application to deal with examples it has never seen before. In addition, it reduces the number of customers sent to a safety net such as a live chat agent merely because they’ve phrased their question slightly differently.

By enabling the conversational AI development platform to decide where each model is used, the performance of the conversational system can be optimised even further. This makes it easier for the developer to build robust applications, because the platform automatically mixes and matches the underlying technology to achieve the best results, while allowing technology to more easily understand humans, no matter what words we choose to use.

Andy Peart, CMSO, Artificial Solutions