Human voice: the next generation of data

(Image credit: Shutterstock/polkadot_photo)

There is an old inspirational quote that reads “there are three things in life that you cannot recover: a word once it is said, the moment after it is missed and the time after it has gone”. But the author may not have imagined a world where technology captures an ever-increasing volume of our words. When someone leaves a message, talks to staff in a call centre, or logs onto a conference call, they are generating voice data - millions upon millions of hours of it. There is an immense amount of potential value in that data, and yet the majority sits idle in today’s enterprises.

So why are companies leaving this wealth of potential abandoned and untapped instead of putting it to work?

Voice data is much harder to secure, deliver and analyse than ‘traditional data’. And it is harder to gather clean and representative data to build usable models from it. But companies should not be deterred; those that overcome the challenges will reap the rewards from this new frontier.

Here, there and everywhere: enterprise voice data surrounds us

In today’s connected age, the chatbot may have passed its sell-by date, but conversational artificial intelligence (interacting with computers via speech) is visibly on the rise. Some 20 per cent of Google searches are now made via voice control, and Alexa has over 10,000 skills in its set. Furthermore, as the number of Internet of Things (IoT)-enabled platforms grows, so does the number of speech interfaces that can interact with them, such as smartphones and cars.

This goes beyond just speaking with machines – human-to-human interaction, such as customer service calls or conversations with healthcare providers, can also form part of this revolution. Some organisations, such as Pindrop, are already using this form of AI technology to detect fraudulent claims. This, however, only scratches the surface of the potential of voice data. One day we may be able to mine call centre data to predict which customers are most likely to buy which products, and even to deliver real-time customer satisfaction metrics.

The road may be bumpy, but that shouldn’t stop your travels

There are a variety of challenges which come from attempting to analyse and understand voice data.

The principal issue facing those who want to get the most from the wealth of voice data their company sits upon is ensuring access to quality data. It has been estimated that data scientists can spend up to 80 per cent of their time just acquiring and cleaning up their data.

But even once the data has been cleansed and organised, it is not necessarily diverse enough, which can result in data bias. Voice data brings about a whole new spectrum of data bias. For instance, an algorithm trained on male voices from Manchester will likely have difficulty understanding a female voice from Glasgow.
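One way to make this kind of bias visible before training is to count how the speakers in a dataset are distributed. The sketch below is only an illustration: the `gender` and `accent` metadata fields, the sample counts and the 10 per cent threshold are all assumptions, not details of any particular dataset or tool.

```python
from collections import Counter


def coverage_report(samples, min_share=0.10):
    """Return each (gender, accent) group's share of the samples and
    whether that share falls below min_share (an assumed threshold)."""
    counts = Counter((s["gender"], s["accent"]) for s in samples)
    total = sum(counts.values())
    return {
        group: {"share": n / total, "under_represented": n / total < min_share}
        for group, n in counts.items()
    }


# Hypothetical metadata for 20 recordings, heavily skewed towards one group.
samples = (
    [{"gender": "male", "accent": "Manchester"}] * 18
    + [{"gender": "female", "accent": "Manchester"}] * 1
    + [{"gender": "female", "accent": "Glasgow"}] * 1
)

report = coverage_report(samples)
for group, stats in sorted(report.items()):
    print(group, stats)
```

A report like this only covers groups that appear at least once. Voices that are missing from the data altogether will not show up, so the list of groups worth checking still has to come from knowledge of the customer base.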

Challenges like this often result in the proliferation of ‘data capitalism’, giving an advantage to already-established data companies. Apple, Google and Facebook often have a monopoly over this complex form of data, whilst smaller organisations scramble to find enough of it. There is a silver lining in this evolving technology space, however, as large conglomerates release open-source software libraries and public datasets. Google’s TensorFlow and AudioSet (a collection of over 2 million labelled audio clips organised under an ontology of sound events) and YouTube’s YouTube-8M (which offers 450,000 hours of classified and labelled video) allow smaller players to build upon these foundations.

Regulatory roadblocks

The quality of the data we use isn’t the only challenge. Regulation can also prove to be a roadblock to accessing this precious information. Data redaction will have to meet necessary compliance regulations and ensure the secure delivery of data across the enterprise.

GDPR has now been in effect for six months, and whilst organisations are familiar with its requirements and the impact of non-compliance, many are still in the midst of understanding them and putting processes and policies in place. Increasing pressure on companies to protect personally identifiable data, with the threat of heavy fines for non-compliance, has led companies to focus on managing and protecting production data. However, all organisations also hold a wealth of non-production data that is not as securely managed or protected as their production data.

One of the reasons a company’s non-production data is overlooked is that comprehensive security measures, such as masking data in the many test, reporting and analytics systems of a large company, can come at a high price and prove very complex to implement company-wide, especially when dealing with such a complex form of data. However, modern masking solutions with inbuilt data profiling capabilities can sift through large amounts of data to detect sensitive information, helping businesses manage their data privacy processes more efficiently. High-end masking solutions take this one step further and recommend masking algorithms in order to streamline and accelerate the process of securing data.
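To give a rough sense of what profiling and masking involve, here is a minimal sketch. It assumes the voice recording has already been transcribed to text, and it uses two simple, illustrative patterns (email addresses and card-like digit runs). The pattern names and functions are hypothetical; commercial masking products use far richer detection than this.

```python
import re

# Illustrative patterns only; real profiling covers many more categories.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def profile(text):
    """Return the sensitive-data categories detected in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def mask(text):
    """Replace every detected value with a placeholder naming its category."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


# Hypothetical call-centre transcript.
transcript = "My card is 4111 1111 1111 1111 and my email is jane@example.com."

print(profile(transcript))
print(mask(transcript))
```

Profiling first and masking second mirrors the approach described above: the profiling step tells the business where sensitive information lives, and the masking step secures it before the data reaches test, reporting or analytics systems.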

Extending this to voice data simply becomes the next step in any organisation’s data strategy. Organisations hoping to tap into the potential of voice data must carefully consider the ways in which they will provide secure access to this information across their business.

Start now to reap the benefits of voice data

Emerging technologies such as voice-activated devices, artificial intelligence and machine learning are constantly opening up new opportunities for organisations to innovate and be competitive in their industries. Early adopters will gain the advantage today if they are able to put the right foundations and framework in place to manage and secure the data fuelling these emerging technologies.

Now is the time to build on the basics, with the right data platform and tools, to establish where data is stored company-wide and to ensure that those who need the data have fast and easy access to it.

Leveraging voice data can provide extraordinary advantages to businesses. In order to reap the benefits, businesses must invest time and effort to ensure the right practices and procedures are in place. By building out a framework of processes and tools to manage and secure data, businesses can build a strategy today to ensure they are ready for tomorrow.

Peter Majeed, VP for Customer Success and Field Services, Delphix EMEA