How semantic technology is making sense of our big data

Hardly a day goes by without the latest news and developments in artificial intelligence (AI) being reported across the breadth of media, both traditional and social. In early 2016 the world learned of AlphaGo’s 4-1 victory over one of the best human players at ‘Go’ using deep learning techniques. Also in the first quarter of 2016, Microsoft launched its online chatbot Tay, which uses natural language processing to learn and understand the way humans speak.

Whilst these developments are without a doubt exciting, this is not the first time that a headline-grabbing AI achievement has heralded a false dawn for the business world. When IBM’s Deep Blue defeated chess grandmaster Garry Kasparov in 1997, the machine intelligence revolution failed to materialise. The problem with all of these high-profile successes is that the coverage fails to mention how narrow and specific the AI’s expertise needed to be to achieve that success. Hundreds of man-hours were required to beat one human expert. That is not to belittle the astonishing accomplishment of a machine bettering a human at something as robustly complex as a game of chess or Go. It is no wonder, however, that these applications do not really reflect the everyday needs of the remaining 99 percent of the world today. In the business world the question is not what AI can do, but how AI technology is relevant to me today.

The data is still too big

Since the phrase was coined, an endless amount of content has been produced to advise the world on how best to overcome the big data challenge. Ignoring the irony that this content offers advice on overcoming big data while adding to the mass of content itself, for many businesses big data still remains the significant IT challenge. The essence of the problem is velocity, volume and variety, commonly referred to as the three Vs of big data, with which businesses cannot keep up.

A good example of the gap between research and today’s business need is Google’s recent foray into literature analysis. Although this was a very interesting first step towards something very bold, it is a major step ahead of what would be helpful to businesses today. Google’s analysis is an attempt at natural language understanding, something similar to the capability for which Microsoft’s Tay was soon exploited.

For most businesses, especially those across the publishing, pharmaceutical, legal and financial services industries, natural language processing (the ability to identify relationships within content), rather than ‘understanding’ (enabling computers to interpret the meaning of natural language as humans do), is a technology powerful enough to help them tackle their everyday problems.

The fragmented way in which firms across all of these industries hold data and content is one of the main causes of operational headache. For all of them the ability to accurately match data across their databases as quickly as possible is absolutely paramount to the way they work. Consider the benefits for a law firm that wants to investigate whether some intellectual property is in breach of an existing patent, or a research and development centre investigating whether anything resembling its line of inquiry has been modelled elsewhere before. Alternatively, think of the benefits that the global network of investment banks and trading environments could gain by reconciling the incredible volume of trades across global markets each day with the compliance obligations they must also adhere to. Organisations across these sectors need to avoid being held back by their data challenge. Facing these challenges, it is worth exploring how semantic technology can aid organisations today.

Processing vs understanding

Natural language processing can allow a computer to identify links between entities within a database that traditional data matching simply cannot compute. This then empowers human users to glean insights from their data which would otherwise not be achievable.
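To make the contrast with traditional data matching concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the company names, the suffix list and the 0.85 threshold are all hypothetical, and simple string normalisation stands in for a real natural language processing pipeline, which would do far more (entity recognition, disambiguation, relationship extraction).

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-entity suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "plc", "corp", "corporation", "limited", "co"}


def normalise(name: str) -> str:
    """Lowercase, strip punctuation and drop common company suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two names are similar enough after normalisation."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def link_records(left, right, threshold=0.85):
    """Pair each name in `left` with the names in `right` it appears to match."""
    return [(a, b) for a in left for b in right if same_entity(a, b, threshold)]


# Two hypothetical databases that refer to the same firms differently.
crm_names = ["Acme Corp.", "Globex Limited"]
trade_names = ["ACME Corporation", "Globex Ltd", "Initech"]

# Traditional exact matching finds no overlap between the two lists.
exact_links = [(a, b) for a in crm_names for b in trade_names if a == b]

# The normalised comparison links the records that name the same entity.
fuzzy_links = link_records(crm_names, trade_names)
```

Exact comparison finds nothing here because no two strings are identical, whereas the normalised comparison pairs the two Acme records and the two Globex records. A production system would replace `normalise` and `same_entity` with proper linguistic analysis.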

In contrast to the work behind Google’s literature analysis and Microsoft’s Tay, most enterprises and SMBs use technology that is a long way behind these headline-grabbing activities. Natural language understanding lacks the maturity that natural language processing provides to the IT infrastructures used by businesses today. Ontotext has been able to take a lead in this arena and develop semantic technology which has a direct application to the manner in which data is processed by computers and used by humans.

The notion of training an AI to understand natural language is very exciting, and a company with pockets and minds as deep as Google’s can always test the water of what might be exciting in a couple of decades. However, what stood out about Google’s literature analysis was how basic the experiment was, demonstrating the sheer complexity of the challenge at hand. At the same time, the speed with which humans ambushed Microsoft’s Tay to train it in controversial language shows there is still progress to be made in the relationship between natural language understanding technology and humans.

While natural language understanding still has much work to do, natural language processing and semantic technology can perform the knowledge tasks of today and deliver demonstrable business value. As an added bonus, as more metadata is expressed in standard languages like RDF, having natural language processing foundations in place will make it easier to adopt the advancements of natural language understanding in the future.
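As a rough illustration of why standardised metadata helps, RDF describes facts as subject, predicate, object triples. The sketch below mimics that shape with plain Python tuples rather than a real RDF library or triple store, and every identifier in it (the `ex:` names, the predicates, the helper functions) is a hypothetical example rather than part of any real vocabulary or product.

```python
# Each fact is a (subject, predicate, object) triple, the shape RDF uses.
# All identifiers below are hypothetical examples, not real vocabulary terms.
TRIPLES = {
    ("ex:Patent123", "ex:filedBy", "ex:AcmeCorp"),
    ("ex:Patent123", "ex:covers", "ex:BatteryCooling"),
    ("ex:ResearchNote7", "ex:mentions", "ex:BatteryCooling"),
    ("ex:AcmeCorp", "ex:basedIn", "ex:London"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def related_subjects(triples, topic):
    """Find every subject linked to `topic` by any predicate."""
    return sorted({s for s, _, _ in match(triples, o=topic)})


# Which items in the data touch on the same topic, however they are linked?
battery_items = related_subjects(TRIPLES, "ex:BatteryCooling")
```

Because every fact has the same three-part shape, one generic pattern query can connect a patent and a research note that were never stored in the same table. That uniformity is the property that should make it easier to plug in more capable language technology later.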

It is reasonable to note that the challenge also relates to the structure and output of your data management. The application of semantic technologies within an unstructured data environment can only deliver real business value if the output is presented in a meaningful way to the human tasked with looking at the relationships. It is here that graphical representations add user interface value and present a cohesive approach to improving the search and understanding of enterprise data.

This is a situation which remains at the core of what most IT teams find incredibly difficult to assess. With so many variants of technologies which all claim to solve your business problems, and headline-grabbing developments being published across the news, how are you supposed to know what works? Too many people are constantly trying to predict what their data challenge will be in the future. However, the best way to prepare for that future is to tackle the data problem you have today.

Dr Jarred McGinnis, UK managing consultant at Ontotext