To thrive, AI needs a healthy data diet


Today, there is a relentless curiosity and excitement around AI, as organisations start to recognise this technology’s true potential - but it also requires a real shift in mindset about how data is managed and prepared for consumption.

The growth of artificial intelligence (AI) in the enterprise is being hampered because data scientists too often have limited access to the relevant data they need to build effective AI models. These data specialists are frequently forced to rely solely on a few known sources, like existing data warehouses, rather than being able to tap into all the real-time, real-life data they need. In addition, many companies have great difficulty determining the business context and quality of massive amounts of data quickly, efficiently and affordably. Given these difficulties, it’s easy to understand some of the historical barriers to AI acceleration and adoption.

Thankfully there's no scarcity of data, as volumes explode across society and, in particular, the enterprise space. This means that AI has a real opportunity to flourish - but, at the same time, data proliferation poses some tricky questions for business leaders and IT organisations.

In short, there’s so much data available that it becomes vital to identify what is relevant and impactful and what is just ‘noise’, in order to tap into the immense decision-making capabilities of AI. Data only becomes useful for AI when its value is clearly understood – particularly its context and relevance. Only then can it be used with real confidence to train AI algorithms.

Intelligent data

This is where the idea of ‘intelligent data’ comes in, as a foundation for enterprise-wide transformation. It’s the third stage along a data journey that many organisations are making now or will make in future.

The first stage (data 1.0) saw them collect and aggregate data in order to drive specific business applications.

The second stage (data 2.0) has seen them create well-defined processes to allow authorised employees to access relevant data, even as the volume, variety and velocity of data have exploded.

Today, what’s needed in a data 3.0 stage is this intelligent data, which makes its own context and relevance clear to users through metadata.

To understand this better, consider the challenges a company faces in defining a new type of relationship with customers. For example, a razor blade company might want to sell its products directly to customers on a subscription basis, rather than relying solely on ad hoc purchases made at supermarkets and chemists. Making that change in business model will require data that comes in many different formats (structured, semi-structured, unstructured), drawn from a multitude of data sources (including databases and data warehouses, business applications, social media and the Internet of Things), residing in a variety of locations (on-premises, cloud and hybrid systems).

Some organisations assemble all this information in a so-called ‘data lake’ – but without intelligent data, this kind of repository may turn out to be of little value. In fact, analysts at IT market research firm Gartner have in the past estimated that, through 2018, a shocking 90 per cent of data lakes would be “useless”, because they are filled with raw data that only a few skilled specialists understand and know how to use.

That’s not helpful, because at many companies, there’s a real lack of time, money and skills available to determine the business context and quality of massive amounts of data - and this, in turn, is a big barrier to AI adoption.

But with intelligent data, it becomes possible to conduct Google-like searches on terms like ‘customer’ and instantly discover all potential sources of relevant data. Intelligent data can save an enormous amount of valuable time that might otherwise be spent collecting, assembling and refining customer data to feed AI models. It also delivers the most reliable results.

Metadata unlocks data’s value

So how can an organisation make its data truly intelligent? Metadata is the key that unlocks data’s value, when it is used to drive a holistic data management platform for AI and machine learning.

There are four distinct metadata categories to look at if you want to ensure that you’re delivering comprehensive, relevant and accurate data to implement AI:

  1. Technical metadata – includes database tables and column information as well as statistical information about the quality of the data.
  2. Business metadata – defines the business context of the data as well as the business processes in which it participates.
  3. Operational metadata – information about software systems and process execution, which, for example, will indicate data freshness.
  4. Usage metadata – information about user activity including data sets accessed, ratings and comments.
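
To make the four categories concrete, here is a minimal Python sketch of how a catalogue entry might carry all four kinds of metadata, and how a simple keyword search could use them. It is an illustration only, not a description of any particular product: the class `DatasetMetadata`, the function `search`, every field name and the sample data sets are hypothetical, and the ranking rule (best-rated first, then most recently refreshed) is just one assumed way of using usage and operational metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """One catalogue entry. All field names are illustrative assumptions."""
    name: str
    # Technical metadata: column names plus a simple quality statistic.
    columns: List[str]
    null_fraction: float
    # Business metadata: plain-language terms describing the business context.
    business_terms: List[str]
    # Operational metadata: an indicator of data freshness.
    hours_since_refresh: float
    # Usage metadata: ratings left by people who have used the data set.
    ratings: List[int] = field(default_factory=list)


def search(catalogue: List[DatasetMetadata], term: str) -> List[str]:
    """Return names of data sets whose columns or business terms mention `term`."""
    term = term.lower()
    hits = [
        entry for entry in catalogue
        if any(term in text.lower() for text in entry.columns + entry.business_terms)
    ]

    def sort_key(entry: DatasetMetadata):
        # Unrated data sets are treated as having an average rating of 0.
        average = sum(entry.ratings) / len(entry.ratings) if entry.ratings else 0.0
        # Higher rating first, then fresher data first.
        return (-average, entry.hours_since_refresh)

    return [entry.name for entry in sorted(hits, key=sort_key)]


# Hypothetical catalogue for the razor blade subscription example.
catalogue = [
    DatasetMetadata("crm_accounts", ["customer_id", "email"], 0.02,
                    ["customer", "subscription"], 1.0, [5, 4]),
    DatasetMetadata("web_clickstream", ["session_id", "url"], 0.10,
                    ["customer behaviour"], 0.1, [3]),
    DatasetMetadata("warehouse_stock", ["sku", "quantity"], 0.00,
                    ["inventory"], 24.0, []),
]

print(search(catalogue, "customer"))
```

In this sketch, a search for ‘customer’ matches the first two data sets through their technical and business metadata, and lists the better-rated one first. A real platform would replace the hand-written ranking rule with something far richer, but the shape of the metadata stays the same.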

AI and machine learning, applied to this collection of metadata, help identify and recommend the right data. That data can, in turn, be automatically processed without human intervention to render it suitable for use in enterprise AI projects.

This leads to a ‘virtuous circle’ of good, nutritious data feeding healthy AI - and a more robust launchpad for digital transformation.

Amit Walia, President, Products and Marketing, Informatica