Extracting value from unstructured data


With 80 per cent of a business’s data falling into the ‘unstructured’ category, you would think that organisations would know how to deal with it effectively. Yet this data does not sit neatly in a standard database. Typically human-generated, and taking the form of Word documents, emails, social media posts and reports, unstructured data can be difficult to store, process and analyse.

As businesses deal with increasing volumes of data, many have developed a siloed approach in which they look at structured and unstructured data in isolation. However, in order to extract value and gain the most business intelligence, it is imperative for a business to achieve an overarching view of both data types. While obtaining information from structured data is traditionally straightforward, making informed decisions also requires insight from unstructured data repositories. Without a data management strategy that includes unstructured data, organisations run the risk of missing out on opportunities, failing to keep up with competitors, running up data centre costs and potentially breaching GDPR.

Through the analysis of unstructured data, it is possible to extract vital business intelligence which enables a company to know its data, know its business and drive growth. So, while it might seem like a difficult task to pull together both structured and unstructured data, businesses need to learn how to break down the barriers between them to extract valuable insights. A large number of organisations have started this process: 95 per cent of CIOs responding to a recent poll commissioned by ASG stated that they were structuring their organisations to manage all information with a common theme and approach.

With the introduction of GDPR, we have all been made to look at the information we hold internally and identify personal data. While businesses understand how to use database applications to do this with structured data, the same focus hasn’t been given to unstructured data. Yet, unstructured data presents many of the same problems and challenges you would expect to find when looking at structured data. Fortunately, this means organisations can approach it in the same way.

Approaching unstructured data

As with structured data and applications, content management tools create metadata about the content that allows it to be indexed, federated and searched. In addition, data intelligence tools can be applied to unstructured data to identify personal data, providing a more complete view of the GDPR-protected data under management and thus reducing the risk of non-compliance. This allows businesses to look more broadly and think about the information they hold internally, both in structured databases and in less structured content such as documents and statements.
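As a rough illustration of the idea, the sketch below builds a small metadata record for a piece of text and flags whether it appears to contain personal data. It is a minimal example under stated assumptions, not a description of any particular product: the function name build_metadata, the record fields and the two patterns (an email address and an 11-digit UK-style phone number) are all hypothetical, and real personal-data discovery needs far broader coverage than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; genuine personal-data
# detection must cover many more formats (names, addresses, IDs, ...).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_phone": re.compile(r"(?<!\d)0\d{10}(?!\d)"),
}


def build_metadata(doc_id, text):
    """Return a metadata record noting which kinds of personal data appear."""
    found = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    return {
        "doc_id": doc_id,
        "word_count": len(text.split()),
        # Keep only the pattern names that actually matched something.
        "personal_data": {name: hits for name, hits in found.items() if hits},
        "contains_personal_data": any(found.values()),
    }


if __name__ == "__main__":
    record = build_metadata(
        "memo-1",
        "Contact jane.doe@example.com or 02079460000 about the claim.",
    )
    print(record)
```

Records of this kind could then be stored alongside the content so that documents holding personal data can be found through the same search as everything else.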

In order to harness unstructured data, businesses must first transform it into a format that is more manageable and easier to analyse. As this could be a significant task, it is important to take a phased approach, initially focusing on the low-hanging fruit that offers the greatest gain with the least risk.

Businesses can then begin to look for the right solution to meet their needs. It is vital to choose tools that can handle a broad array of content formats and sources, and that can be easily reconfigured to keep up with business needs as they change. Once a solution has been chosen, organisations can start to increase the volume of data being fed into the tools, remove duplicate content and standardise content into a common, searchable format. This will allow the data to be analysed and value to be extracted.
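The last two steps, removing duplicate content and standardising it into a searchable format, can be sketched in a few lines. This is a simplified illustration rather than a recommended implementation: the function names normalise, deduplicate and build_index are hypothetical, duplicates are assumed to be documents whose text is identical once case and whitespace are normalised, and the "searchable format" is assumed to be a simple word-to-document index.

```python
import hashlib
import re
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents):
    """Keep the first document for each distinct normalised body."""
    seen = set()
    unique = {}
    for doc_id, text in documents.items():
        body = normalise(text)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = body
    return unique


def build_index(unique_docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, body in unique_docs.items():
        for word in re.findall(r"[a-z0-9]+", body):
            index[word].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "a": "Claim  approved\nfor policy 42",
        "b": "claim approved for policy 42",  # same content as "a"
        "c": "Invoice overdue",
    }
    unique = deduplicate(docs)
    index = build_index(unique)
    print(sorted(unique))
    print(sorted(index["claim"]))
```

Hashing the normalised text catches only exact repeats; near-duplicates such as forwarded emails or lightly edited reports would need a fuzzier comparison.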

The data lake

Many organisations, rather than processing both data types separately, have established a data lake – a place in which they dump all types of information. While this might have once been a useful approach, they now have the difficult task of extracting important information from the lake’s murky waters.

As it becomes increasingly necessary to understand information drawn from both types of data, this is a particularly important task. Businesses must therefore address the disparity between the data management practice associated with structured data and the content practice that deals with, for example, reports, claims documents and statements. As a result, businesses are starting to take a more holistic approach, pulling those practices together internally to form one organisational structure in the hope of treating all content in a similar manner.

Dangers of a siloed view

Failing to create a single view of structured and unstructured data is likely to leave your business open to a number of risks, one of which is exposure to hidden data breaches. As unstructured data is arguably the source of more personal data and is easier to access, it may be more vulnerable to cyber attacks. Businesses must therefore pay close attention to how they store and process this data, or they will leave themselves open to data breaches.

Organisations that fail to ensure they have a broad overview of both types of data will find it much more difficult to perform certain processes and will end up incurring higher data centre costs. For example, a siloed approach makes it difficult to connect structured insurance claims data with the source documents that support it.

A siloed approach can also make it difficult to reach informed business decisions, because decision-makers lack the right insights and could be missing information held in less structured formats, particularly social and email data. It is imperative to be able to look at these formats holistically alongside other types of transactional information, or it won’t be possible to gain the overview needed to develop insights which drive your business.

Ultimately, organisations must stop viewing structured and unstructured data as two completely separate things and realise the results they can achieve by creating one view. The pitfalls of a siloed approach include missing out on important insights which could inform business decisions, and an increased risk of GDPR breaches. Businesses that instead harness unstructured data will see a number of benefits, such as the ability to keep up with industry trends, derive insights, track competitive intelligence, engage with customers, predict customer behaviour and develop processes to mitigate risks.

Rob Perry, vice president of product marketing, ASG Technologies
Image Credit: Flickr / janneke staaks