Business data is increasingly seen as a blessing and a curse in equal measure. There is so much value to be gained from harnessing the data flowing into organizations from external sources, as well as through the people, processes and technologies within a business, that it is a significant competitive disadvantage not to be actively collecting, processing and analyzing as much of it as possible. Equally, with stringent new data protection regulations in various territories and data breaches hitting the news on an almost weekly basis, managing this data has become a minefield for CIOs and data management professionals, and the consequences of getting it wrong have grown.
The data we are most often referring to these days is unstructured data. In the early days of digitization, most data was generated by systems and stored in databases as structured data with predictable growth, and IT and financial staff were the only ones with the interest or skillset required to work directly with a database. Today a far greater proportion of a company’s data is created outside the traditional relational database and in a variety of formats: documents, photos, videos, social and sensor data, all growing exponentially. A company’s unstructured data landscape is inherently chaotic, scattered across a multitude of systems, and not immediately predictable or controllable. As a result, many companies do not examine or extract value from any of the unstructured data they have access to or, worse, simply do not know that it exists or where exactly it resides.
Therein lies one of the biggest issues. The information assets that organizations collect, process and store during regular business activities, but generally fail to use for other purposes, are known as ‘dark data’. It is very easy to end up in breach of data protection regulations with unstructured data, because a fundamental requirement of those regulations is knowing what data you hold, knowing which of it is personally identifiable information (PII) relating to individuals, and protecting it. If you don’t know what you have or where it is, how can you protect it?
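As a rough illustration of what "knowing which data is PII" can mean in practice, the sketch below scans free-text documents for two obvious kinds of personal data. It is a minimal example under stated assumptions, not a compliance tool: the function names (find_pii, scan_documents), the sample documents and the two regular expressions are all hypothetical, and real PII discovery has to cover far more categories and file formats.

```python
import re

# Illustrative patterns only (assumptions); real PII detection needs far
# broader coverage, such as names, addresses and national ID numbers.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_pii(text):
    """Return a dict mapping each PII category to the matches found in text."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def scan_documents(documents):
    """Map document name -> PII hits, leaving out documents with no hits."""
    report = {}
    for name, text in documents.items():
        hits = find_pii(text)
        if hits:
            report[name] = hits
    return report


# Hypothetical sample content standing in for files on a shared drive.
sample = {
    "notes.txt": "Call Jane on +44 20 7946 0958 or mail jane.doe@example.com",
    "minutes.txt": "Quarterly roadmap review, no action items.",
}
report = scan_documents(sample)
print(report)
```

Even a crude report like this shows where personal data is sitting, which is the starting point for deciding what to protect, what to delete and what to leave alone.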
The crucial first step is ensuring you have 100 percent visibility into all the data sources within the business, and gaining an understanding of the data (what’s necessary and what’s not), so that your data itself does not become your Achilles heel. That’s all well and good, of course, but every business’s goal should be to derive competitive value from its data, not just to stay on the defensive.
Becoming a data-driven organization
The phrase “data-driven” has become commonplace in the tech industry, but it’s more than just a buzzword. It’s a common sentiment that the organizations with the most data will lead the pack, but the truth is more nuanced than that: what matters more is what organizations do with that data. It’s not the organizations with the most data that will cross the finish line first, but the ones that identify, analyze, and act on the right data.
Gaining basic control over unstructured data is important, but equally important is deciding what data to collect, analyze, and store. Collecting terabytes of random sensor data is useless if your company’s main way of benefiting from unstructured data will be through areas such as marketing or sales. It’s important to align your business’s goals with the data sources you have available before embarking on the next step: making sense of the data itself and preparing it for analysis.
Organizations need to be able to access, cleanse, normalize and blend disparate data sets from NoSQL data storage platforms and data lakes, the environments where unstructured data is often stored. This is not an easy task! Blue Hill Research reports that most analysts spend 40 to 60 percent of their time preparing data, and whatever time is left over on analysis. Whenever manual work is the bottleneck in an IT process, automating at least some portion of it will better serve not only the analysts but also the business as a whole.
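To make "cleanse, normalize and blend" a little more concrete, here is a minimal sketch of automating that preparation for two small record sets. Everything in it is an assumption made for illustration: the function names (normalize_record, blend), the field names, the sample records, and the choice of email address as the key that links records from different sources.

```python
def normalize_record(record):
    """Trim whitespace, lowercase field names and email, drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue  # cleanse: discard empty values
        cleaned[key.lower()] = value  # normalize: consistent field names
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def blend(*sources):
    """Merge records from several sources on the normalized email key."""
    merged = {}
    for source in sources:
        for raw in source:
            record = normalize_record(raw)
            key = record.get("email")
            if key is None:
                continue  # no join key; a real pipeline would log this
            merged.setdefault(key, {}).update(record)
    return merged


# Hypothetical inputs: rows exported from a CRM and documents from a
# NoSQL-style support system that describe the same customer.
crm_rows = [{"Email": " Ana@Example.com ", "Name": "Ana Ruiz", "Phone": ""}]
support_docs = [{"email": "ana@example.com", "last_ticket": "Login issue"}]

customers = blend(crm_rows, support_docs)
print(customers)
```

The point is not these particular rules but that they are written down once and applied consistently, instead of being repeated by hand by every analyst for every data set.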
Bridging data silos
One big limitation for businesses is that data analytics platforms are often siloed. When departments use different data sources, it can be challenging to bring the data together and review it holistically. Organizations must use tools that allow them to connect to multiple data sources, including NoSQL databases, data warehouses, applications, data files and, of course, the more traditional relational database. Unstructured data has a particularly early expiration date, so it’s crucial that the stores holding it are dynamic and constantly updated.
Another limitation occurs when the data can only be accessed and analyzed by technical members of the team. To prevent the bottlenecks created when only one team in the organization is able to reach the data, tools should be easy enough to use that the data is accessible to business users as well as technical users.
Organizations that are serious about mining value from unstructured data need to invest in the right tools, both to integrate it with more traditional sources and to make it as simple and easy as possible for that data to be used quickly, before it becomes stale. If analysts are forced to move slowly, they risk arriving too late to capitalize on potentially profitable transactions, investments, customer marketing opportunities or social media events. Empowering data analysts with a standardized tool to automate data preparation leaves them more time to analyze the data in a way that actively benefits the business.
John Pocknell, Senior Market Strategist, Quest