If you are responsible for analysing data, you may be familiar with the following scenario. The chief marketing officer has asked you to provide a comprehensive analysis of your company's best customers: where they are geographically located, whether they shop online or in stores, what products they have bought, and what they are saying about your company on social media. This information is critical to a major marketing campaign, and the CMO needs it within a few days. Given the variety of internal and external data sources involved, spanning both structured and unstructured data, and the poor quality of much of this information, meeting the deadline is nearly impossible.
This scenario demonstrates why businesses struggle to gain actionable insights from their data. In an age where making accurate business decisions quickly is critical to success, having to wait weeks, or even months, for the needed insight can sabotage even the best-laid business plans.
But why is data preparation so time consuming and complicated?
Traditional data analysis
Traditional approaches to data preparation require data to be integrated from multiple applications (systems), typically developed and supported by different vendors or hosted on separate computer hardware.
Moreover, the disparate systems containing the original data are frequently managed and operated by different departments. For example, a CRM system may combine data owned by sales, marketing and purchasing, so extracting the needed information requires coordination across all three, creating further delays. Other critical information may be held by partners, suppliers, social media applications and other external sources, in an array of formats and of questionable quality.
Once the data is extracted, it needs to be transformed into the proper format or structure for querying and analysis, then loaded into the target database where data analysts can work with it.
This process, known in the IT industry as ETL (Extract, Transform, Load), is at the heart of traditional approaches to data preparation and can take weeks or months to complete.
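To make the three ETL stages concrete, here is a minimal sketch in Python using only the standard library. The source data, table and field names are hypothetical, chosen to echo the CMO scenario above; a real pipeline would pull from live systems rather than an inline CSV string.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a CRM system (fields are illustrative only).
# Note the duplicate record, a typical source-data quality problem.
raw_csv = """customer_id,name,city,total_spend
101,Alice Smith,London,1520.50
102,Bob Jones,Manchester,930.00
101,Alice Smith,London,1520.50
"""

# Extract: read the rows out of the source format
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: cast types, normalise text, and drop duplicate records
seen, cleaned = set(), []
for r in rows:
    if r["customer_id"] in seen:
        continue  # skip duplicates carried over from the source system
    seen.add(r["customer_id"])
    cleaned.append((int(r["customer_id"]), r["name"].strip(),
                    r["city"].strip().title(), float(r["total_spend"])))

# Load: write the cleaned rows into the target analytics database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT, spend REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", cleaned)

print(db.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2 after de-duplication
```

Even in this toy form, the transform step is where most of the effort and risk sits; at enterprise scale it is exactly this stage that stretches into weeks.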
What is even more frustrating is that once the data is loaded, it may still contain duplications, errors and other inaccuracies that undermine any insight generated from it. This significantly impairs the ability of business leaders to make data-driven decisions.
Is your data complete?
While 'Big Data' offers tremendous potential to deliver unprecedented insights into markets, customers and operations, businesses need to ensure this data is trustworthy and complete. The problem is that the vast majority of decision makers know they can't trust the quality of the data, or the way it has been manipulated, before it's presented to them as 'intelligence'. As a result, accurate data-driven decision-making remains elusive for many organisations today.
This is not a new problem. Yet these data quality and data preparation issues often remain overlooked by organisations. Why?
The truth is that the process of determining which data is required for analysis, pulling it together and preparing it for analysis is extremely complex and time consuming for large organisations. For instance, responsibility for data access, ownership, storage, quality, compilation, preparation, and analysis is generally scattered across multiple organisational functions, making it difficult to locate and extract the right information.
Moreover, the sheer detail and complexity of large internal and external datasets used by an organisation, coupled with a widespread lack of data governance, quality, and management processes, make it effectively impossible to achieve any meaningful collaboration between these owners.
And finally, traditional technologies do not efficiently process large or increasingly complex datasets that contain both structured and unstructured data, making data preparation a painful and time-consuming process.
Empowering data analysis
But what if businesses didn’t have to go through the lengthy and complicated process of traditional data preparation? What if they could accelerate access to data insight to drive smarter decision-making and innovation?
This could be achieved by bringing together data preparation and data quality capabilities into a single platform, eliminating the need for ETL processes and automating all steps required to turn data into highly trustworthy intelligence.
Imagine being able to empower data analysts to search for data across the enterprise, and to join, normalise, improve, filter and transform data sets, all through point-and-click interfaces. This approach enables data analysts to rapidly find and combine the data they need and to use data visualisation tools such as Tableau and Qlik to present it to business executives. Business analysts can then focus on generating data insight without being dependent on IT, while their IT colleagues focus on managing the systems that store and process the data and on setting appropriate controls for data access.
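The join, filter and transform steps that such a platform would put behind a point-and-click interface can be sketched in plain Python. The datasets and field names below are hypothetical, standing in for a customer list and a transaction feed from separate systems.

```python
# Illustrative only: join a customer list with transaction data, then
# filter down to the high-value segment the CMO is asking about.
customers = [
    {"id": 1, "name": "alice smith", "city": "London"},
    {"id": 2, "name": "bob jones",   "city": "Manchester"},
]
transactions = [
    {"customer_id": 1, "amount": 1200.0},
    {"customer_id": 1, "amount": 400.0},
    {"customer_id": 2, "amount": 150.0},
]

# Join: aggregate total spend per customer across the transaction feed
spend = {}
for t in transactions:
    spend[t["customer_id"]] = spend.get(t["customer_id"], 0.0) + t["amount"]

# Transform and filter: normalise names, keep customers spending over 1000
best = [
    {"name": c["name"].title(), "city": c["city"], "spend": spend.get(c["id"], 0.0)}
    for c in customers
    if spend.get(c["id"], 0.0) > 1000
]

print(best)  # [{'name': 'Alice Smith', 'city': 'London', 'spend': 1600.0}]
```

The output of a step like this is what an analyst would hand straight to a visualisation tool such as Tableau or Qlik, rather than raising an IT ticket for a new ETL job.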
Building on the example I cited at the beginning of this article, if a retail organisation’s CMO can accurately identify where their most valuable customers live and shop and what products they buy, she can target them with compelling, personalised offers to drive increased sales. Furthermore, if the CMO can link sentiment on social media with the correct customers, she can harness unfiltered, real-time feedback on products, shopping experience and more to help shape her campaigns and business strategy.
As this example illustrates, there is significant business value that could be realised in accelerating access to accurate data insight and gaining a unified view of your customers. This can help drive increased customer loyalty, greater revenue, improved operational efficiency, and a competitive advantage in the market.
Whatever part you play in the data analysis process, if you’ve encountered some of the data preparation challenges described in this article, I strongly encourage you to re-group with your colleagues and re-evaluate your approach to data preparation and data quality.
Ed Wrazen, VP Product Management, Big Data at Trillium Software