It is no secret that as the world of data integration rapidly changes, the task of integrating data has become increasingly complex. Businesses today depend heavily on data analysis and real-time information to make decisions, raising the stakes for data integration. Unstructured data, big data, departmental data, end-user data and external data all challenge old models for data integration.
In this new reality, the simple process of extract, transform, and load (ETL) integration for structured enterprise data is no longer enough. For those of us in the business of data management, it should come as no surprise that in a 2017 TDWI survey, 37 per cent of respondents cited difficulty in accessing and integrating all relevant data as a challenge to becoming a “data-driven” company (the gold standard).
What are the challenges?
As organisations continue to leverage big data and the intricacies of data grow, the ability to integrate data becomes even more daunting. Data no longer resides inside of an organisation - it is living in the cloud and across cloud platforms. New data types and compute tiers are adding colour to the diverse data fabric many organisations have in place today.
At present, data integration tools are pigeon-holed into the functions of moving and transporting data from one place to another. But moving and transporting data is the easy part - integrating data is the hard part. Most companies expect the tools to magically integrate data and get upset when the tools don’t meet those expectations.
So, what are the primary roadblocks to successful data integration?
- Data exists across all parts of an organisation and no longer resides solely inside of an organisation. It lives in the cloud and across cloud platforms, in different systems, with different schemas and with different data dependencies.
- The environment is diverse and complicated. Data enters the organisation in multiple places and is duplicated and copied across the ecosystem. Each system has a different owner, thus the data is created and managed differently. Information is accessed by many different users, all making changes to suit their needs.
Until business leaders come to regard data as a corporate asset, it will be viewed and used as a by-product of the business, and data integration will remain a sizable hurdle for organisations. Additionally, meeting modern data integration challenges calls for a solid data integration strategy and architecture - and accomplishing that is not a simple task.
How can AI help?
Is there any promising news here? Yes, I believe there is.
There is an abundance of optimism among information technology leaders and CIOs that the emergence of Artificial Intelligence (AI) and Machine Learning (ML) will drastically improve both the processes and outcomes of data integration. A recent Enterprise Management Associates (EMA) report stated that “AI-enablement should be a priority for analytics leaders at all levels as it provides organisations with the ability to overcome the constraints of legacy or less-automated data processing.”
Simply put, data integration tools will have to evolve to embrace AI and ML processes that assist with the integration of data based on past human decisions. The tools will need to capture how a human (or a human-made process) integrated data in the past, learn from these decisions and apply what they have learned to data across the organisation.
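To make the idea of reusing past human decisions concrete, here is a minimal sketch in Python. It is not how any particular integration product works, and it is not a trained ML model: it simply looks up the most similar column name a person has already mapped, using string similarity from the standard library. The column names, the `PAST_DECISIONS` table, the `suggest_mapping` function and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical record of earlier human decisions:
# source column name -> canonical field it was mapped to.
PAST_DECISIONS = {
    "cust_id": "customer_id",
    "CustomerID": "customer_id",
    "e-mail": "email",
    "email_addr": "email",
}


def normalise(name):
    """Lowercase a column name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mapping(column, past=PAST_DECISIONS, threshold=0.8):
    """Suggest a canonical field for a new column, or None if nothing is close.

    The suggestion is the target of the most similar previously mapped name.
    The threshold is an assumed cut-off, not a recommended value.
    """
    best_target, best_score = None, 0.0
    for source, target in past.items():
        score = SequenceMatcher(None, normalise(column), normalise(source)).ratio()
        if score > best_score:
            best_target, best_score = target, score
    return best_target if best_score >= threshold else None
```

A column the tool has effectively seen before, such as `Customer_Id`, is mapped automatically, while an unfamiliar one such as `shipping_weight` returns `None` and is left for a person to decide. That new decision can then be added to the table, which is the learning loop in its simplest form.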
The future promise
AI and ML (as well as IoT and graph technologies) will radically change the integration of data across the data landscape. The removal of highly manual efforts will transition data integration from being a one-way process into one that is multi-lateral. This means that data eventually will be able to integrate itself based on what it has learned and share its learnings with machines and man.
Likewise, we need to think beyond traditional data integration and drive smarter approaches by using statistical AI capabilities. For example, using frequency analysis to help identify outliers and missing values that can skew other measures (e.g., mean, median), applying summary statistics to help analysts understand the distribution and variance (because data isn’t always normally distributed, as many statistical methods assume), and employing correlation to show which variables or combinations of variables will be most useful based on the strength of their predictive capability.
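The three techniques named above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not a production profiling tool: the function names and the sample `order_values` are hypothetical, missing values are assumed to be represented as `None`, and the interquartile-range rule used to flag outliers is one common rule of thumb rather than the only option.

```python
import math
from collections import Counter
from statistics import mean, median, quantiles


def frequency_profile(values):
    """Count occurrences of each value; missing entries are counted under None."""
    return Counter(values)


def summarise(values):
    """Summary statistics computed over the non-missing values only."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sample: one missing entry and one suspiciously large value.
order_values = [10, 12, 11, 13, 12, None, 11, 500]
```

On this sample, the single value of 500 pulls the mean up to roughly 81 while the median stays at 12, which is the kind of distortion the profiling step is meant to surface before the data is integrated downstream.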
So, what’s AI got to do with data integration? A whole bunch actually.
However, a word of caution: AI and ML technologies will realise their promised potential for the data integration process only if they are built upon the foundation of a comprehensive data strategy program for collecting, connecting and managing data. And that strategy must be supported by data governance, data literacy and data management programs to effectively move data from a by-product to a business asset. Without that foundation, all your data integration processes could quickly become chaotic or unbalanced or, at worst, fail.
Kim Kaluba, Senior Manager for Data Management Solutions, SAS