
Make your data lake work for you

(Image credit: Shutterstock/Bruce Rolff)

First there was data. Then there was big data, and data lakes. A data lake is high-volume storage for structured and unstructured data of any kind. Holding raw data in its native format for later use has grown in popularity and features in many big data initiatives. Data flows from various streams into the data lake, where it is available to data scientists to analyse and interpret patterns for predictive analytics and machine learning.

You may have heard that data lakes are essential to capitalising on big data opportunities, but most organisations have little understanding of how to get real value from them. The goal of collecting such large amounts of data is to help the business make timely, data-driven decisions, uncover new opportunities and mitigate financial and compliance risks.

In a competitive world where every bit of data matters, the data lake is an appealing concept. As digital transformation fuels the adoption of social, mobile and omnichannel interactions, and the Internet of Things (IoT) boom looms, many enterprises have jumped on the bandwagon, created Hadoop-based repositories and started filling them with all kinds of data from various systems.

They may even have poured millions of dollars into their data lakes and taken months getting teams on board, hiring data scientists, Hadoop experts and others, only to grow weary of the new technologies added daily, each adding to the cost and time needed to retrain staff. The business value of these data lakes has yet to be realised.

Making your data lake work for you relies on more than just redirecting data flow to the Hadoop cluster. Without a strong understanding of data quality and traceability, and without proper data governance, using a data lake can be a risky endeavour. Without proper metadata management and data quality checks, the data in a lake becomes unusable over time.
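The data quality checks mentioned above can be as simple as a validation gate applied to each record before it lands in the lake. The sketch below is purely illustrative; the field names and rules are assumptions, not part of any particular product.

```python
# A minimal data-quality gate: reject or flag records that are missing
# required fields or fail simple value checks before they enter the lake.

def validate(record, required_fields, checks):
    """Return a list of quality issues found in a single record."""
    issues = []
    for field in required_fields:
        if not record.get(field):
            issues.append(f"missing required field: {field}")
    for field, check in checks.items():
        value = record.get(field)
        if value is not None and not check(value):
            issues.append(f"failed check on field: {field}")
    return issues

record = {"customer_id": "C123", "email": "a@example.com", "age": -4}
issues = validate(
    record,
    required_fields=["customer_id", "email"],
    checks={"age": lambda v: 0 <= v <= 120},  # hypothetical range rule
)
print(issues)  # the negative age fails its range check
```

Running checks like these at ingestion time, and recording the results as metadata, is what keeps the lake from silently filling with unusable data.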

The following are essential for gaining business value from your big data initiatives.

Reliable and quality data

Hadoop-based big data platforms and stream processing solutions are being used to build and maintain data lakes. However, one can’t tell whether the data is useful until it’s analysed.

Even if enterprises use sophisticated tools to analyse and visualise big data, there’s no guarantee that the answers are reliable. This is mostly due to incorrect, incomplete and inconsistent data and the lack of correlation back to accurate master profiles and operations.

Getting value from your data lake comes down to the balance among governance, security and reliability of the data. When creating the data lake, organisations need to put data management principles and processes in place to improve data reliability. Sufficient metadata and quality assurance of data also need to be included to help provide context for users. In many industries, understanding and maintaining data lineage to tie each data attribute back to the originating data source is important and also required for compliance purposes.
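One common way to maintain the data lineage described above is to wrap each ingested payload with metadata naming its source system, load time and pipeline version, so every attribute can be traced back to where it came from. This is a hedged sketch; the structure and field names are assumptions for illustration.

```python
# Attach lineage metadata to each record at ingestion time, so any
# attribute in the lake can be tied back to its originating source.

from datetime import datetime, timezone

def with_lineage(payload, source_system, pipeline_version):
    return {
        "payload": payload,
        "lineage": {
            "source_system": source_system,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "pipeline_version": pipeline_version,
        },
    }

row = with_lineage({"customer_id": "C123"}, "crm", "1.4.2")
print(row["lineage"]["source_system"])  # "crm"
```

In regulated industries this kind of per-record provenance is what makes compliance audits of the lake tractable.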

Enabling all users to derive value from data lakes requires building a comprehensive data strategy. A Modern Data Management platform addresses many of these challenges by giving organisations the ability to quickly match, merge, cleanse and relate data entities to create a reliable data foundation. It helps you bring together master data and big data across all sources and formats, creating golden records of any data entity.
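The match-merge-cleanse step can be pictured as follows: records from different sources that share a matching key (here a normalised email address) are grouped and merged into one golden record, with the most recently updated non-empty value winning for each attribute. This toy sketch only illustrates the idea; real platforms use far more sophisticated matching.

```python
# Toy match-and-merge: group records by a normalised key, then merge
# each group into a single "golden record", letting newer non-empty
# values overwrite older ones.

def golden_record(records, key=lambda r: r["email"].strip().lower()):
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    merged = []
    for group in groups.values():
        group.sort(key=lambda r: r["updated"])  # oldest first
        result = {}
        for r in group:  # later records overwrite earlier values
            result.update({k: v for k, v in r.items() if v})
        merged.append(result)
    return merged

crm = {"email": "A@x.com", "name": "Ann Lee", "phone": "", "updated": 1}
web = {"email": "a@x.com", "name": "Ann Lee", "phone": "555-1234", "updated": 2}
print(golden_record([crm, web]))  # one record, phone filled in
```

The result is one profile per real-world entity instead of fragmented duplicates scattered across source systems.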

Additionally, you will be able to blend your master data and big data across internal and external systems, third-party data subscriptions and social media sources, as well as provision the reliable, usable data back to your business applications and big data analytics platforms in a closed loop, putting your big data to sound business use.

Data-driven applications for decision management

Gartner’s report “The Data Lake Fallacy: All Water and Little Substance” states that “the fundamental issue with the data lake is that it makes certain assumptions about the users of information.” While these assumptions may hold for data scientists working with the data, the majority of business users lack the sophistication or support to derive any demonstrable business value.

For maximum value, the data lake needs to feed relevant insights and recommended actions specific to the frontline business user making decisions. Offering big data insights personalised to the frontline business user’s daily operations helps them take the right actions, based on accurate information, significantly improving productivity and outcomes. 

One way of delivering this is through business-facing, data-driven applications that offer relevant big data insights for decision management. Data-driven applications create comprehensive pictures of business entities, such as customers, products, places, channels and activities, by combining cleansed data from all sources and revealing relationships across these entities.

Understanding the complex relationships across all your data entities is important. Utilising graph technologies helps identify and visually reveal relationships among the people, products, places and activities your business cares about. You can now use data-driven applications to focus on the most valuable products, biggest opportunities and most influential customers.
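The graph idea above can be sketched with a plain adjacency list: entities are nodes, relationships are labelled edges, and shared relationships between two entities fall out of a simple set intersection. The entity names and relationship labels here are made up for illustration; a real deployment would use a graph database.

```python
# A small relationship graph over people, products and places,
# queried for what two entities have in common.

from collections import defaultdict

edges = [
    ("Ann", "bought", "Laptop"),
    ("Ben", "bought", "Laptop"),
    ("Ann", "lives_in", "Dublin"),
    ("Ben", "lives_in", "Dublin"),
]

graph = defaultdict(list)
for subject, relation, obj in edges:
    graph[subject].append((relation, obj))

def shares(a, b):
    """Relationships two entities have in common, e.g. same city or product."""
    return sorted(set(graph[a]) & set(graph[b]))

print(shares("Ann", "Ben"))  # [('bought', 'Laptop'), ('lives_in', 'Dublin')]
```

Queries like this are what let an application surface households, shared purchases or key influencers directly to a business user.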

Data-driven applications allow business users to work with role-specific applications that bring data together and include insights relevant to the task at hand, helping users make better decisions. These applications present users with important relationships, such as households with all family members or key influencers in an organisation, to reveal the various affiliations among people, products and places. With consumer-friendly interfaces, they also guide users with intelligent recommendations for next best actions and suggestions to improve the quality of the data. Providing reliable information with insights and recommendations delivers the business value that organisations are looking for. Beyond that, a closed loop between analytics, big data and operational applications ensures continuous feedback to improve operations as well as data quality.

When properly managed and actively used, your data lake becomes much more than a large collection of data. Through Modern Data Management platforms and governance, you can build the master golden record repository where your master profiles, big data, business applications and analytics are no longer disconnected. Your operational applications and analytics get access to reliable information, and closed-loop feedback ensures that your data is always clean, current and complete.

Ajay Khanna, Vice President, Marketing, Reltio

Ajay Khanna
Ajay Khanna is the Vice President, Marketing at Reltio. Prior to joining Reltio he held senior positions at Veeva Systems, Oracle and other software companies including KANA, Progress and Amdocs. He holds an MBA in marketing and finance from Santa Clara University.