
Data lakes without governance are swamps – Four steps to make them crystal clear

(Image credit: IT Pro Portal)

Imagine you work in IT for a multi-national organisation that received poor customer satisfaction ratings during its annual survey.

Now, the CEO’s top priority (and, therefore, your top priority) is to fix the customer experience.

But how?

With an abundance of customer data generated by dozens of business systems and applications – including procurement, call centre interactions, website visits, mobile app usage and, increasingly, IoT sensors and devices – it should be possible to determine what customers want and to deliver a great customer experience.

The solution seems straightforward – pour all the data into a data lake, then turn your data scientists and business analysts loose to work their magic and discover fresh, actionable insights that drive a more customised, personalised customer experience.

But the process is rarely smooth sailing.

The core problem is that data, in its many forms and sheer volume, is not immediately ready for use. The complexity of preparing data for modern analytics compounds the problem, and the challenge grows as data keeps arriving at massive scale and in real time.

As a result, data scientists spend too much time cleansing and preparing data after the fact, rather than on value-added analysis and strategic decision-making. This situation is not sustainable.

Many organisations now realise that in their hurry and zeal to build a data lake, they actually created a “data swamp” worthy of a Swamp Thing comic book nightmare.


This happened because they overlooked a critical success factor – data governance, which brings together the people, policies, processes, and technology needed to transform a data swamp into a crystal-clear data lake that becomes a trusted and valuable business asset.

Data governance: What is it?

Data governance is the management of practices and processes that ensure the quality, availability, usability, integrity, and security of enterprise data assets – both on-premises and in the cloud.

It is supported by a rich technology stack that includes components for business glossary, policy definition and collaboration, data lineage, data integration, data quality, metadata management, security and privacy, data masking, and data cataloguing, which classifies information, identifies relationships, and enables end-user searching. That sounds like a lot, but artificial intelligence (AI) and the right technology solution can automate much of this work, giving you data that is trusted, contextual, and accessible.

While the right technology foundation is critical, data governance also requires collaboration across a number of stakeholders, including chief data officers, enterprise architects, business unit leads, and data stewards responsible for overseeing data quality, compliance with data policies and procedures, and day-to-day governance activities.

Line-of-business owners and IT must work in close alignment and understand each other’s roles and objectives, or they run the risk that data governance policies and processes won’t be operationalised. The most effective programmes make data governance a part of the business culture. It’s not a standalone discipline – it’s just how the business is run.

Four steps to data governance for big data

By following these four steps for data governance, organisations can finally realise the tremendous potential their data lakes hold to drive activities across the entire enterprise – whether that’s supporting better customer experience, fraud detection, risk management, operational efficiency, or other business goals.

  • Bring order to chaos. Embrace a data governance strategy to ensure that information about data and systems is well organised, classified, and catalogued, as well as described in a common business language. That helps end-users more readily understand data’s meaning, context and relevance, while eliminating the chaos of raw structured and unstructured information in a data lake.
  • Make data analytics-ready. Make data quality a key part of your data governance programme so that information is consistent, accurate, trustworthy, and suited for analytics. Data governance efforts should also help ensure that data is accessible only to authorised individuals, and that it’s secure and compliant with regulatory requirements, especially around data privacy.
  • Enable self-service. Equip business users and data scientists with self-service tools and semantic search capabilities that let them “shop” for data (as they might do when searching for products on a retail website) and apply faceted search to narrow down results. Owners of data within organisations will only make data accessible across the enterprise if they are sure it is secure and complies with corporate policies and industry regulations. Only then can data be “democratised.” By democratising data lakes and making information usable by the business, organisations will achieve payback many times over.
  • Be a catalyst for data-driven business value. Look to capitalise on data governance as a catalyst for accelerating digital transformation. Trusted, actionable information in a data lake can help you find efficiencies, solve complexities, fuel innovation, create competitive advantage, and better comply with data privacy requirements such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act.
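
To make the first three steps more concrete, here is a minimal Python sketch of a governed data catalogue: each dataset is described in business terms, carries a quality score, and is only listed for roles authorised to see it. It is an illustration under stated assumptions, not any vendor's implementation. The `CatalogEntry` fields, the `search` function, the role names, and the sample datasets are all hypothetical, and the quality scores are assumed to come from a separate data quality tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One dataset described in business language (all fields are hypothetical)."""
    name: str
    description: str
    domain: str                   # facet, e.g. "customer" or "procurement"
    source_system: str            # facet, e.g. "call_centre"
    contains_personal_data: bool  # facet relevant to privacy rules
    quality_score: float          # 0.0 to 1.0, assumed to come from a quality tool
    allowed_roles: frozenset = field(default_factory=frozenset)


def search(catalog, user_role, min_quality=0.0, **facets):
    """Return entries the role may see that match every requested facet."""
    results = []
    for entry in catalog:
        if user_role not in entry.allowed_roles:
            continue  # governance first: never list data the user cannot access
        if entry.quality_score < min_quality:
            continue  # keep only data judged fit for analytics
        if all(getattr(entry, key) == value for key, value in facets.items()):
            results.append(entry)
    return results


# Illustrative sample data only; names, scores, and roles are invented.
CATALOG = [
    CatalogEntry("crm_contacts", "Customer contact records", "customer", "crm",
                 True, 0.92, frozenset({"data_steward", "analyst"})),
    CatalogEntry("call_transcripts", "Call centre conversations", "customer",
                 "call_centre", True, 0.61, frozenset({"data_steward"})),
    CatalogEntry("web_visits", "Anonymised website sessions", "customer",
                 "website", False, 0.85,
                 frozenset({"data_steward", "analyst", "data_scientist"})),
    CatalogEntry("supplier_invoices", "Invoices from suppliers", "procurement",
                 "erp", False, 0.78, frozenset({"data_steward", "analyst"})),
]

if __name__ == "__main__":
    # An analyst "shops" for customer data that meets a quality threshold.
    hits = search(CATALOG, "analyst", min_quality=0.8, domain="customer")
    print([entry.name for entry in hits])
```

In this sketch the authorisation check runs before any facet is applied, so a user never learns that a restricted dataset exists. That ordering reflects the point made above that data owners will only share data across the enterprise once they are sure it is secure and compliant.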

Crystal-clear data lakes need data governance

The role of a modern data lake is to be a single, trusted repository of historical and real-time information that drives smarter, faster decision-making across the enterprise. Data governance is critical for ensuring data is consistent, accurate, contextualised, accessible, and protected.

With a crystal-clear data lake, organisations are able to capitalise on their vast data to deliver innovative products and services, better serve customers, and create unprecedented business value in the digital era.

Jitesh Ghai, SVP & GM, Data Quality, Security and Governance, Informatica

Jitesh Ghai heads up Informatica’s Data Quality, Data Security and Data Governance offerings. Previously, he led its Strategy and Operations group, responsible for organic and inorganic strategy.