
Governing big data analytics for GDPR compliance

(Image credit: Flickr / janneke staaks)

The new EU law called the General Data Protection Regulation (GDPR) changes the way entire organisations interact with personal data, and thus big data analytics. But more than that, it offers an opportunity for enterprises to change the way they approach governance capabilities. 

While the growing inventiveness and sophistication of cyber security threats and attacks have made compliance and security top of mind for most organisations, the GDPR heightens this focus. Under the GDPR, violations of record-keeping, security and breach-notification obligations can result in fines of up to two percent of an entity's global annual turnover. This leaves companies no choice but to take notice and ensure compliance.
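To put the two percent figure in context: for this tier of violations, Article 83(4) of the regulation sets the ceiling at EUR 10 million or two percent of total worldwide annual turnover for the preceding financial year, whichever is higher. The short sketch below simply works through that arithmetic. The function name is illustrative, and the result is only the upper bound; the actual fine in any case is decided by the supervisory authority.

```python
def max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper bound for the lower fine tier: the greater of
    EUR 10 million or 2% of worldwide annual turnover."""
    fixed_cap_eur = 10_000_000
    percent_cap_eur = annual_turnover_eur * 2 / 100
    return max(fixed_cap_eur, percent_cap_eur)


# A company with EUR 2 billion in turnover: the 2% cap applies.
large = max_fine_eur(2_000_000_000)

# A company with EUR 100 million in turnover: the fixed cap applies.
small = max_fine_eur(100_000_000)
```

For smaller organisations the fixed amount is the binding limit, so the exposure is not proportionate to revenue in the way the percentage alone suggests.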

However, big data complicates the process of maintaining compliance with the GDPR, as well as other privacy rules. A tremendous volume of data is transformed into different intermediate forms and used in different ways. GDPR requirements apply to all personal data gathered throughout the big data analytics ecosystem, whether it is willingly provided by customers or gathered by automated systems. This includes personally identifiable information (PII) stored and used in data lakes and big data analytic platforms. Each of these aspects must be managed, tracked and reported. 

Managing all this data may seem like an insurmountable task. However, with a comprehensive governance plan, organisations can enable trust and confidence in their data and drive faster, more collaborative analytics processes.

Four aspects of GDPR governance  

A combination of people, process and tools is required to effectively govern private data and meet GDPR compliance. These are intertwined into four key aspects of GDPR governance. 

1. Discover

The discovery process is critical to identifying all characteristics of the private data that falls under the GDPR. This requires extensive exploration of data assets to establish whether consent has been given to use the data.

The data exploration process is far broader than simply identifying the private, personal data. It also includes identifying:

·         How it is used (or will be used) — Seeing how the data is transformed, what processes use the data or derivatives of the data and what actions are taken because of the data.

·         If consent is granted — Determining if the person gave consent to use the data and in what manner they allowed use of the data.

·         Where it came from — Tracing the data back to its sources and how it was moved to different systems and different forms within the organisation. 

Governance does not play a heavy role at this stage, but big data discovery does. Your analysts will require advanced, easy-to-use data discovery tools to assess the state of the data and determine where to apply permissions. Traceable lineage will also provide valuable information on where the data came from and how it was transformed.
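As a rough illustration of what discovery produces, the sketch below scans sample rows of a data set, flags columns that look like personal data, and records where the data came from and which purposes the person consented to. Everything here is an assumption made for illustration: the two patterns, the `FieldFinding` record and the `discover_pii` function are hypothetical, and real discovery tools rely on far richer detection than a pair of regular expressions.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; production tools use much broader detection.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


@dataclass
class FieldFinding:
    dataset: str
    field: str
    pii_type: str
    source: str              # where the data came from
    consent_purposes: tuple  # purposes the person agreed to, if any


def discover_pii(dataset, source, rows, consent_purposes=()):
    """Flag columns whose non-empty sample values all match a PII pattern."""
    findings = []
    if not rows:
        return findings
    for field in rows[0]:
        values = [str(r[field]) for r in rows if r.get(field) not in (None, "")]
        for pii_type, pattern in PII_PATTERNS.items():
            if values and all(pattern.match(v) for v in values):
                findings.append(
                    FieldFinding(dataset, field, pii_type, source,
                                 tuple(consent_purposes))
                )
                break
    return findings


sample = [
    {"id": 1, "email": "a@example.com", "phone": "+44 20 7946 0958", "plan": "gold"},
    {"id": 2, "email": "b@example.org", "phone": "020 7946 0123", "plan": "basic"},
]
findings = discover_pii("customers", "crm_export", sample, ["billing"])
```

Each finding ties together the three questions listed above: what the field is, where it came from, and what consent covers it. Fields with an empty `consent_purposes` are the ones that need attention first.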

2. Secure and Govern

Once private personal data is catalogued, categorised and split, it can then be secured and governed. This will require applying different policies to the data in various forms and stages.

Securing and governing the data requires a number of critical capabilities applied as needed, including:

·         Encryption and masking — Data needs to be fully encrypted at rest and on the wire (being transferred), and certain fields should be obfuscated so analysts do not see the data during analysis. 

·         Applying proper policies — A variety of different policies will be needed to determine what data can be seen by whom and how it’s used as it’s transformed through the analytic process. 

·         Flexible organisation methods — To potentially separate personal data, intermediate data sets and results for easy application of security and access control rules, flexible organisation is needed. 

·         Comprehensive cataloguing — The catalogue of information about private data in analytic platforms will need to be integrated in other IT control and metadata systems, where a more comprehensive view of all data is managed. 

·         Cover the entire information lifecycle — The data needs to be governed across its entire lifecycle, which not only includes where it came from but also how it was transformed and where it was used. 
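The masking and policy points above can be sketched in a few lines. The example below assumes a hypothetical `FIELD_POLICY` table that maps fields to the roles allowed to see them in the clear, and replaces restricted values with a keyed hash so analysts can still join and count on a field without seeing it. It does not cover encryption at rest or in transit, which is normally provided by the platform and the network layer rather than by analysis code.

```python
import hashlib
import hmac

# Hypothetical policy: which roles may see which fields unmasked.
# Fields not listed here are treated as non-sensitive.
FIELD_POLICY = {
    "email": {"privacy_officer"},
    "name": {"privacy_officer", "support"},
}


def pseudonymise(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash: stable for joins, unreadable without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def apply_policy(record: dict, role: str, key: bytes) -> dict:
    """Return a copy of the record with restricted fields masked for this role."""
    masked = {}
    for field, value in record.items():
        allowed_roles = FIELD_POLICY.get(field)
        if allowed_roles is None or role in allowed_roles:
            masked[field] = value
        else:
            masked[field] = pseudonymise(str(value), key)
    return masked


key = b"demo-key-keep-the-real-one-in-a-secrets-store"
record = {"name": "Ada", "email": "ada@example.com", "plan": "gold"}

analyst_view = apply_policy(record, "analyst", key)
officer_view = apply_policy(record, "privacy_officer", key)
```

Because the hash is keyed and deterministic, the same customer maps to the same token across data sets, which keeps intermediate results usable while the raw value stays hidden from roles that have no need for it.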

3. Monitor and Manage

Monitoring and managing analytic pipelines involving private customer data can quickly become complex. With customer data used in many different ways (up-sell, cross-sell, retention, engagement and more), the data and how it’s used can quickly get scattered.

Getting a complete view of how your customer data is being used requires:

·         Data and artefact tracking — Being able to track both analytic models and resulting data, not simply data. 

·         End-to-end lineage — The ability to track the entire chain of data, analytics and results, explain each operation and identify changes over time. 

·         Deep monitoring — Being able to monitor every aspect of the analytic process including access to data, execution, use of the data and level of security applied. 

·         Data management policies — Setting rules controlling how data is managed and retained in the analytic environment to reduce risk of illegal access. 

·         Continuous updates — The ability to update customer data and how it’s used in the analytic processes based on new personal preferences and data. 
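End-to-end lineage is easiest to picture as a graph in which every data set, model and result points back to its inputs. The minimal sketch below uses a hypothetical `LineageGraph` class, not the API of any particular product, to show how such a graph answers two practical questions: what a result was derived from, and what has to be revisited when a customer changes their preferences.

```python
from collections import defaultdict


class LineageGraph:
    """Tracks which artefacts (data sets, models, results) derive from which."""

    def __init__(self):
        self._parents = defaultdict(set)  # artefact -> its direct inputs
        self._operations = {}             # artefact -> operation that produced it

    def record(self, output, inputs, operation):
        self._parents[output].update(inputs)
        self._operations[output] = operation

    def upstream(self, artefact):
        """Everything the artefact was derived from, directly or indirectly."""
        seen, stack = set(), [artefact]
        while stack:
            for parent in self._parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def downstream(self, artefact):
        """Everything that was derived, directly or indirectly, from the artefact."""
        return {a for a in list(self._parents) if artefact in self.upstream(a)}


graph = LineageGraph()
graph.record("clean_customers", ["crm_export"], "deduplicate")
graph.record("churn_model", ["clean_customers", "web_events"], "train")
graph.record("churn_scores", ["churn_model", "clean_customers"], "score")

sources = graph.upstream("churn_scores")
affected = graph.downstream("crm_export")
```

Note that the graph tracks the model as well as the data, in line with the first point in the list. If consent is withdrawn for records in `crm_export`, `affected` lists every artefact that may need to be refreshed.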

4. Comply

Complying with the GDPR requires proof that proper controls and processes are in place to secure the private data and use it properly in accordance with the consent of individuals. With an ever-growing volume of data and an increasing number of analytics on this data, manual processes for reporting GDPR compliance can become a large resource drain on an already taxed IT staff.

Smart organisations are consolidating information about GDPR processes in central repositories, cataloguing solutions or IT control systems. This enables an enterprise-wide view of all personal data, how it’s used and how it’s managed. This simplifies and streamlines the auditing and reporting processes for GDPR.
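Once that information sits in one place, a basic audit summary becomes a simple query instead of a manual exercise. The sketch below assumes a hypothetical catalogue in which each entry records a data set's personal fields and whether consent and encryption are in place; the key names are invented for illustration and do not reflect any specific cataloguing product.

```python
def compliance_summary(catalogue):
    """Summarise a catalogue of data sets for audit reporting.

    Each entry is a dict with the hypothetical keys:
    dataset, pii_fields, consent_recorded, encrypted.
    """
    with_personal_data = [e for e in catalogue if e["pii_fields"]]
    gaps = [
        e["dataset"]
        for e in with_personal_data
        if not (e["consent_recorded"] and e["encrypted"])
    ]
    return {
        "datasets_total": len(catalogue),
        "datasets_with_personal_data": len(with_personal_data),
        "datasets_with_gaps": sorted(gaps),
    }


catalogue = [
    {"dataset": "customers", "pii_fields": ["email", "phone"],
     "consent_recorded": True, "encrypted": True},
    {"dataset": "web_events", "pii_fields": ["ip_address"],
     "consent_recorded": False, "encrypted": True},
    {"dataset": "product_list", "pii_fields": [],
     "consent_recorded": False, "encrypted": False},
]
summary = compliance_summary(catalogue)
```

Data sets without personal fields are counted but never reported as gaps, which keeps the report focused on what an auditor will actually ask about.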

For GDPR success, secure your data 

Data is key for all stakeholders, inside and outside the firewall of the business.

It represents the untapped resource that will lead to new business opportunities as well as competitive advantage. Simultaneously, data is the ticking time bomb that can explode if not adequately secured, protected, governed and monitored.

With a great deal of personal data being used in big data analytics, it’s essential you choose a platform that provides the deepest functionality to ensure you’re GDPR compliant, while still lowering the administrative burden needed to manage compliance processes. 

John Morrell is Sr. Director of Product Marketing at Datameer.