
Privacy lost: Reconciling analytics and risk

(Image credit: Pitney Bowes Software)

Organisations have long struggled with the question of how much data to keep. For business intelligence and analytics, more data is better. For legal, compliance and privacy professionals, who deal with risk, the answer is much less clear. They face conflicting incentives: to delete as much data as possible to limit exposure to breaches, and to retain data for eDiscovery and regulatory purposes.

In years past, these conflicts often ended in stalemate. But with the General Data Protection Regulation (GDPR) on the horizon, the stakes are higher than ever. As the most comprehensive and consequential data privacy regulation in history, GDPR allows fines of up to 4 per cent of global annual turnover. With data volumes and the cost of being wrong only increasing, the essential question for analytics and risk stakeholders has become: “How much do we keep?”

This question doesn’t always have a simple answer. Rather than simply weighing the risks and rewards of maintaining or deleting certain types of data, it can be helpful to take a step back and examine the primary obstacles to effective analytics and compliance with GDPR and other regulations. Whereas GDPR mandates data privacy and protection, analytics is about reaping value from data. What we find underlying each of these drivers is a common theme which transcends privacy or insight: control. Control over file content, control over how data is stored, and control over how it’s processed and managed.

Data privacy

While increasing organisational control over data may seem counterproductive to privacy, the system that appears most intrusive is often the most private. If this sounds strange, consider the following.

Without first looking into the actual content of every enterprise document to inform access restrictions, remediation and retention policies, how does an organisation ensure that someone from Marketing isn’t mistakenly viewing confidential information created by an executive? Or that HR information isn’t sitting in an unrestricted location?

An approach to data privacy that provides insight into every enterprise document, even one that contains personal information, enables an organisation to limit access to that document to only those who need it.

In today’s complex enterprise environments, this is much easier said than done. However, many organisations have begun to implement a holistic approach to information governance that accounts for the many silos across an enterprise and the many functions they serve. Single-instance storage, a hallmark of this strategy, solves many of the problems associated with data silos, such as disjointed retention and inconsistent search results.

Under this unified approach, compliance, legal, privacy and records management professionals have the same view of data, because there is only one true copy. Data is classified the same way organisations typically classify strictly business records, so appropriate access privileges can be applied and only the right people see the data they need. By unifying all of an organisation’s data silos, this approach makes enterprise data searchable from a single point.
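To make single-instance storage more concrete, here is a minimal Python sketch; it is not how any particular product works. It assumes that identical content can be detected by hashing it with SHA-256 and that access is governed by a simple, hypothetical mapping from classification labels to roles. The class `SingleInstanceStore`, its methods and the role names are illustrative only.

```python
import hashlib
from dataclasses import dataclass, field


def _default_policy() -> dict[str, set[str]]:
    # Hypothetical policy: which roles may read each classification label.
    return {
        "public": {"marketing", "hr", "legal", "executive"},
        "hr": {"hr", "legal"},
        "confidential": {"executive", "legal"},
    }


@dataclass
class SingleInstanceStore:
    """Keeps one copy of each unique document body, keyed by its SHA-256 digest.

    Every silo location is only a pointer to that digest, so retention,
    classification and access checks all act on the same single copy.
    """
    blobs: dict[str, bytes] = field(default_factory=dict)                # digest -> content
    locations: dict[tuple[str, str], str] = field(default_factory=dict)  # (silo, path) -> digest
    labels: dict[str, str] = field(default_factory=dict)                 # digest -> classification
    policy: dict[str, set[str]] = field(default_factory=_default_policy)

    def ingest(self, silo: str, path: str, content: bytes, label: str) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # duplicates reuse the stored copy
        self.labels[digest] = label             # most recent classification wins
        self.locations[(silo, path)] = digest
        return digest

    def can_read(self, role: str, digest: str) -> bool:
        return role in self.policy.get(self.labels[digest], set())

    def search(self, role: str, term: bytes) -> list[str]:
        """Search all silos from one point; return only digests the role may read."""
        return [d for d, body in self.blobs.items()
                if term in body and self.can_read(role, d)]


if __name__ == "__main__":
    store = SingleInstanceStore()
    minutes = b"Q3 board minutes: salary review"
    a = store.ingest("email", "inbox/123.eml", minutes, "confidential")
    b = store.ingest("fileshare", "exec/minutes.txt", minutes, "confidential")
    print(a == b, len(store.blobs))                 # True 1
    print(store.search("marketing", b"salary"))     # []
    print(store.search("legal", b"salary") == [a])  # True
```

Because both silo locations point at the same digest, a classification or retention change applies to the one copy, and a search issued from a single point does not return the executive's minutes to Marketing. A real system would also have to deal with near-duplicates, attachments and per-location metadata, which this sketch ignores.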


Such a system also dramatically streamlines many processes necessary for GDPR compliance, such as responding to subject access requests. Under Article 15, a data subject may request access to their personal data, which the organisation is then required to provide. Once May 2018 arrives and requests for data start pouring in by the dozen, the costs associated with identifying and remediating personal data may rise by orders of magnitude.

Keep in mind that individuals won’t be asking for their data from just one repository; they will be asking for everything. Are organisations going to search each repository individually? Likely not. The ability to centralise access to various data stores will be essential to organisations handling large volumes of such requests.

Likewise, how does an organisation know that when it deletes data in one location, every copy is deleted everywhere? Further complicating matters, how will organisations coordinate disparate GDPR compliance functions with eDiscovery, records management and other processes?
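A centralised index is one way to approach the questions above. The Python sketch below is illustrative and rests on stated assumptions: each repository is reduced to an in-memory dict, a data subject is identified by an exact email address, and the `UnifiedIndex` class and its method names are hypothetical. It shows a subject access request answered with one query across all repositories, and an erasure request (the right to erasure is set out in GDPR Article 17) that removes every indexed copy rather than the copy in a single repository.

```python
from collections import defaultdict


class UnifiedIndex:
    """Hypothetical central index over several repositories.

    Each repository is modelled as a dict of doc_id -> text. The index maps a
    data subject's identifier (here, an email address) to every
    (repository, doc_id) pair that mentions it.
    """

    def __init__(self, repositories: dict[str, dict[str, str]]) -> None:
        self.repositories = repositories
        self.by_subject: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def index_subject(self, subject_id: str) -> None:
        # Assumption: exact substring matching on one identifier. Real systems
        # need entity resolution across names, aliases, account numbers, etc.
        for repo, docs in self.repositories.items():
            for doc_id, text in docs.items():
                if subject_id in text:
                    self.by_subject[subject_id].add((repo, doc_id))

    def subject_access_request(self, subject_id: str) -> list[tuple[str, str, str]]:
        """One query returns matching documents from every repository."""
        refs = self.by_subject.get(subject_id, set())
        return sorted((repo, doc_id, self.repositories[repo][doc_id])
                      for repo, doc_id in refs)

    def erase(self, subject_id: str) -> int:
        """Delete every indexed copy of the subject's documents; return the count."""
        refs = self.by_subject.pop(subject_id, set())
        removed = 0
        for repo, doc_id in refs:
            if self.repositories[repo].pop(doc_id, None) is not None:
                removed += 1
        # Drop references other subjects held to the now-deleted documents.
        for other_refs in self.by_subject.values():
            other_refs.difference_update(refs)
        return removed


if __name__ == "__main__":
    repos = {
        "crm": {"c1": "Contact: ana@example.com"},
        "email": {"m1": "From: ana@example.com", "m2": "From: bob@example.com"},
    }
    index = UnifiedIndex(repos)
    for subject in ("ana@example.com", "bob@example.com"):
        index.index_subject(subject)
    print(index.subject_access_request("ana@example.com"))
    print(index.erase("ana@example.com"))  # 2
    print(repos)                           # only m2 remains
```

Deleting whole documents is a deliberate simplification. In practice, erasure may call for redaction instead, and a legal hold for eDiscovery can require that the same data be preserved, which is exactly the kind of coordination between compliance functions that the paragraph above raises.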

The list of issues raised by GDPR goes on and on, and each obstacle points to the same architectural flaw in many organisations’ approaches to data management: silos. Data lies in silos across today’s enterprises, and most organisations have little meaningful insight into them, let alone an effective process for searching across them. Managing across these disparate silos is a significant challenge, and GDPR is likely to expose the weaknesses of current approaches.

Analytics

Analytics initiatives have long faced a data control issue as well. Traditionally, when data is needed for analytics, it’s taken from its original location and exported to yet another silo. Organisations typically can’t take the whole “beach” of data, so they take a sample. I refer to this as the sandbox approach, and it’s full of issues.

Because only a portion of the data is actually analysed, the accuracy of the findings cannot be guaranteed. Additionally, data scientists must often spend large amounts of time cleansing the data, which is costly in itself. By the time data is cleansed, exported, and analysed, the results may already be stale. Even the most extensive analysis is worthless if it arrives too late.

Furthermore, organisations must consider the nature of the data they’re exporting, and whether it is suitable to be exported and analysed at all. A worst-case scenario in analytics initiatives is to mistakenly export confidential data that should be restricted from processing, and then have to deal with the resulting security and regulatory issues.

In contrast, raking the whole beach, not just the sandbox, can save an organisation massive costs and yield insights that would otherwise have been missed, all without the time it typically takes to export data. Unifying enterprise data under a single platform with built-in analytics capabilities solves many of the associated issues: there’s no guessing, no sampling error, no delay or extra effort to collect, move and scrub the data, and the data is always up to date and complete.

Ultimately, organisations want ground-breaking insights into their enterprise while staying compliant with various data regulations and maintaining privacy of sensitive information. These things don’t have to be contradictory. With a unified approach to governance, they go hand in hand.

Kon Leong, CEO and Co-Founder, ZL Technologies