The data processing landscape has changed hugely since 1995, and in May 2018 the EU is replacing the Directive with a new regulation, the General Data Protection Regulation (GDPR). UK-based organisations have had to take account of their responsibilities under the DPA for many years now, and many have mature and well-considered data management policies in place that already address elements of the GDPR. Nonetheless, with the threat of significant penalties for data breaches under the GDPR, it would be prudent to re-examine procedures and to consider how these can be enhanced to ensure compliance when the GDPR becomes enforceable in May 2018.
The truth is that there isn’t one single solution that will manage all of the demands the GDPR is placing on businesses. The tools that are available fall into two categories: vendor specific or vendor neutral.
Each of the many different DBMSes in use provides data encryption features to ensure data is encrypted at rest as well as in flight across the network. This is a very important defence for any organisation to deploy across its live data platforms, but it is the minimum protection expected under the GDPR.
DB2 and SQL Server have in-built data masking capabilities that can be used to mask the view of specific data for any individual user. Oracle has its Oracle Virtual Private Database technology to control access to defined pieces of data, which can be particularly effective if used with the Oracle Label Security option. In addition to this, Oracle Advanced Security enables on-the-fly redaction of data before it is presented to users. These are effective data centre tools as you prepare for the GDPR.
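To make the idea of masking at read time concrete, here is a minimal sketch in Python. It is not how any of the products above are implemented or configured; the role names, column names and the `UNMASKED_ROLES` policy are hypothetical, chosen only to illustrate that the stored value stays intact while the view of it depends on who is asking.

```python
# Hypothetical policy: the roles allowed to see each sensitive column unmasked.
# Columns that do not appear here are treated as non-sensitive.
UNMASKED_ROLES = {
    "email": {"dpo", "account_manager"},
    "card_number": {"dpo"},
}


def mask_value(column: str, value: str) -> str:
    """Return a partially hidden form of a sensitive value."""
    if column == "email":
        local, _, domain = value.partition("@")
        return local[:1] + "***@" + domain
    if column == "card_number":
        # Keep only the last four characters visible.
        return "*" * (len(value) - 4) + value[-4:]
    return "****"


def present_row(row: dict, role: str) -> dict:
    """Build the view of a row for a given role; the input row is not modified."""
    result = {}
    for column, value in row.items():
        allowed = UNMASKED_ROLES.get(column)
        if allowed is None or role in allowed:
            result[column] = value  # not sensitive, or the role is authorised
        else:
            result[column] = mask_value(column, value)
    return result
```

A support user calling `present_row(row, "support")` would see masked email and card values, while a role listed in the policy would see the same row unchanged.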
When it comes to creating a masked clone of a database, Oracle is the only DBMS vendor with its own utility to do this. Oracle Data Pump Export uses the REMAP_DATA option to mask specified data as it is being copied out of the source database. For other platforms, partner tools fill the gap: IBM partner UBS Hainer supplies an export utility, BCV5, for z/OS-based DB2 databases that will mask and redact specified data as it is being copied from the source. Similarly, SAP partner EPI-USE supplies a tool that will mask data in SAP, including cloned data.
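The general pattern behind a masked export, in which a function is applied to chosen columns while rows are copied out of the source, can be sketched as follows. This is an illustration using Python's built-in `sqlite3` module, not Oracle Data Pump, BCV5 or the EPI-USE tool; the `customers` table, the `mask_surname` function and the `export_masked` helper are all hypothetical.

```python
import sqlite3


def mask_surname(value):
    """Hypothetical remap function: keep the initial, hide the rest."""
    if value is None:
        return None
    return value[:1] + "*" * (len(value) - 1)


def export_masked(source, target, table, remap):
    """Copy `table` from source to target, masking selected columns in transit.

    `remap` maps a column name to a masking function; all other columns are
    copied unchanged. The table is assumed to exist already in the target,
    and `table` is assumed to be a trusted identifier, not user input.
    """
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    for row in cursor:
        masked = [remap.get(col, lambda v: v)(val) for col, val in zip(columns, row)]
        target.execute(insert, masked)
    target.commit()


# Usage: two in-memory databases stand in for production and the test clone.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE customers (id INTEGER, surname TEXT)")
source.execute("INSERT INTO customers VALUES (1, 'Hopper')")
export_masked(source, target, "customers", {"surname": mask_surname})
```

Because the masking happens while the data is in transit, the unmasked surname never lands in the target database.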
Most organisations typically use more than one type of data store. Even if only one DBMS type is deployed, it is likely there is also data stored in other file types.
Here are some of the more prominent data masking tools available that can be deployed across heterogeneous data sources:
Oracle Enterprise Manager, when combined with Oracle Gateway technology to access DB2, SQL Server, Teradata, SAP ASE and Informix databases, enables masked clones of a database to be created. One drawback with this approach is that data has to first be staged into an Oracle database.
IBM InfoSphere Guardium provides a mechanism that masks sensitive data in real time from multiple heterogeneous data sources, including DB2, Oracle, ASE, SQL Server, PostgreSQL and others. Essentially, calls to the data source are routed through a Guardium appliance where any sensitive data is transformed before it is presented to the user. From the same stable, InfoSphere Optim enables data masking for test data creation within an overall solution for test data management.
There are other niche players in this field, such as Net2000 whose Data Masker technology is currently available for Oracle, SQL Server and DB2, with the promise of future support for file systems and for SAP ASE. All of these can be used to establish full-sized and masked copies of production data for testing purposes, although doing so may carry higher costs in terms of time and storage.
Delphix offers a heterogeneous data masking solution that is fully integrated with a data virtualisation technology. This tool supports consistent data masking across multiple DBMSes (Oracle, SAP ASE, DB2, SQL Server, MySQL and PostgreSQL), and also across non-database file systems. Because it is virtualising data, the time and storage space required to create a fully masked copy of production data is greatly reduced. The same solution can also be used to produce masked production copies of data in support of “break-fix” activity, or pseudonymised analytical activity.
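Consistent masking across data sources means that the same real value is replaced by the same masked value wherever it appears, so that joins between masked systems still line up. One common way to achieve this, shown here purely as an illustrative sketch and not as the method used by any product named above, is a keyed hash. The `pseudonymise` function, the key and the normalisation rule are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes, length: int = 16) -> str:
    """Map a value to a stable token: the same value and key give the same token.

    The value is trimmed and lower-cased first (an assumed normalisation rule)
    so that trivially different spellings in different systems still match.
    """
    normalised = value.strip().lower()
    digest = hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Usage: the same customer email held in two different systems.
key = b"example-key-kept-outside-test-environments"  # hypothetical secret
crm_token = pseudonymise("Ann@Example.com ", key)
billing_token = pseudonymise("ann@example.com", key)
```

Here `crm_token` and `billing_token` are identical, so records can still be matched after masking. Anyone holding the key can recompute the token for a known value, so the key itself needs to be protected and kept out of the non-production environments.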
Which is best, vendor specific or vendor neutral?
The short answer to this question is “both”. Although it may sound as if I am sitting on the fence, each vendor caters for different scenarios, and compliance with the GDPR will demand an overall solution that brings together different technical and procedural responses.
DBMS-specific tools are very well suited to protecting data at its source in production, particularly through data encryption. Encryption itself, of course, is something that can also be implemented at the storage layer, for example with EMC VNX2 or Oracle ZFS. Data encryption at rest and in flight is the minimum protection that should be applied to secure personal data. However, while encryption is a valuable technique, it does not address all the obligations arising from the GDPR. Worse, it could be an unnecessary overhead in some non-production environments, which should only contain pseudonymous or otherwise anonymous data.
DBMS-specific tools are also useful for masking or otherwise shielding data so that only authorised individuals can access it, and only in a predetermined way. This can satisfy the GDPR requirement to protect data by design and by default, especially in live applications. It can also be an effective way to counter the risk of true data values being accessed by off-shore systems support teams.
Due to the demand for production-like data for testing, it is important to find the right tool while remaining compliant. Any of the solutions mentioned, and others, can support the creation of fully masked copies of production data sources.
There is an advantage if the tool used can be deployed regardless of the source data platform, as this will facilitate consistent masking across data sources. Where the proportion of data in non-production environments is high, effective management (along with appropriate pseudonymisation) of the data goes a long way to covering the risk of non-compliance. Above all, whichever solution you choose has to satisfy the demands of the GDPR as well as being affordable and manageable.