
Bringing down the house - The risky choice of using in-house anonymisation

(Image credit: IT Pro Portal)

As the first anniversary of the application of the GDPR approaches, one hopes that organisations have become aware of their responsibilities as controllers of personal data. 

One critical area is in-house anonymisation, which supervisory authorities have frequently stated falls short of the high threshold for anonymisation set by the European Data Protection Board.

In large enterprises, where data-driven insights inform business strategy, data controllers will often take on the responsibility for de-identifying their customer data with the aim of using the datasets for analytics unconstrained by the requirements of GDPR and other data protection laws.

The intent to preserve privacy is admirable; the execution, however, is frequently inadequate. As a result, those organisations may leave themselves exposed to regulatory action, fines and, perhaps most crucially, reputational damage, as customers lose trust that the company treats them as valued customers rather than as products.

The key concept to appreciate is that anonymised data falls outside the scope of “personal data” as defined in the GDPR. By anonymising customer datasets, organisations can conduct analytics unconstrained by data protection principles such as limits on data collection, retention, purpose-based consent, the right to withdraw consent at any time and so on.

The difficulty with in-house anonymisation arises because internal processes are frequently flawed and organisations are not aware of the high standard of anonymisation that both the GDPR and the national supervisory authorities expect in order for personal data to be considered legally anonymised.

In order to establish if the level of anonymity is adequate, organisations need to objectively demonstrate that they have taken into account “all means reasonably likely” to be used by the controller or a third party to identify someone, directly or indirectly. This is a high threshold and difficult to achieve.

The risk of re-identification must be insignificant; otherwise the process will be considered to have failed to anonymise the data. The resulting compliance failure is potentially extensive, given the large number of data subjects whose personal data would then be being processed unlawfully.

Pitfalls of in-house processes

The key problem with anonymisation conducted in-house is that the original dataset is still retained by the organisation. Direct and indirect identifiers might be removed from ‘Customer Dataset A’ to create ‘Anonymised Dataset B’, but a dataset is unlikely to be considered anonymised where a controller retains both the source data and the modified data. This is because retaining the original dataset gives the company the means to re-identify any individual in, or indeed the entirety of, the modified dataset.

On this, the Irish Data Protection Commission has explicitly stated in its guidance on “Anonymisation and Pseudonymisation” that “[i]f the source data is not deleted at the time of the anonymisation, the data controller who retains both the source data and anonymised data will normally be in a position to identify individuals from the anonymised data. In such cases, the anonymised data must still be considered to be personal data while in the hands of the data controller, unless the anonymisation process would prevent the singling out of an individual data subject, even to someone in possession of the source data”. The latter standard is both mathematically exceptionally difficult and almost impossible if any reasonable utility in the data is to be retained for analytics.
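The problem the guidance describes can be made concrete with a toy sketch (the data and field names here are entirely hypothetical): if the controller keeps the source dataset, each “anonymised” row can typically be linked straight back to a named record by matching on the attributes that were left in.

```python
# Toy sketch with hypothetical data: a controller that retains the source
# dataset can link "anonymised" rows back to named records.
source = [
    {"name": "A. Jones", "postcode": "D04", "birth_year": 1962},
    {"name": "B. Smith", "postcode": "T12", "birth_year": 1985},
]

# Naive in-house "anonymisation": drop the name, keep everything else.
anonymised = [{k: v for k, v in r.items() if k != "name"} for r in source]

def reidentify(anon_row, source_rows):
    """Link an anonymised row back to the source records that share
    all of its remaining attribute values."""
    return [s["name"] for s in source_rows
            if all(s[k] == v for k, v in anon_row.items())]

# While the source data exists, every released row still points at
# exactly one customer.
for row in anonymised:
    print(row, "->", reidentify(row, source))
```

This is precisely the “singling out” the Irish DPC refers to: the transformation only looks like anonymisation to someone who does not hold the source data.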

Nor is outsourcing analytics or anonymisation to a third-party processor necessarily the solution. The Article 29 Working Party (WP29) opinion on anonymisation techniques stated that where a data controller hands over part of a dataset without deleting the original identifiable data at event level, the resulting dataset is still personal data; such data “would still qualify as personal data for any party, as long as the data controller (or any other party) still has access to the original raw data”. In any event, the risk of re-identification remains when the analysed data is returned to the original controller, unless the re-identification risk in the analytic output is taken into account. There is therefore a significant risk that in-house anonymisation, or anonymisation conducted by a third party while the company retains the original dataset, does not constitute adequate anonymisation within the terms of the GDPR or the expectations of the supervisory authorities.

While GDPR has certainly forced data controllers to raise their game in terms of data stewardship, there is still much work to be done by many organisations to meet the GDPR compliance requirements. This is particularly the case in organisations’ approach to achieving anonymisation. There seems to be a lowest-common-denominator approach to a very technical and complex problem. Controllers have in the past relied on removing simple identifiers in the belief that this would achieve anonymisation. It does not.
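Why removing simple identifiers falls short can be shown with a minimal sketch (again using hypothetical records and field names): even with names stripped out, combinations of quasi-identifiers such as postcode, birth year and sex can still match exactly one person, so individuals can be singled out.

```python
# Minimal sketch with hypothetical data: after dropping direct identifiers,
# quasi-identifier combinations can still single out individuals.
from collections import Counter

records = [
    {"name": "A. Jones", "postcode": "D04", "birth_year": 1962, "sex": "F"},
    {"name": "B. Smith", "postcode": "D04", "birth_year": 1985, "sex": "M"},
    {"name": "C. Byrne", "postcode": "T12", "birth_year": 1962, "sex": "F"},
    {"name": "D. Walsh", "postcode": "D04", "birth_year": 1985, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

def naive_anonymise(rows):
    """The flawed in-house approach: remove only the direct identifier."""
    return [{k: v for k, v in r.items() if k != "name"} for r in rows]

def singled_out(rows):
    """Return quasi-identifier combinations matching exactly one record."""
    combos = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in rows)
    return [c for c, count in combos.items() if count == 1]

released = naive_anonymise(records)
unique = singled_out(released)
print(f"{len(unique)} of {len(released)} records can be singled out")
```

In this tiny example, half the released rows are still unique on their quasi-identifiers alone. Real customer datasets carry far more attributes, so the proportion of unique, and therefore re-identifiable, records is typically much higher.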

Failure to anonymise successfully is not theoretical. There has been considerable coverage of high-profile examples such as the Massachusetts Group Insurance Commission dataset, the Netflix Prize dataset and the AOL search dataset, and it has also featured in European supervisory authority investigations. Investigating the processing of personal data by Microsoft’s Windows 10, the Dutch Data Protection Authority concluded in 2017 that Microsoft did not clearly inform users about the type of data it used and for which purpose. It found that the data subjected to aggregated analysis was not anonymous, as Microsoft retained identifiable personal data in its cloud storage.

Inadequate anonymisation is a GDPR compliance “accident” waiting to happen for the many data controllers who think they have nullified customer consent requirements by deploying anonymisation techniques. The technical and organisational nuances of achieving the high threshold for anonymisation appear to be ignored. A failure to raise standards in line with the change in the law means supervisory authorities will look more closely, and investigations and regulatory action will inevitably follow.

André Thompson, privacy and ethics counsel, Trūata