The European Commission says that the EU could become the most attractive, secure and dynamic data-agile economy in the world. The Commission’s new data strategy is for the EU to seize new opportunities in digitised industry and business-to-business artificial intelligence (AI) applications. However, the Commission has scrupulously avoided the vital question of whether GDPR is an obstacle to the EU’s plans to become an AI hub.
The European Commission announced its new EU data strategy with the publication of two papers in February 2020. These were a white paper on AI and a communication entitled "A European strategy for data". The Commission acknowledges that “the availability of data is essential for training artificial intelligence systems … without data, there is no AI.” Since GDPR carefully restricts the uses to which data can be put, it could well cut across the Commission’s ambitions for AI.
To facilitate access to large quantities of data, the Commission is proposing the creation of European data spaces for sectors including agriculture, public administration, industry, transport and health. The Commission wants businesses to have “easy access to an almost infinite amount of high-quality industrial data.” However, GDPR is likely to frustrate at least part of the Commission’s plans.
The Commission presents the GDPR as an important EU achievement that will enable trustworthy AI to flourish. However, the Commission avoids asking the tricky question of whether the GDPR could in fact prove an obstacle to AI innovation. The only hint that GDPR might need to be revisited is a comment that an “upcoming review of the GDPR may provide further useful elements.”
GDPR represents the gold standard for data protection worldwide. Many countries are emulating it, and there is little political appetite in Europe to roll back GDPR. Yet GDPR does, in fact, create friction with machine learning. For example, several of GDPR’s core principles, including purpose limitation and data minimisation, are obstacles to the creation of large training datasets.
More flexible GDPR concepts exist which theoretically permit innovation, such as legitimate interest, public interest and scientific research. Yet these are so vaguely defined as to provide little legal certainty. The concepts can also be interpreted differently by the EU member states’ various national data protection authorities, which can mean different member states apply different rules.
How can the Commission best reduce the friction between GDPR and AI innovation? The first step is recognising that friction exists. Currently, EU policy makers are in denial about this reality. When challenged, they generally just repeat the mantra that GDPR supports sustainable AI innovation. For example, in its 2018 annual report, the French data regulator, the Commission nationale de l'informatique et des libertés (CNIL), said GDPR “provides a solid foundation for the future development of artificial intelligence in Europe”.
While GDPR does have in-built flexibility, that flexibility is surrounded by legal uncertainty, which makes it hard to rely on in practice. A more realistic assessment is that GDPR creates an obstacle to many European machine-learning projects. For example, a 2019 report on AI in France specifically stated that “access to health data, necessary for the development of relevant algorithms, is the major current obstacle to the development of an artificial intelligence sector in France.”
Only by first candidly admitting that GDPR creates barriers can the Commission begin to find ways to reduce those barriers - while preserving the GDPR’s essential protections. The Commission could then take the lead by providing innovation-friendly interpretations of GDPR’s vaguer provisions. The Commission has previously issued guidance on the interpretation of EU law in a way that supports innovation. Article 173(1) of the Treaty on the Functioning of the European Union actually requires that the Union and member states create the conditions to encourage the competitiveness of EU industry - including by promoting innovation, research and technological development. GDPR should therefore be interpreted in light of these overarching treaty objectives.
The innovation problem
The innovation problem has arisen in the past, as regards the Internet of Things, where the Commission issued recommendations in light of the 1995 Data Protection Directive. Similar recommendations could be issued on AI in light of GDPR constraints.
GDPR has at least three zones of uncertainty which the Commission could usefully clarify:
- The definition of personal data and how “special categories” can be used
There is still uncertainty regarding what constitutes personal data under GDPR. There is often debate about whether anonymised datasets are truly anonymised, since statistical correlations can be found between anonymised data and small groups, or individuals.
Given such uncertainty, machine learning innovators prefer to use public databases in the US to train their algorithms. The Commission could draw up a roadmap for building large training data spaces that include personal data. Excluding personal data is unrealistic. Data from medical devices, or cars, may be personal since it may relate to a single individual.
Large datasets of personal data may help to avoid bias and discrimination. Currently, most facial recognition algorithms are trained on non-European datasets and tested against racial discrimination in the United States. This is because GDPR is seen as discouraging such testing in Europe. The Commission could examine ways for GDPR to facilitate the creation of representative databases and testing mechanisms.
- What does processing for scientific research and statistical purposes include?
GDPR contains special exemptions on data use for scientific research and statistical purposes, but their definitions are ambiguous. Recital 159 GDPR gives a non-exhaustive list of scientific research activities, but some believe commercial research should be excluded.
The Commission should clarify how GDPR’s exemptions for scientific research and statistical purposes can be used to facilitate innovation in machine learning.
- What are “appropriate safeguards” for scientific research and statistical studies?
GDPR highlights the importance of data processing for scientific research and statistical purposes, subject to “appropriate safeguards for the rights and freedoms of the data subject”.
GDPR allows member states to derogate when processing personal data for such purposes. For example, the French authorities have specified appropriate safeguards for processing health data for medical research. However, as this only applies in France, cross-border medical research is hampered.
The Commission should work urgently to ease the friction between the GDPR and AI innovation. Recommendations and clarifications may help create a more innovation-friendly environment for AI. Yet some areas may require the modification of GDPR itself, or a new EU regulation.
The official pretence that no friction exists is profoundly unhelpful, and this denial of reality will only serve to obstruct the EU’s ambitious plans for AI.