
Bridging cross border data sharing post-Brexit with simulated data


Sharing data, be it customer, citizen or employee data, is central to business as well as government operations. However, following Brexit, many businesses are caught in regulatory limbo and face what can only be described as a challenging future for data sharing between the UK and EU. This is likely to prompt businesses to hit pause on data innovation projects or stop them completely. Some businesses may even choose to forgo establishing partnerships with organizations in the UK (or the EU if they are based in the UK) as a result. With innovation and strategic partnerships on hold, affected organizations risk losing their competitive edge altogether and failing to deliver products and services that meet the needs of their customers, both domestically and beyond.

In such a predicament, what can businesses on both sides of the Channel, or globally for that matter, do? How can organizations respect data privacy regulations whilst continuing to collaborate and innovate across borders?

The challenges imposed by Brexit

Under the Trade and Cooperation Agreement, transfers of personal data between the UK and EU have been allowed to continue temporarily unhindered post-Brexit. Originally permitted until the end of April 2021, this arrangement has since been extended until the end of June 2021. After this period, the UK will face one of a handful of scenarios; we explore the two most likely here.

The first scenario sees an adequacy decision approved by the European Commission, recognizing that the level of data protection offered by the UK is on par with the EU’s standards and thus permitting data to flow unrestricted. In the second scenario, the adequacy decision is denied and EU organizations looking to transfer data to the UK will be compelled to find another way to make transfers lawful. This could include signing up to Standard Contractual Clauses (SCCs) or, for multinational corporations with UK entities, Binding Corporate Rules (BCRs). Either route is a lengthy and costly undertaking, particularly in the wake of Schrems II, which places the onus on the data exporter to conduct due diligence and provide additional data protection safeguards where necessary.

At first glance, one might hope for the former, as it appears to offer a more positive outlook; in fact, it is merely the lesser of two evils. Even with approval from the European Commission, the draft adequacy decision would still have to pass through a committee of representatives from EU member states. If successful, the ruling itself may only be valid for four years, and throughout this period the UK would remain under monitoring and review as well as subject to challenges from external parties, including privacy activist groups. As such, the UK would need to maintain acceptable data protection standards or risk the European Court of Justice invalidating the ruling, a very real possibility in view of the UK’s mass surveillance legislation. The European Commission could also terminate the deal early if the UK doesn’t meet its commitments. Should the UK succeed in meeting the EU’s benchmark, that success may itself strain trade negotiations with other countries that demand a loosening of standards. Either way, the post-Brexit era of data sharing is shrouded in uncertainty and burdened with complex processes. In other words, the perfect formula for stifling innovation.

The power of simulated data 

Fortunately, such regulations only apply to personal data, which presents something of a loophole. As long as personal attributes and identifiable information are removed from the equation, the data in question is free to be shared wherever and whenever, without regulatory risk. This is exactly the benefit of simulated data: it offers an avenue for businesses to collaborate efficiently for testing and development purposes.

Simulated data leverages artificial intelligence to imitate the characteristics and behaviors of the original data, but with all sensitive attributes removed, making it impossible to re-identify individuals. This is unlike traditional anonymization techniques, which might see data shuffled or obscured with ‘noise’. Because the original records continue to exist within the dataset, these methods can leave a one-to-one mapping between an anonymized record and a real individual.
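To make the distinction concrete, here is a minimal sketch in Python of the general idea: learn the statistical shape of a dataset, then sample brand-new rows from that shape rather than editing the original rows. It is a deliberately simple stand-in and not how any particular platform works. It samples each column independently from its observed frequencies, whereas AI-based generators also model relationships between columns. The records, column names and the functions fit_marginals and sample_synthetic are all hypothetical, and a generator this naive does not by itself guarantee privacy.

```python
import random
from collections import Counter


def fit_marginals(records, columns):
    """Learn the observed frequency of each value, one column at a time."""
    return {col: Counter(row[col] for row in records) for col in columns}


def sample_synthetic(marginals, n, seed=0):
    """Draw n new rows by sampling every column from its learned frequencies."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Hypothetical original records containing a direct identifier ("name").
original = [
    {"name": "Alice", "age_band": "30-39", "region": "UK", "plan": "basic"},
    {"name": "Bruno", "age_band": "40-49", "region": "EU", "plan": "premium"},
    {"name": "Chloe", "age_band": "30-39", "region": "EU", "plan": "basic"},
    {"name": "Dawit", "age_band": "20-29", "region": "UK", "plan": "basic"},
]

# The identifier is never given to the generator, so it cannot be reproduced.
columns = ["age_band", "region", "plan"]
marginals = fit_marginals(original, columns)
synthetic = sample_synthetic(marginals, n=5)

for row in synthetic:
    print(row)
```

No synthetic row is a copy of a specific original row with noise added; each is drawn afresh from the learned frequencies, which is the property that separates this approach from shuffling or perturbing real records.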

When using simulated data, then, organizations can essentially bypass the vast majority of regulations that would otherwise govern how the data is shared and manipulated. Instead, GDPR would only apply during the initial stages of data conversion. Even then, meeting regulations can be considered manageable because the data conversion can be done in a controlled environment, such as a data clean room, with limited access privileges. Where the simulated data is not derived from personal data at all, it would fall outside the scope of GDPR altogether. The result is data and data products that can be used at will, whilst preserving the individual’s right to privacy.

The EU’s strict rules for Artificial Intelligence 

Granted, in light of the EU’s recent announcement of new regulations on artificial intelligence, skeptics may jump on the opportunity to denounce simulated data as a solution to our regulatory woes. And it is true, the design and use of data products will come under mounting scrutiny. According to the recently proposed regulations, the use of artificial intelligence will be broken down into four key risk categories: Unacceptable (anything that poses a clear risk, such as social scoring), High Risk (rules applied to decisions made in critical infrastructure, education, employment etc.), Limited Risk (e.g. chatbots) and Minimal Risk (e.g. video games and spam filters).

In short, because of the huge impact AI could have on our lives, any organization creating and using data products will be under significant pressure to ensure that their AI systems are trustworthy, transparent and free of bias. As we know, AI is only as good as the data that it is trained on. So, to meet the EU’s latest regulations, organizations must meet two core conditions: use high-quality training data, and possess a means of explaining how AI algorithms arrive at their decisions. As it happens, artificial intelligence can support both of these demands.

High-quality data means complete, accurate, consistent and up-to-date data. Artificial intelligence and machine learning can help data practitioners to achieve this. They can assess datasets to spot where data might be incomplete. They can gauge the fairness of your datasets by pinpointing instances of bias and, more importantly, mitigate that bias by rebalancing underrepresented classes. They can also generate data for scenarios that were previously unaccounted for, to make up for a lack of relevant historical data or to plan for new use cases.
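As a rough illustration of two of these checks, the Python sketch below measures how incomplete a column is and then rebalances an underrepresented class. Note the assumption: it rebalances by randomly duplicating existing minority rows (random oversampling), a much simpler technique than generating new simulated records, and it is used here only because it fits in a few lines. The dataset, the column names and the functions missing_rate and oversample_minority are hypothetical.

```python
import random
from collections import Counter


def missing_rate(records, column):
    """Fraction of records in which `column` is absent or None."""
    missing = sum(1 for r in records if r.get(column) is None)
    return missing / len(records)


def oversample_minority(records, label, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Copy each duplicated row so later edits do not affect the source row.
        balanced.extend(dict(rng.choice(rows)) for _ in range(target - len(rows)))
    return balanced


# Hypothetical loan decisions: six approvals, two denials, two missing incomes.
records = [
    {"income": 42000, "outcome": "approved"},
    {"income": 38000, "outcome": "approved"},
    {"income": None, "outcome": "approved"},
    {"income": 51000, "outcome": "approved"},
    {"income": 47000, "outcome": "approved"},
    {"income": 45000, "outcome": "approved"},
    {"income": None, "outcome": "denied"},
    {"income": 29000, "outcome": "denied"},
]

income_missing = missing_rate(records, "income")
balanced = oversample_minority(records, "outcome")
class_counts = Counter(r["outcome"] for r in balanced)

print("share of records missing income:", income_missing)
print("class counts after rebalancing:", dict(class_counts))
```

Duplicating rows adds no new information, which is one reason a generator that produces fresh, plausible minority records may be preferable in practice.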

The latter point, on the other hand, speaks to the need for what’s been termed ‘Explainable AI’. Often, AI takes a load of data into a black box and runs a number of complex calculations and algorithms before spitting out some results. What the calculations and algorithms are, and how they led to a specific decision, remains largely opaque to humans. In order to build responsible AI in line with our values, we need visibility into our models. A useful feature of simulated data platforms is that they offer us some of this visibility. Adjustments can easily be made through the removal or generation of data for each attribute. By experimenting this way and examining the changes in output, organizations can gain greater clarity into what drives their model’s decisions.
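The experiment described above, changing one attribute and watching the output, can be sketched in a few lines of Python. This is an illustrative assumption about how such a probe might look, not a description of any platform’s feature. The function toy_model is a hypothetical stand-in for a trained model, and the rows and attribute names are invented.

```python
def toy_model(row):
    """Hypothetical stand-in for a trained model; returns a numeric score."""
    # Uses income and years_employed, and deliberately ignores postcode_area.
    return 0.5 * row["income"] / 1000 + 2.0 * row["years_employed"]


def sensitivity(model, rows, attribute, replacement):
    """Mean absolute change in output when `attribute` is set to `replacement`."""
    total = 0.0
    for row in rows:
        changed = dict(row)
        changed[attribute] = replacement
        total += abs(model(changed) - model(row))
    return total / len(rows)


# Hypothetical applicant records used to probe the model.
rows = [
    {"income": 30000, "years_employed": 2, "postcode_area": "N1"},
    {"income": 50000, "years_employed": 10, "postcode_area": "E2"},
]

employment_effect = sensitivity(toy_model, rows, "years_employed", 0)
postcode_effect = sensitivity(toy_model, rows, "postcode_area", "UNKNOWN")

print("effect of zeroing years_employed:", employment_effect)
print("effect of masking postcode_area:", postcode_effect)
```

An attribute whose replacement leaves the output unchanged is not driving this model’s decisions, while a large shift flags an attribute worth scrutinizing, for example one that may act as a proxy for a protected characteristic.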

As we near the end of the transition period, organizations in the UK as well as the EU will have to contend with significant challenges to data sharing. To complicate matters further, the EU has recently proposed important regulations on the use of AI and data products. Our world today increasingly values data privacy as well as the responsible and fair use of AI; this is a great thing, a development to be celebrated. Nevertheless, it does present new considerations that organizations must take into account moving forward. Whilst there is currently no silver bullet for a quick and easy transition into trustworthy, ethical AI, nor a clear path through an ever-evolving regulatory landscape, there is a place for simulated data in our next steps.

Nicolai Baldin, CEO and Founder, Synthesized
