
How not to build a biased algorithm

(Image credit: B-lay)

Are algorithms biased? And if so, what can we do about it? Facebook has just revealed an internal review to look at these two key questions.

It’s interesting that Facebook’s intentions in this area have come to light now, probably due to recent world events. There’s a lot of attention being given to algorithmic bias right now - especially when it comes to race and gender.

Are there systematic problems in the algorithms that shape what we see online and make so many crucial decisions in our lives - problems that perpetuate stereotypes, create unfair outcomes, and arbitrarily privilege one group of users over others?

This is an important question. After all, Artificial Intelligence (AI) and machine learning are now used far and wide, and are involved in making life-changing decisions.

Which prisoner gets parole? Which areas are policed most heavily? Whose CV makes it through the electronic filter? Who receives a life-changing loan, or mortgage? What is shown to whom on social media channels?

Organizations of all sizes with an interest in AI or machine learning should pay attention to what is happening at Facebook. The social network - like other Silicon Valley giants hitting problems with alleged bias in their algorithms - is doing some of the most cutting-edge work with AI and algorithms.

They were the first to move into this area, and they are the first to be forced to face these issues. Their vast user bases amplify the impact of any algorithmic bias, and Facebook deals with a huge volume and diversity of content. It is at the sharp end and will be stung first.

The ‘Big Five’ companies (Google, Amazon, Facebook, Apple, and Microsoft) have all done a lot of the leg work that allows a greater range of organizations to make use of AI technologies. Many more companies are turning to off-the-shelf tools and resources to do things like building chatbots, extracting key information from natural language, and implementing real-time solutions involving image and video processing.

Custodians of knowledge

The problem is that these tools come pre-trained. If you are building AI using out-of-the-box features in Microsoft Azure, for example, then Microsoft’s biases become your biases.

Google and Facebook have also become the custodians of the world’s knowledge. Their algorithm-driven rankings and news feeds have the power to inform and manipulate people’s views on the world and change what they see.

That raises some big questions about algorithms. Most people would agree that the search engines and social media companies should be doing some filtering, at the very least to protect people from harmful content. But where do we draw the line?

Add to that, who should be governing them? Who should be responsible for removing the bad stuff? And who should carry the can if mistakes are made? The IT person, the data scientist, the tool provider, or the business leader pushing the use of AI technologies?

The recent exam grading fiasco demonstrated several of the challenges with AI bias, including failings around both transparency and accountability. Most peculiarly, it was the algorithm, rather than the people who built it, that took the blame. Few people blame the recipe and not the chef when presented with a poor meal.

We can’t eradicate bias in machines because it exists in the humans that build and use them, and the data they are trained on. We can, however, look to make those biases more obvious, measure them, expose them and try to overcome them as we find them.

The origin of algorithmic bias

There are six key ways in which bias makes its way into algorithms:

1. Through the data they train with.

An algorithm finds patterns in training data - large amounts of historical data points - and then uses what it learnt to predict what’s likely to happen in the future - for instance, will a person default on their mortgage?

But if this picture of the past is flawed, or unrepresentative of the data that the live algorithm will operate on, then its insights will be too. Garbage in, garbage out.

Other problems stem from us not thinking hard enough about the training data we use. For instance, if we ask an algorithm to look at re-offending rates, is that data robust? Or does heavier police surveillance of some groups mean they’re just more likely to be caught re-offending? If so, the algorithm perpetuates unfairness.
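To make the point concrete, here is a minimal sketch in Python - with entirely invented loan records - of how a simple frequency-based model does nothing more than replay historical patterns, biases included:

```python
from collections import defaultdict

# Hypothetical historical loan decisions. Biased past practice means
# applicants from area "B" were mostly rejected.
history = [
    ("A", "approve"), ("A", "approve"), ("A", "reject"),
    ("B", "reject"), ("B", "reject"), ("B", "approve"), ("B", "reject"),
]

def train_frequency_model(records):
    """Learn, per area, the most common historical outcome."""
    counts = defaultdict(lambda: defaultdict(int))
    for area, outcome in records:
        counts[area][outcome] += 1
    return {area: max(c, key=c.get) for area, c in counts.items()}

model = train_frequency_model(history)
print(model)  # {'A': 'approve', 'B': 'reject'} - the past, replayed
```

A real model is far more sophisticated, but the principle is the same: if the historical record is skewed, the "learned" decision rule is skewed too.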

2. Hidden correlations.

Even if we don’t include protected characteristics, such as ethnicity, as one of the categories in the training data, racial bias can creep in through correlation.

Take a policing algorithm looking to work out which areas should be most heavily patrolled, for instance. Let’s imagine one key factor turns out to be whether an area has lots of empty industrial buildings. Fair enough?

But if black people disproportionately live in such areas, the algorithm will produce a biased and potentially ‘racist’ outcome.

3. ‘Accurate’ bias.

Sometimes the picture algorithms paint of how our world works is all too accurate. Imagine we’re using AI to target ads for a £100,000 sports car. The algorithm might decide to exclude anyone earning under £70,000 a year because, historically, those people haven’t bought £100k vehicles. In doing so, it might well produce a mostly white, male audience for our ad.

The algorithm is doing its job. The problem is that, historically, women and non-white people have been under-represented in high-paid work. The algorithm holds up an ‘accurate’ mirror to our society - and we might not like what we see.
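The mechanism is easy to sketch with invented figures: filtering an audience purely on income can reproduce historical pay gaps in who ends up seeing the ad:

```python
# Hypothetical ad audience: (income, group). Names and numbers are
# invented for illustration only.
people = [
    (85_000, "men"), (90_000, "men"), (72_000, "men"),
    (75_000, "women"), (40_000, "women"), (35_000, "women"),
    (38_000, "men"),
]

# Target purely on income - no protected attribute is consulted...
audience = [group for income, group in people if income >= 70_000]

# ...yet the selected audience is heavily skewed anyway.
share_men = audience.count("men") / len(audience)
print(f"men in audience: {share_men:.0%}")
```

The filter never looks at gender, but because income and gender are linked in the (made-up) data, the outcome is skewed all the same.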

There are genuine problems here: it might be infeasible to eradicate social biases from algorithms while preserving their power to interpret the world.

4. Bias in the eye of the beholder.

Some of Facebook’s recent problems stem from this issue. For instance, it uses algorithms to detect hate speech and shut down offending accounts.

These treat the phrase “men are trash” the same way as “women are scum”.

However, some would argue that this second phrase is much more offensive, citing the historical background of misogyny and women’s subordination. Facebook disagrees - for now. But many others consider this to be unjust.

5. Feedback bias.

If a white middle-class woman ‘Likes’ or clicks through to an upmarket home-baking supplies shop, then more middle-class women will see these kinds of adverts. The algorithm simply learns and optimizes its output for each user.

The operation of the algorithm may create messages some don’t like, for instance by making generalized assumptions about who bakes biscuits. But it does so because of the messages that users are giving it.  
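A toy simulation (with hypothetical group names and numbers) shows how a greedy, click-optimizing allocator can lock in an almost arbitrary early difference:

```python
# Toy feedback loop: a greedy system always shows the ad to the group
# with more historical clicks. The starting counts are invented and
# nearly identical - yet the gap becomes absolute.
clicks = {"group_a": 11, "group_b": 10}
impressions = {"group_a": 0, "group_b": 0}

for _ in range(100):
    leader = max(clicks, key=clicks.get)   # exploit past data only
    impressions[leader] += 1
    if impressions[leader] % 10 == 0:      # assume a flat 10% click rate
        clicks[leader] += 1

print(impressions)  # group_b never receives a single impression
```

Real ad systems include exploration to avoid exactly this trap, but the sketch shows why a purely feedback-driven optimizer amplifies whatever pattern it starts with.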

6. Black-box bias.

The most advanced area of AI and machine learning - Deep Learning - produces incredibly complex, self-developing algorithms able to predict some events with remarkable accuracy. It is, for instance, helping to identify the people most likely to develop schizophrenia, a task that has baffled doctors.

The problem is that the algorithm becomes so complicated that no one knows for sure how it produces an output from a given input. The results look plausible, with great predictive power, but the workings in the ‘black box’ are not explainable or understandable. There might be all kinds of calculations we would consider unethical lurking in there - no one knows.

The last point makes clear why AI explainability has become a big issue, especially for methods like Deep Learning, which are powerful, but also opaque. Many organizations, particularly in regulated sectors such as financial services, have a preference for proven but perhaps less-powerful methods - such as regression analysis and decision trees. Why? Because they are more straightforward, more understandable, and their outputs are explainable and can be searched for apparent biases.

So what do we do about it?

We cannot eliminate bias. Humans create algorithms, and humans are biased, even at the level of choosing which data categories or features we ask an algorithm to interrogate when looking at a problem. We can only expose, measure and try to reduce bias.

My Sheffield University colleague Prof Noel Sharkey - a passionate investigator of the ethics of using robots and AI, and a popularizer of robotics, not least through being head judge on BBC’s Robot Wars - is clear: he thinks the problem is so big that all use of algorithms for life-changing decisions should stop until we can put in place testing as rigorous as that to which new pharmaceutical products are subjected.

I may not go that far, as many uses of AI are not life-critical. But I do think we need to work very carefully to make sure use of algorithms is as fair, accountable, transparent and ethical as it can be.

Here are three Data Science ‘commandments’ which will help us do that:

1. Know Thy Data

Visualize it. Get descriptive summaries of individual variables. Do a statistical analysis of multiple variables and the correlations between them. Play with the data and get a feel for it.

Understanding the data may lead to some kind of corrective action, such as filling in missing values, removing outliers or correcting errors. Select specific cases and work through the values. Do they make sense? Are there any protected attributes in the data, such as gender and ethnicity, that could be affected by biases?

I have found it helpful to generate a summary document of the data, which can also give users useful insights into data quality and potential biases. Getting to know your data is not a one-off task; it continues throughout the lifetime of the project.
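Even a few lines of code go a long way here. A minimal sketch (pure Python, with invented applicant records) of the kind of per-column summary worth generating:

```python
# Hypothetical applicant records; None marks a missing value.
rows = [
    {"age": 34, "income": 28_000, "gender": "F"},
    {"age": 51, "income": None,   "gender": "M"},
    {"age": 29, "income": 41_000, "gender": "F"},
    {"age": None, "income": 35_000, "gender": "M"},
]

def summarise(rows, column):
    """Basic descriptive summary: count, missing, and min/max/mean
    for numeric columns, or the distinct levels for categorical ones."""
    values = [r[column] for r in rows if r[column] is not None]
    missing = len(rows) - len(values)
    if all(isinstance(v, (int, float)) for v in values):
        return {"n": len(values), "missing": missing,
                "min": min(values), "max": max(values),
                "mean": sum(values) / len(values)}
    return {"n": len(values), "missing": missing,
            "levels": sorted(set(values))}

print(summarise(rows, "income"))
print(summarise(rows, "gender"))  # protected attribute: check the balance
```

In practice a library such as pandas does this in one call, but the habit is the same: inspect every column, count what is missing, and look at how protected attributes are distributed before modelling anything.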

2. Know Thy User

We need to understand the context in which the data is being generated and used. Where did it come from, and what will it be used for? What are the user’s needs and requirements? Do they operate in regulated business sectors?

The more you can discover about the user of your algorithms and their context, the more you can understand the data and spot bias. It helps to have regular meetings with users, review relevant company documents and talk to people involved in creating the data.

3. Know Thy Algorithm

We need to understand the algorithms, methods and processes we use to analyze and model data. We need to know how different modelling techniques work, their assumptions and limitations, and be scientific or principled in the way we approach the problem - seeking to produce reliable, unbiased and valid outcomes.

To minimize bias and overfitting, we need to know how the algorithm works and be able to explain it. Reporting results on an unseen dataset gives a better idea of how your model will perform in practice. Methods from Explainable AI can help make the inner workings of algorithms more transparent.
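The held-out evaluation is worth sketching, because it is the simplest safeguard against fooling ourselves. A toy example with synthetic data and a deliberately crude model:

```python
# Synthetic labelled examples: (score, outcome), where the true rule
# is "outcome is 1 when score is above 50".
data = [(x, int(x > 50)) for x in range(0, 100, 3)]
train, test = data[::2], data[1::2]   # disjoint train and held-out sets

# "Train" a deliberately crude model: the decision threshold is the
# smallest score labelled positive in the training set.
threshold = min(x for x, y in train if y == 1)

# Accuracy on data the model has never seen is the honest estimate
# of how it will behave in practice.
accuracy = sum(int(x >= threshold) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring the model only on the data it was fitted to would hide its mistakes; the held-out set is what reveals how it generalizes.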

And we need clear governance procedures for building algorithms. We need audit trails, and strict, clear ethical rules and governance frameworks to make sure they’re followed.

An ethical approach to AI

Fundamentally, robots and algorithms are not racist or biased. Still, their outputs can be, because - no matter how powerful AI and machine learning algorithms are - they will always be a reflection of the data fed into them and the world around them.

And while there are technical aids that can help detect biases during training or from the data input to the algorithms, in the end it’s down to us. Individuals need to take personal responsibility too. And we should also consider the ethics. Just because we can do something with AI, does not automatically mean we should do it.

We also need to treat data with respect and understand that, when coupled with algorithms, it can produce products and services with far-reaching social impacts. We need to improve algorithmic literacy to the point where everyone working with data and analytics understands the inherent biases in different methods of analysis and modelling. And we need to be able to understand what is happening within our algorithms to the point that CEOs can explain and take personal responsibility for their outputs.

So how not to build a biased algorithm? Be aware that AI-driven tools and technologies can be biased. Take care to identify and mitigate potential biases from data ingestion right the way through to output use. Explore the inputs to and outputs from AI for possible biases. Make the inner workings of decision-making more transparent. And take responsibility for AI solutions.

Paul Clough, Head of Data Science, Peak Indicators

Paul Clough is Head of Data Science at Peak Indicators, responsible for developing data products and services. He is also Professor of Search and Analytics at the University of Sheffield.