
Why trustworthy global data demands human input

(Image credit: Shutterstock / carlos castilla)

People think data is neutral. It’s not. When we don’t include people in data, we get bad data. This can be disastrous, as seen with the UK’s A-Level results algorithm scandal of 2020 or Twitter’s racist image-cropping algorithm. But while exam results and image-cropping algorithms can be fixed relatively quickly, what happens if you’re monitoring something as vast and complex as global supply chains? How do you make sure the data you’re gathering from afar is trustworthy and unbiased?

What the A-Level and image-cropping algorithms failed to do was ensure the data they used wasn’t biased. That is no easy feat, as data and algorithms are inherently biased - they have no emotions and cannot weigh up whether an outcome is unfair to another person. They simply look at numbers and identify patterns. The task is even harder when you’re gathering data from a satellite and trying to process it into unbiased, objective observations.

How satellite data collection works

Satellite data used to be incredibly expensive, but now anyone with an internet connection can access it. This has helped drive innovation in a variety of sectors, especially environmental protection. However, processing data from satellites into actionable insights is expensive and requires intelligent technology.

Let’s take mapping forest coverage as an example. Satellites capture signals from the Earth and translate these into images. Sometimes these signals cannot reach the satellite, which blocks the imaging process; we’ll discuss image blockers below. First, the data needs to get to the ground. There are two ways to move satellite data to the ground:

  • Through ground antennas
  • Through satellite relays, which then transmit the data by radio to an earth-based antenna

Once the data is on the ground, you need to determine what data is needed for the goal you have in mind. Satellites gather a huge array of information, from temperature changes, CO2 levels and precipitation to forest fires and volcanic eruptions. To carry on with the forest coverage example, you only need data from the areas you are monitoring. Still, this will be far too much for one person, or even a team of people, to look at and analyze.

This is where algorithms come into play. Algorithms can calculate the probability that areas are deforested by comparing observations through time. Each time an area is flagged as deforestation again, it becomes more likely that it is a genuine deforestation event. As you can imagine, the more satellite data used, the more accurate the algorithm becomes.
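To make the idea of comparing observations through time concrete, here is a minimal sketch of one way a probability could be updated after each satellite pass. It uses a plain Bayes update, which is our illustrative choice and not necessarily what any particular monitoring system does. The function names, the starting probability and the hit and false alarm rates are all hypothetical values chosen for the example.

```python
def update_probability(prior: float, flagged: bool,
                       hit_rate: float = 0.8,
                       false_alarm_rate: float = 0.1) -> float:
    """Bayes update of P(area is deforested) after one observation.

    hit_rate and false_alarm_rate are assumed values for illustration:
    how often a pass flags an area that is, or is not, really deforested.
    """
    if flagged:
        likelihood_if_deforested = hit_rate
        likelihood_if_intact = false_alarm_rate
    else:
        likelihood_if_deforested = 1.0 - hit_rate
        likelihood_if_intact = 1.0 - false_alarm_rate
    numerator = likelihood_if_deforested * prior
    return numerator / (numerator + likelihood_if_intact * (1.0 - prior))


def deforestation_probability(flags, prior: float = 0.05) -> float:
    """Fold a time-ordered series of flags into a single probability."""
    probability = prior
    for flagged in flags:
        probability = update_probability(probability, flagged)
    return probability


if __name__ == "__main__":
    # The same area flagged on one, two and three consecutive passes.
    for passes in (1, 2, 3):
        p = deforestation_probability([True] * passes)
        print(f"{passes} flagged pass(es): probability {p:.2f}")
```

Under these assumed rates, a single flag leaves the area well below 50 percent, while repeated flags push the probability up quickly, which is why more observations make the result more dependable.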

More pixels don’t equal better data

A satellite image is composed of pixels, very much like the photos from your mobile phone. It’s tempting to think more pixels lead to better deforestation monitoring - but that’s simply not true. Chasing pixel counts high enough to identify single trees almost always leads to data overload when you are actually interested in landscape-level changes. High pixel counts are not necessary for deforestation monitoring, which typically refers to areas of tree cover loss larger than 0.5 or 1 hectare.
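A quick calculation shows why. The sketch below counts how many pixels cover a cleared area at different pixel sizes. The three pixel sizes are illustrative assumptions on our part, picked to contrast very-high-resolution imagery with coarser imagery; the 0.5 and 1 hectare thresholds are the ones mentioned above.

```python
SQUARE_METRES_PER_HECTARE = 10_000


def pixels_covering(area_hectares: float, pixel_size_m: float) -> float:
    """Number of square pixels of the given edge length that cover the area."""
    return area_hectares * SQUARE_METRES_PER_HECTARE / (pixel_size_m ** 2)


if __name__ == "__main__":
    # Pixel sizes in metres are assumed values for illustration only.
    for pixel_size_m in (0.5, 10.0, 30.0):
        for area_ha in (0.5, 1.0):
            count = pixels_covering(area_ha, pixel_size_m)
            print(f"{pixel_size_m:>4} m pixels, {area_ha} ha: {count:,.0f} pixels")
```

Even at a 10 metre pixel size, a half-hectare clearing still spans dozens of pixels, which is enough to register as a change. At half a metre, the same clearing takes tens of thousands of pixels, multiplied across every area you monitor.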

The cost of using high to very-high resolution (that is, high pixel count) satellite images is incredibly high and, as we’ve stated above, they are not practical for most use cases. Another issue with focusing on image resolution alone is that images only tell one part of the story. That story is even harder to read when those images are obscured, for example by clouds.

The problem with clouds

When you’re monitoring global supply chain risks, clouds have a huge impact on data availability if your monitoring system relies solely on optical images. Here’s an example of a satellite gathering optical data over a month.

For global, consistent monitoring, it is important to integrate multiple satellite sensors into your system. ESA’s Sentinel-1, for example, provides C-band synthetic aperture radar images. Synthetic aperture radar (or SAR) can acquire data regardless of weather conditions like cloud cover. Forest areas have a different radar signature from non-forest areas, which enables the aforementioned algorithms to detect whether deforestation has occurred. Radar and other sensors, like Lidar, are therefore imperative additions to a satellite monitoring system to ensure a consistent stream of data from above.
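The effect of adding radar on data availability can be illustrated with a small sketch. It treats an optical image as usable only when its cloud cover is below a threshold, and treats every radar image as usable. The `Acquisition` record, the 20 percent cloud threshold and the sample dates are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Acquisition:
    date: str                     # ISO date of the satellite pass
    sensor: str                   # "optical" or "radar"
    cloud_fraction: float = 0.0   # share of the image hidden by cloud


def is_usable(acquisition: Acquisition, max_cloud: float = 0.2) -> bool:
    """Radar is assumed usable in any weather; optical only when clear enough."""
    if acquisition.sensor == "radar":
        return True
    return acquisition.cloud_fraction <= max_cloud


def usable_dates(acquisitions: List[Acquisition], max_cloud: float = 0.2) -> List[str]:
    """Sorted, de-duplicated dates on which at least one image is usable."""
    return sorted({a.date for a in acquisitions if is_usable(a, max_cloud)})


if __name__ == "__main__":
    # Made-up month of passes over one cloudy area.
    optical = [
        Acquisition("2021-01-03", "optical", 0.9),
        Acquisition("2021-01-13", "optical", 0.1),
        Acquisition("2021-01-23", "optical", 0.7),
    ]
    radar = [
        Acquisition("2021-01-06", "radar"),
        Acquisition("2021-01-18", "radar"),
        Acquisition("2021-01-30", "radar"),
    ]
    print("optical only:", usable_dates(optical))
    print("optical + radar:", usable_dates(optical + radar))
```

In this made-up month the optical sensor alone yields a single usable date, while the combined system yields four, spread across the whole period.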

Why human input matters

Now we come to the crux of the challenge: building a reliable monitoring system from satellite data alone. Satellite images can only ever tell one side of a story - to a satellite it may look like an area has been deforested, whereas actually it may have been something completely different, such as the replanting of trees or crops. The best way to verify potential false detections is to ask people on the ground. That might sound obvious, but it isn’t always done. This is especially important for sectors like (smallholder) agriculture or extractive industries, which are hard to monitor purely by tech because of their heterogeneous, small-scale land use practices.

There are a variety of issues on the ground affecting farmers, suppliers and manufacturers which satellites cannot detect. For example, the global cocoa supply chain suffers crop destruction because of historic issues with pests and disease. Satellites cannot identify the source of those issues, but people on the ground can help pinpoint pests or diseases. But then, how do you know if you can trust that human input? The key to connecting machine-gathered data (satellite data) with human-gathered data (ground data) is maintaining consistency and reliability. That is done by setting up an ingestion process that verifies ground information before it enters the system.
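As a rough illustration of what such an ingestion step might check, the sketch below validates ground reports before accepting them. The `GroundReport` fields, the list of allowed categories and the individual checks are assumptions made for this example; a real system would define its own schema and rules.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Hypothetical categories a ground report may use.
ALLOWED_CATEGORIES = {"deforestation", "replanting", "crops", "pest", "disease"}


@dataclass
class GroundReport:
    source_id: str          # anonymised identifier, not a person's name
    latitude: float
    longitude: float
    observed_at: datetime   # timezone-aware time of the observation
    category: str


def validation_errors(report: GroundReport,
                      now: Optional[datetime] = None) -> List[str]:
    """Return the reasons a report should be rejected (empty if it is valid)."""
    now = now or datetime.now(timezone.utc)
    errors = []
    if not report.source_id.strip():
        errors.append("missing source id")
    if not -90.0 <= report.latitude <= 90.0:
        errors.append("latitude out of range")
    if not -180.0 <= report.longitude <= 180.0:
        errors.append("longitude out of range")
    if report.observed_at > now:
        errors.append("observation time is in the future")
    if report.category not in ALLOWED_CATEGORIES:
        errors.append("unknown category")
    return errors


def ingest(reports: List[GroundReport], now: Optional[datetime] = None
           ) -> Tuple[List[GroundReport], List[Tuple[GroundReport, List[str]]]]:
    """Split reports into accepted ones and rejected ones with their reasons."""
    accepted, rejected = [], []
    for report in reports:
        errors = validation_errors(report, now)
        if errors:
            rejected.append((report, errors))
        else:
            accepted.append(report)
    return accepted, rejected
```

Keeping the rejection reasons alongside each rejected report means that bad input can be followed up with the source rather than silently dropped.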

Ulula, which provides a software and analytics platform, helps join the dots between satellite-based and ground-based evidence by equipping companies with tools to monitor human rights risks. They use things like chatbots and text Q&As to ask workers in their native language what is going wrong. Those workers are guaranteed anonymity so that they feel safe when revealing bad practices.

In terms of monitoring global supply chains, it’s imperative to make sure your data is transparent, reliable and consistent. Only by monitoring social as well as environmental risks can businesses understand where in their supply chain the risks lie and which of them matter most. Then they can prioritize and schedule actions to ensure they continue to move towards 0 percent ESG risk in their chain, such as 0 percent deforestation.

Last year there was a lot of press coverage on “bad data” or “bad algorithms”, which can leave a bad taste in the general public’s mouth. However, we must counteract this by spotlighting algorithms - combined with human input - which make significant changes to our lives, for example those used to monitor global supply chains to deliver deforestation-free commodity production. After 2020’s year of bad data and algorithms, 2021 is the year to bring back trust.

Nanne Tolsma, Head of Client Relations, Satelligence