The awkward conversation about sensors and data that we need to hear

Sceptics may think excessive claims are being made for the Internet of Things (IoT), but for businesses facing everyday challenges, it continues to be a hugely significant area of interest and potential.

Yes, the IoT may have risen to the “peak of inflated expectations” on the Gartner hype cycle, but to judge from this year’s bustling Strata + Hadoop World (London) event, where the focus was on big data strategy and technology for business, there is an eagerness to hear about what works in the real world.

Significantly, there was also evidence of a desire to be told some less comfortable truths about technology.

The reality is that as big data projects become operational, questions of performance, scalability and integration become critical concerns, making it easy to overlook this simple fact: the IoT is principally about sensors and what they produce.

Sensors sometimes lie

Given this context, now is the right time to discuss the five aspects of sensors and sensor data that never get talked about, beginning with a simple truth – sensors sometimes lie.

Talk to the engineers who build sensor networks and you will rapidly discover that it cannot be assumed that everything recorded by these devices is accurate.

In the oil and gas industry, for example, the extreme temperature range that desert sensors endure means that they start to “drift” almost as soon as they are installed. Down-hole sensors cannot easily be swapped: the sensors themselves are so cheap as to be almost free, but the production lost while replacing them most certainly is not.

Yet just as we as individuals compensate for feelings of dizziness or imbalance, so we can mitigate the effects of drifting sensors by employing neural networks that cross-check the readings of multiple sensors. Different approaches will suit different circumstances, but whichever is chosen, sensor data needs to be managed. We cannot just assume it is accurate.
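The article mentions neural networks for this; a far simpler toy sketch of the same cross-checking idea – using a robust consensus across co-located sensors, with entirely made-up readings – illustrates how one drifting device can be detected and discounted:

```python
import statistics

def consensus_reading(readings):
    """Robust estimate of the true value from several co-located sensors.

    The median resists a single drifting sensor far better than the mean.
    """
    return statistics.median(readings)

def drift_offsets(readings):
    """Per-sensor deviation from the consensus; a large, persistent
    offset flags a sensor that has started to drift."""
    centre = consensus_reading(readings)
    return [r - centre for r in readings]

# Three sensors broadly agree; a fourth has drifted upward.
readings = [71.9, 72.1, 72.0, 78.4]
print(consensus_reading(readings))  # 72.05
print(drift_offsets(readings))      # last offset stands out
```

A production system would track these offsets over time rather than from a single snapshot, but the principle is the same: redundancy lets the network compensate for individual sensors that lie.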

Not the whole truth

Unfortunately, even when sensors are not lying, they may not tell us the whole truth, which is the second subject for consideration.

Sensors often sit behind control units that were designed to support remote monitoring and control of the whole device, not to store-and-forward high-frequency data from each individual sensor. Those control units often filter and summarise data, which is not necessarily a bad thing when there is little point in transmitting petabytes of repetitive data merely to confirm “situation as normal”.

However, because what is “noise” for one application may be vital for another, it is crucial to understand how and where sensor data has been filtered and summarised. Where possible, we should avoid prematurely summarising data at the point of collection.
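To make this concrete, here is a hypothetical sketch of a “deadband” filter of the kind a control unit might apply – forwarding a reading only when it moves meaningfully away from the last value sent. The numbers are invented; the point is what gets thrown away:

```python
def deadband_filter(samples, tolerance):
    """Forward a sample only when it differs from the last forwarded
    value by more than `tolerance` -- a common tactic for suppressing
    'situation as normal' traffic at the control unit."""
    forwarded = []
    last = None
    for s in samples:
        if last is None or abs(s - last) > tolerance:
            forwarded.append(s)
            last = s
    return forwarded

samples = [10.0, 10.1, 10.05, 10.4, 10.42, 9.0]
print(deadband_filter(samples, tolerance=0.25))  # [10.0, 10.4, 9.0]
```

The small fluctuations are discarded as noise – which is fine for remote monitoring, but fatal for an application that needed exactly those fine-grained wiggles, say for vibration analysis. Hence the need to know how and where data was summarised.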

It can take work to get a signal

The third thing to talk about is just what is required to extract a useful signal from all the data generated. There is already a wealth of examples where sensor data is changing industry – reducing bearing-related derailments for a US rail operator by a massive 75 per cent, or allowing a UK train company to predict technical failures up to 36 hours before they occur. We can even keep the lights on through predictions of failure in electricity distribution.

In each of these cases, time-series, text, path, graph and affinity analytics may be combined to produce a predictive model that is itself relatively simple. It is worth remembering, however, that the datasets underneath such a model can take a great deal of work to create.

Scoring

Of course, sensors do not usually measure the quantity of interest directly, which is the fourth unspoken topic. They infer it from a model, just as sleep monitors infer sleep state from movement and pulse rather than measuring brain-wave activity directly.

In IoT applications, model-building and model-scoring (to produce predictions) are usually very separate processes, with scoring taking place on a network of distributed edge servers or on the devices themselves.

However, not all models can be scored on the devices themselves: some optimise end-to-end systems and so require non-local data or heavyweight computation, while building and tuning a model in the first place necessitates access to historical data from many devices.
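The split between central model-building and cheap edge scoring can be sketched in a few lines. This is a deliberately trivial stand-in – a threshold fitted from pooled (invented) historical readings, then exported to the device, which needs only a single comparison to score:

```python
def fit_threshold(normal_readings, failure_readings):
    """Central model-building: pick a decision threshold halfway
    between the class means of pooled historical data. A stand-in
    for the much heavier modelling done off-device."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(normal_readings) + mean(failure_readings)) / 2

def score(reading, threshold):
    """Edge scoring: trivially cheap, needs only the exported threshold."""
    return "alert" if reading > threshold else "ok"

# Fit centrally from many devices' history; ship the threshold to the edge.
threshold = fit_threshold([0.2, 0.3, 0.25], [0.9, 1.1, 1.0])
print(score(0.27, threshold))  # ok
print(score(0.95, threshold))  # alert
```

Real deployments export far richer models than a single number, but the asymmetry holds: building needs history from many devices; scoring needs only the finished model and the current reading.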

Sensor data can be useless on its own

Finally, it is worth admitting that on its own, sensor data is often pretty useless as a basis for action. It requires integration with other data from within an organisation to become really useful.

If, for example, an oil pressure sensor on a train temporarily exceeds a threshold, is that just a blip? To determine whether it is requires comparison with previous “signatures” that preceded failure in the past, using historical and operations data. Then, if the gearbox is shown as failing, a host of decisions have to be taken about how, where and when to conduct the repair, using network operations data, parts inventory data, HR records for qualified engineer-availability and so on.
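A toy sketch of that signature comparison, with invented numbers, shows why context separates a blip from an incipient failure – a brief spike looks nothing like a stored pre-failure pattern, while a sustained climb does:

```python
def distance(window, signature):
    """Euclidean distance between a recent sensor window and a stored
    pre-failure signature of the same length."""
    return sum((w - s) ** 2 for w, s in zip(window, signature)) ** 0.5

def matches_failure(window, signatures, cutoff):
    """True if the window resembles any historical failure signature
    more closely than `cutoff` -- i.e. probably not just a blip."""
    return any(distance(window, sig) < cutoff for sig in signatures)

signatures = [[1.0, 1.4, 1.9, 2.5]]  # oil pressure pattern seen before past failures
blip = [1.0, 2.6, 1.0, 1.0]          # brief spike, then back to normal
ramp = [1.1, 1.5, 1.8, 2.4]          # sustained climb

print(matches_failure(blip, signatures, cutoff=0.5))  # False
print(matches_failure(ramp, signatures, cutoff=0.5))  # True
```

Note that even this toy version presupposes integration: the signatures come from historical and operations data held elsewhere in the organisation, not from the sensor itself.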

To be truly useful, the oil pressure sensor data has to be integrated into a larger whole. It is precisely this requirement for integration that many organisations entering the IoT arena overlook – and failure to integrate sensor data with the rest of their data can fatally undermine an entire project.

As businesses look to use the IoT to optimise existing processes or create entirely data-driven new products and services, they should bear in mind these five points.

Ignoring them may well mean organisations fail to capitalise on the hugely exciting opportunities that are opening up for every enterprise in the digital era.

Martin Willcox, Director Big Data (International), Teradata
