Data analytics is in its infancy


You’re probably familiar with the maxim that data is the new oil: an untapped, valuable asset in the digital economy that promises huge rewards for those who can extract it. There is partial truth in this: data is increasingly central to the smooth functioning of companies the world over.

That said, data’s latent value hasn’t yet been truly unlocked, and the reality is that data analytics is still very much in its infancy. To extend the oil analogy: we understand how to turn oil into value, and we know there is a fixed quantity of it available. Data, so far, has neither characteristic – we are still learning how to extract its value, and its supply keeps growing rather than running out.

At Tutela we have a front-row seat to the evolution of big data analytics and can see how much ground is left to cover. Our systems crowdsource data from hundreds of millions of mobile phones across the world, collecting and processing over 10 billion new data points every 24 hours. Specifically, we collect diagnostic information on the performance of wireless networks so that they can be improved for mobile users. As the industry moves to advance data analytics beyond its infancy, there are several less obvious challenges that need to be overcome.

1. Controversy of new findings

Of course, there have been early examples of data unlocking value, but you might be surprised to discover there can be resistance to the findings revealed by true data analysis. When you have billions of rows of data rather than millions, or when you apply more sophisticated data modelling techniques, you see things that weren’t previously apparent. When these discoveries are revealed they can be controversial, because they challenge previously held beliefs, and that attracts increased scrutiny and even conflict. Galileo wasn’t very popular when he argued that the Earth revolved around the Sun.

As an example, at Tutela we recently carried out an analysis of mobile network performance in Latin America, based on billions of data records collected over 90 days. The results revealed that, contrary to what was previously thought, a certain mobile network operator was outperforming its rivals. This proved controversial among staff at some of these operators.

These findings upset the apple cart: they undermined an assumption, based on statistical analysis of a much smaller dataset, that another network operator was leading the field on performance. But this is precisely the point – with larger amounts of data you see more, and not everyone will like what you see.

2. Skewed skills field

Misconceptions, which tend to suggest overly convenient results, are often drawn from data because of a lack of skilled practitioners in the workforce. It’s of little use having mountains of data if you can’t interpret it correctly. Because data analytics is still in its infancy, there is a corresponding shortage of the expertise required to manage and make sense of it. The skills do exist, but they tend to get gobbled up by data giants such as Google, who do all they can to attract this type of talent. As a result the skills field is skewed, and other organisations need to work hard to attract talent.

3. Data outliers

We also need to be aware that when dealing with extremely large datasets, outliers and rare errors show up more often: even machines make mistakes when recording, and running analysis on, billions of rows of data. Edge cases that occur just 0.001% of the time appear in ever greater numbers as the dataset grows, so it takes far more work to clean the data and ensure accuracy and consistency, as the sketch below illustrates.
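To make the scale of the problem concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the 0.001% error rate, the lognormal “throughput” distribution, and the interquartile-range filter are invented examples, not Tutela’s actual pipeline.

```python
import numpy as np

# A 0.001% error rate is invisible at small scale but not at big-data scale
for n_rows in (1_000, 1_000_000, 1_000_000_000):
    expected_errors = n_rows * 0.00001  # 0.001% expressed as a fraction
    print(f"{n_rows:>13,} rows -> ~{expected_errors:,.0f} expected bad records")

# One common cleaning step (illustrative only): drop values outside
# 1.5x the interquartile range. The lognormal "throughput" data is invented.
rng = np.random.default_rng(42)
throughput_mbps = rng.lognormal(mean=2.0, sigma=0.5, size=100_000)
# Inject a handful of rare garbage readings to stand in for machine errors
throughput_mbps[rng.integers(0, throughput_mbps.size, size=10)] = 1e6

q1, q3 = np.percentile(throughput_mbps, [25, 75])
iqr = q3 - q1
mask = (throughput_mbps >= q1 - 1.5 * iqr) & (throughput_mbps <= q3 + 1.5 * iqr)
clean = throughput_mbps[mask]
print(f"kept {clean.size:,} of {throughput_mbps.size:,} rows after filtering")
```

At a billion rows, even a one-in-100,000 error rate leaves roughly 10,000 bad records that have to be found and removed before any aggregate can be trusted.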

The scalable infrastructure required to store billions of rows, clean them of rare errors, run often intensive analysis techniques, and serve queries against the resulting dataset is also expensive and complex. So much so that it generally requires a partnership with a hosting provider such as Amazon, Google, or Microsoft, since the amount of hardware involved is significant.

4. Speed

Industry advancement requires paying customers, and their expectations need to be managed. When you’re churning through billions of rows of data, maintaining system speed becomes difficult, yet the average user in our technology culture understandably expects an immediate response to every query. Everyone wants results within seconds – or less.

Real-time analysis on massive datasets can be painfully slow: even seemingly simple graphs and charts must churn through hundreds of millions of results to compute an average before they can render. Yet to the human eye, a bar chart built from a billion rows of data can look almost indistinguishable from one built from a thousand. The billion rows carry far more value and statistical weight, so time needs to be spent educating users, the industry, and the wider public about what quantity of data is actually sufficient to back up a desired insight – the sketch below makes the difference concrete.
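A hedged worked example of that statistical difference, using invented numbers (a metric with an assumed true mean of 20 Mbps and standard deviation of 15 Mbps):

```python
import numpy as np

# Hypothetical numbers: a metric with a true mean of 20 Mbps and a standard
# deviation of 15 Mbps. The bars on a chart would look nearly identical for
# both sample sizes, but the uncertainty around them differs by a factor of 1,000.
mean_mbps, sd_mbps = 20.0, 15.0

for n in (1_000, 1_000_000_000):
    standard_error = sd_mbps / np.sqrt(n)  # shrinks with the square root of n
    ci95 = 1.96 * standard_error           # half-width of a 95% confidence interval
    print(f"n = {n:>13,}: mean ~ {mean_mbps:.1f} Mbps +/- {ci95:.4f} Mbps")
```

The bar heights are the same; what the billion rows buy you is a confidence interval a thousand times tighter, which is what makes an insight defensible rather than merely plausible.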

At Tutela, we have also found that new data processing methods, including ultra-high-performance GPU-based processing systems, are needed to cut down processing time and keep interactive analysis fast.
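As a rough illustration of why GPUs help here (a sketch only, not Tutela’s actual stack): libraries such as CuPy expose a NumPy-like API whose array operations run on the GPU, so a single aggregation over hundreds of millions of values can run far faster than the CPU equivalent.

```python
import numpy as np
import cupy as cp  # NumPy-compatible GPU arrays; requires a CUDA-capable GPU

# 100 million synthetic measurements (the metric and its values are invented)
n = 100_000_000
cpu_data = np.random.default_rng(0).random(n, dtype=np.float32)

# Copy to GPU memory once; subsequent aggregations run on the device
gpu_data = cp.asarray(cpu_data)

cpu_mean = cpu_data.mean()         # computed on the CPU
gpu_mean = float(gpu_data.mean())  # computed on the GPU, scalar copied back
print(f"CPU mean: {cpu_mean:.6f}, GPU mean: {gpu_mean:.6f}")
```

The one-off cost of moving data into GPU memory is amortised over the many interactive queries that follow, which is what makes sub-second dashboards over huge datasets feasible.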

Tremendous potential

For those who successfully address these challenges the results can be tremendous. Things become possible that were never even considered before and we all have a lot to gain from the advancement of this field.

For example, in the field of medicine this level of analysis can bring far-reaching new insights. Analysing DNA and blood-test data across whole populations can enable early diagnosis of particular diseases, leading to effective and ultimately cost-saving treatment and prevention programmes.

Tying these and other seemingly disparate information sources together is saving lives. It illustrates how, with enough data, it’s possible to identify trends that were previously invisible and turn them into a real-life advantage.

The futuristic society we often see romanticised in films and books will be built on data analytics like this.

Infrastructure revelations

In the telecoms space, we recently looked at the quality of data passing through a group of cell towers in London. It became clear that some towers were congested and that the problem would only get worse, leading to even poorer service quality and more service issues. This is valuable information for network providers, who can pinpoint precisely where infrastructure improvements are required and make the changes that keep them competitive – and keep you connected.

The insights mentioned above are only available when you have billions of rows of data to work with. A good analogy is to think of data points as pixels: with too few pixels an image is blurred and unclear; with billions, the picture becomes crystal clear.

New and bold

This level of analysis is true data analytics. It’s a bold new field, and those involved in it are modern-day explorers, making new discoveries and pushing back the boundaries of what was previously known. But to reach this point there is a lot of ground to cover: most companies that believe they are doing data analysis are really producing statistics. It’s new territory and there is still a lot of exploring to be done, but the rewards for those who work out how to handle and analyse this data will be significant indeed.

Hunter Macdonald, CEO, Tutela