Last month, the 1991 action film Point Break celebrated its 26th anniversary. Actor Keanu Reeves says the film didn’t just change his life but countless others. “People [sic] started jumping out of planes because of Point Break. They [sic] started surfing because of Point Break.”
Something else was birthed 26 years ago, and it has transformed the world much more than a cult classic.
On 6 August 1991, Tim Berners-Lee posted a short summary of the World Wide Web project on the alt.hypertext newsgroup, inviting collaborators. He described an interconnected web of data, accessible to anyone from anywhere. The World Wide Web has since paved the way for generations of people who ‘just google it’.
People might have been inspired to take up surfing or skydiving because of Point Break, but with the Web and the wealth of information that it holds, adrenalin junkies can now plan their daredevil deeds in minute detail: tracking ever-changing weather patterns, pinpointing the best locations, signing up for the latest training, joining hobbyist forums, watching niche sporting videos and more. The heart of the Web is information, and our immediate access to it is upending every aspect of our lives.
For instance, consider the time of year. It’s holiday season, and as people trawl Airbnb and TripAdvisor I’ve been reflecting on the rigmarole of booking a summer getaway a mere 10-15 years ago. As a family, we used to traipse en masse to the local travel agents, wait in line, thumb through their many catalogues, make a snap decision based on a tiny picture of a hotel and a very brief description, and then part with a lot of money without really knowing what to expect. These high street providers have since had to move into the digital age, and online sites with boundless crowdsourced information have sprung up in their wake.
If you torture the data long enough, it will confess
Data analytics has been around for aeons. It’s rooted in statistics – a discipline which stretches back to ancient Egyptian times and the periodic census for building pyramids. However, data analytics reached new heights with the advent of the World Wide Web.
The Web refers to the space on the internet where information, such as web pages and documents, is stored, made accessible using “uniform resource locators” (or URLs), and interlinked using hypertext. In the first few years it was primarily used by science departments and physics laboratories. By January 1993 there were just 50 web servers located around the world; by October 1993 there were over 500.
Over time, adoption widened. In 1997, revered computer scientist Michael Lesk theorised that the existence of 12,000 petabytes of information is “perhaps not an unreasonable guess”. He also pointed out that even at this early point in its development, the web was increasing in size 10-fold each year and speculated that much of this data would never be seen by anyone and therefore yield no insight. His assumptions have been spectacularly debunked.
In the same year, Larry Page and Sergey Brin developed the Google search engine, which processes and analyses big data across distributed computers. In 2010, Eric Schmidt, executive chairman of Google, told a conference that as much data was now being created every two days as had been created from the beginning of human civilisation until the year 2003. Today, the Indexed Web contains at least 4.58 billion pages.
Between 2000 and 2010, many open source software projects like Apache Hadoop and Apache Cassandra were created to take on the challenge of handling big data.
In 2004, Pentaho was born – a business intelligence company that is helping firms navigate and direct their machine learning data to bring predictive capabilities to life. Customers like NASDAQ started using Pentaho to monetise their data like never before.
In short, data is increasing at such an exponential rate that it is estimated that by 2024 the world's enterprise servers will process every year the digital equivalent of a stack of books more than 4.37 light-years tall … reaching all the way to Alpha Centauri, our closest neighbouring star system in the Milky Way – and that’s probably a conservative forecast by now. At the same time, we’re perfecting our data analysis skills. We don’t need to torture data, as economist Ronald Coase put it, but we are interrogating it within an inch of its life.
So, what is data analysis? It’s the process of inspecting, cleansing and modelling data with the goal of discovering useful information, suggesting conclusions, and informing decision-making. With so many structured and unstructured data streams, multiple data types need to be blended at source to deliver near real-time insight across all operations; only then can they be integrated to model scenarios and improve discernment. Geoffrey Moore (author and management consultant) captures the imperative well: “Without big data analytics, companies are blind and deaf, wandering out onto the Web like deer on a freeway.”
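To make the three steps concrete, here is a minimal sketch in Python of inspecting, cleansing and modelling a tiny data set. The readings, the function names and the choice of a straight-line fit are all assumptions made for illustration; real pipelines work at far greater scale and with richer models.

```python
from statistics import mean

# Made-up sample readings for illustration only; None marks a missing value.
raw = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8)]

def inspect(rows):
    """Inspect: count the rows and how many values are missing."""
    missing = sum(1 for _, y in rows if y is None)
    return {"rows": len(rows), "missing": missing}

def cleanse(rows):
    """Cleanse: drop rows whose value is missing."""
    return [(x, y) for x, y in rows if y is not None]

def fit_line(rows):
    """Model: ordinary least-squares slope and intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

summary = inspect(raw)              # what do we actually have?
clean = cleanse(raw)                # remove what we cannot use
slope, intercept = fit_line(clean)  # fit a simple trend to the rest
```

The fitted slope is the kind of "useful information" the definition refers to: a single number describing the trend, which can then inform a decision.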
Turning data into insight and insight into opportunity
Many businesses are yet to fully master data analysis but pockets of excellence do exist and the potential for progress is immense. Take healthcare. By helping us to understand correlations between lifestyle, medical history and health provision, data can turn healthcare on its head. Instead of only treating people once they’re ill, we’ll be able to target at-risk individuals for prevention, so they’re less likely to become patients.
Data analysis is bearing fruit in industry, enabling firms to produce more and waste less. Through Pentaho, Caterpillar Marine Asset Intelligence demonstrated to one of its customers with a fleet of eight ships that shutting a tugboat’s engine down when idling for extended periods would save $2 million in wasted fuel every year. It’s a powerful return on investing in data analytics.
The Financial Industry Regulatory Authority, Inc. (FINRA), has been using a multi-petabyte data lake to automate and expedite the task of identifying trading violations. It now does this up to 100 times faster than before, has greater control over its data and far sharper teeth: FINRA ordered brokerages to return an estimated $96.2 million in funds they obtained through misconduct during 2015, nearly three times the 2014 total.
But with great opportunity comes great risk – as the spread of cybercrime will attest. Data is valuable and regularly stolen or corrupted for someone else’s nefarious gain. Organisations need to ensure that data is encrypted the moment it leaves the machine to go to a central database, and that its integrity is protected with a hash at least as strong as SHA-256; in many sectors, including healthcare, data should also be fed into anonymised data sets, to ensure the privacy of each individual customer or patient.
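As a rough illustration of the hashing and anonymisation steps mentioned above, the sketch below uses Python's standard hashlib and hmac modules. SHA-256 is a one-way hash rather than a cipher, so it can fingerprint a record or replace an identifier with a token, but it cannot on its own encrypt data for later decryption; encryption in transit would be handled separately, for example by TLS. The key, the record and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be held in a
# key-management system, never in source code.
SECRET_KEY = b"example-key-not-for-production"

def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload, usable as an integrity check."""
    return hashlib.sha256(payload).hexdigest()

def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 (HMAC) token."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Made-up record: the identifier is swapped for a token before the
# record joins a shared data set.
record = {"patient_id": "patient-0001", "heart_rate": 72}
safe_record = {
    "patient_token": pseudonymise(record["patient_id"]),
    "heart_rate": record["heart_rate"],
}
```

A keyed hash is used for the token because a plain hash of a short identifier can be reversed by simply trying every possible value. This gives pseudonymisation rather than full anonymisation, since anyone holding the key can regenerate the tokens.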
What’s the next big data thing?
The World Wide Web has ushered in the information age, and with it Big Data. But what next? Innovation doesn’t stand still.
Machine Learning and AI are increasingly touted as extensions of the big data revolution. Yes, they’re still emerging technologies and adoption is nascent, but I predict incremental adoption in many industries – until we realise that AI is being used everywhere.
Two persistent barriers to adoption are the skills gap and ingrained working patterns. Many enterprises struggle to put predictive models to work because data professionals operate in silos, while workflows from data preparation to updating models create bottlenecks.
But as we’ve seen over the past 26 years, barriers can be overcome. What was once a curiosity will become an embedded part of our lives. The power of data will refashion the way we live and work.
Steve Lewis, CTO, Hitachi Data Systems
Image Credit: Sergey Nivens / Shutterstock