
How bigger data is activating analytics

(Image credit: Shutterstock/wk1003mike)

It wasn’t so long ago that business analytics operated on a months-long cycle. For most of the twentieth century, the main interaction between a company and its data was a regular review of its most easily quantifiable measures, in the form of annual or quarterly financial assessments. Today, interacting with data this infrequently would be unimaginable in even a small business. As data availability and transfer speeds have grown at exponential rates, the time lag between intake and analysis of data has shortened to the point that, today, real-time data analytics is often part of an organisation’s standard operating procedure.

Few industries have not been lifted by this rising tide of data. Access to and analysis of data is reorienting everything from customer service, to logistics and fulfilment, to product and software development – providing actionable insight well beyond the bottom line of financial results. The new data reality has created significant employment opportunities for those with the necessary skills, with ‘data scientist’ now listed by Glassdoor as the sixth best job in the UK.

Paradoxically, real-time data analytics is not the fastest analytics available today. Where once a company could gain and leverage a competitive advantage by reacting faster to changing business conditions, in the 2020s we will see companies looking even farther ahead to find that edge. Predictive analytics, in which data is used to anticipate a target audience’s preferences and behaviours, can inform and maximise the potential of marketing and product development, as well as foresee issues that may threaten revenue or create high expenditure. Using predictive planning to identify opportunity and minimise risk will join historical understanding and real-time awareness in businesses’ data toolkits.

While getting to that point will not happen overnight, a number of significant shifts in the world of data are, together, making the promise of prediction increasingly attainable. This year, watching how these trends affect your data operation will help you to future-proof it for a predictive world.

Location-agnostic data becoming the new normal

During the early days of big data in business, enterprise data centre deployments consumed significant time and capital as companies built out the infrastructure to manage much broader analysis. Some years ago, data usage reached a point where many organisations required multiple data centre deployments, distributing the workload and offering points of presence closer to branch offices and sites. Later, this strategy was itself overshadowed by the potential of hyperscale cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure to provide greater capability with lower operating expenditure and little, if any, capital expenditure.

However, while these designs offer benefits in terms of cost management, they have also come to be one of the leading causes of application downtime, with vital services being only as resilient as the third-party infrastructure they run on. The need for reliable, uniform performance is now leading companies to architect for multi-cloud deployments, so that downtime on one provider is handled by seamlessly handing the workload over to others.
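The handover idea can be illustrated with a minimal Python sketch. It assumes each provider is wrapped in a simple callable; the names `run_with_failover`, `primary_cloud` and `secondary_cloud` are hypothetical and do not correspond to any real provider SDK. A production design would add health checks, retries and data replication, none of which is shown here.

```python
from typing import Callable, List, Tuple


class AllProvidersFailed(Exception):
    """Raised when no provider could serve the request."""


def run_with_failover(providers: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_cloud() -> str:
    # Hypothetical provider call that simulates an outage.
    raise ConnectionError("region unavailable")


def secondary_cloud() -> str:
    # Hypothetical provider call that succeeds.
    return "query result"


served_by, result = run_with_failover([
    ("primary", primary_cloud),
    ("secondary", secondary_cloud),
])
print(served_by, result)
```

In this sketch the ordering of the list is the whole routing policy: the first provider that answers wins, and the caller never needs to know which one it was.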

Ultimately, these more flexible, multi-dimensional designs will allow for the re-emergence of on-premises deployments. For analytics, this will mean that workloads involving sensitive or regulated data, where cloud deployment is challenging, will become more viable. It will also support low-latency or highly volatile use cases, where pushing and pulling data to and from the cloud has traditionally limited timely data access. Retail locations, for instance, might employ on-premises analytics to predict demand by combining local sales and footfall data with global contextual data.

Data lakes drying up

Alongside flexible application architecture, there is a coming shift in data storage principles. The sheer size of the datasets involved in real-time analytics applications has meant that, for some time, using Apache Hadoop to distribute storage across large clusters of commodity hardware has been common practice. Now, as cloud computing increasingly becomes the default option for application deployment, we are starting to see S3 Object Storage come into focus as the new industry standard.

Importantly, S3 Object Storage is not the same thing as cloud data storage – though the two terms are often used synonymously. Offerings from on-premises storage vendors will also become widely adopted, in the interest of ensuring a seamless handover between physical and cloud architecture. This means that we will witness a drying up of those Hadoop data lakes – but also that faster object storage will be at hand for demanding analytical models and machine learning workloads.

Machine learning leaving the laboratory

Most large businesses now have machine learning projects, which apply artificial intelligence practices to deliver analytical insights at high speed with high accuracy. However, the initial wave of machine learning has tended to rely on specialised platforms that cannot draw upon all of the data relevant to business objectives. Using only a subset of data to train and score machine learning models imposes a low ceiling on potential accuracy, and the problem has been compounded by data that is too inaccessible to fuel broader applications. As this issue melts away in favour of more flexible, unified architectures which avoid data lakes, end-to-end machine learning is becoming possible.

Instead of drawing a subset of data into a machine learning system, organisations will bring the machine learning to the data to more accurately predict and influence outcomes. Applications will include forecasting revenue based on personalised customer behaviour analytics, proactive fraud detection and prevention, and predictive maintenance on medical devices.
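One way to picture bringing the model to the data is to register a scoring function inside the database, so that rows are scored where they live rather than exported first. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for an analytics database, and the table, the column names and the linear weights are all hypothetical.

```python
import sqlite3

# Hypothetical linear model; real weights would come from training.
WEIGHTS = {"intercept": 10.0, "visits": 2.0, "basket": 0.5}


def predict_revenue(visits: float, basket: float) -> float:
    """Score one customer with the (assumed) linear model."""
    return WEIGHTS["intercept"] + WEIGHTS["visits"] * visits + WEIGHTS["basket"] * basket


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, visits REAL, basket REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3.0, 40.0), (2, 10.0, 20.0)],
)

# Register the model as a SQL function so scoring runs inside the query,
# instead of pulling a subset of rows out into a separate ML system.
conn.create_function("predict_revenue", 2, predict_revenue)

scores = conn.execute(
    "SELECT id, predict_revenue(visits, basket) FROM customers ORDER BY id"
).fetchall()
print(scores)
```

The point of the design is that the query engine decides which rows to score, so the model sees all of the data the query can reach rather than a pre-extracted sample.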

Sensor data surpassing business data – on every metric

As machine learning-powered analytics expands beyond specialised subsets of the total available data and embraces an end-to-end approach, the models will ingest a broad variety of data types, all of which are continuously growing in size and availability. Traditional business data, such as transactions, is already being joined by unstructured human data, such as virtual assistant interactions and person-to-person communications, as a rich source of potential insight.

Both of these sources, however, will soon be dwarfed by the exabyte scale of sensor data, as everything in the physical environment – including people themselves – is monitored in real time. This tsunami will redefine the very concept of ‘big data’. For any industry which produces or uses machinery, predictive maintenance will become an enormous opportunity, automatically anticipating risk of failure and future demand to minimise downtime and reduce operating expenses for physical infrastructure just as multi-cloud deployments have for digital infrastructure.
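As a rough illustration of how sensor data can feed maintenance decisions, the sketch below flags readings that deviate sharply from the recent past using a rolling z-score. This is a deliberately simple anomaly check, not a full predictive maintenance model, and the function name, window size, threshold and vibration values are all assumptions made for the example.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(readings: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose reading deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up vibration readings with one obvious spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 0.9, 4.0, 1.0, 1.1]
alerts = flag_anomalies(vibration)
print(alerts)
```

A real deployment would more likely learn failure patterns from labelled history, but even a check like this shows why continuous sensor streams matter: the alert depends on comparing each new reading with what came just before it.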

All of this, of course, relies on institutional capacity to collect, store, analyse, and act upon far greater quantities of data than were possible, or considered necessary, just a few years ago. Once, companies found a first-mover advantage by being the first to adopt an enterprise data centre as part of their business processes. This decade, those who can build out seamless, rapid, end-to-end data operations will be the ones who out-compete the market.

Joy King, VP, Vertica Product Management, Product Marketing & Field Engagement, Micro Focus

Joy King is VP of Vertica Product Management and Product Marketing & Field Engagement at global software company Micro Focus.