The big data advantage, part two: ‘This is a very complicated case… you know, a lot of ins, a lot of outs’

Despite the growing popularity of big data, just four per cent of companies extract the full value of their information assets.

In my last article, in which I wrote in depth about the “big data advantage,” I covered the potential impact of big data over the next few years and the kinds of capabilities and benefits organisations hope to find within the volumes of information they have acquired.

Despite the growing popularity of big data, as little as four per cent of companies extract the full value of their information assets, while an additional 43 per cent still obtain “little tangible benefit from their information,” according to a recent report released by PwC and Iron Mountain entitled “Seizing the information advantage.”

In fact, it seems that you’re more likely to fail at your big data project than not:

  • 75 per cent of IoT projects will take up to twice as long as planned through 2018 (Gartner, March 2016).
  • 80 per cent of IoT implementations will squander transformational opportunities by focusing on narrow use cases and analytics (Gartner, March 2016).
  • In a recent survey, only 15 per cent of respondents reported moving their big data projects into production (Gartner, October 2016).

Only fifteen per cent. Ouch! You see, although the Internet of Things (IoT) and big data analytics have opened the door to possibilities that were previously unimaginable, they also seem to have opened a real Pandora’s box of complexity.

This current state of affairs leads me to another famous quote, again from the film The Big Lebowski: “This is a very complicated case… you know, a lot of ins, a lot of outs, a lot of what-have-yous.”

So why exactly are so many enterprises still struggling to find their own big data advantage? Well, just as in The Big Lebowski, the development and execution of big data involve a lot of moving parts and shifting plot lines. Still, a few factors are clearly major contributors to companies being slowed in their pursuit of the big data advantage:

  • Complexity has been growing on all fronts: There are many varied and increasingly demanding analytics needs, from batch and streaming ingest to machine learning and graph analytics workloads. Plus, there is an ever-increasing volume, variety and velocity of data that organisations must adapt to.
  • The analytics tools landscape is overpopulated and changing at a rapid pace: Many of the tools currently in use are open source software. For example, how many of you had heard of Spark or Kafka as recently as two years ago? The fungal sprawl in analytics tools and environments is undoubtedly significant; for a taste of how quickly even a “simple” job pulls in several such tools, see the sketch after this list.
  • It’s extremely difficult for organisations to maintain the skill sets and infrastructure needed to transform these massive and potentially valuable data collections into genuinely beneficial insights.
  • Time to insight and decision still lags actual business needs. For example, workloads might run too slowly, you might have to wait for resources, or excessive data movement might be the hurdle standing between you and a big data advantage. There are many phases of, and causes for, slowdown, and they can crop up practically anywhere in the analytics pipeline. IT teams have made extensive attempts to solve this challenge, typically relying on expensive proprietary tools as the remedy, but have so far had limited success, often adding silos and complexity along the way.
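
To make the tool-sprawl point concrete, here is a minimal sketch, in Python, of the kind of “simple” streaming ingest job referenced above: counting events per minute as they arrive from Kafka, using Spark Structured Streaming. The broker address and the “events” topic are illustrative assumptions, and the sketch presumes a Spark build with the Kafka connector available; it is not a recipe from any particular deployment.

    # Minimal streaming ingest sketch. Even this toy pipeline couples two
    # separate open source systems (Spark and Kafka); the broker address
    # and topic name below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

    # Subscribe to a hypothetical "events" topic on a local Kafka broker.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")
              .load())

    # Count events per one-minute window, keyed on the timestamp the
    # Kafka source attaches to each record.
    counts = events.groupBy(window(col("timestamp"), "1 minute")).count()

    # Print the running counts to the console until interrupted.
    (counts.writeStream
     .outputMode("complete")
     .format("console")
     .start()
     .awaitTermination())

Notice how even this handful of lines hides choices about versions, connectors, cluster configuration and delivery semantics; multiply that across every tool in a modern analytics stack and the sprawl problem becomes obvious.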

These factors combine to make data analysis a very complex process to undertake well. The datasets organisations have acquired are substantial, and the data itself is incredibly diverse, potentially existing in any conceivable format. The size of the records and the scope and complexity of big data analytics have led to explosive development of the technology.

As a result, organisations are already running into hurdles in the form of unbridled cluster propagation, a flood of new applications and the ever-increasing need for more rapid intelligence, each presenting its own set of additional problems. Furthermore, the technologies of the big data world are in a constant state of evolution: Spark, Hadoop and graph databases have become ubiquitous in many industries, while innovative approaches such as deep learning and machine learning are still on the rise.

So why do these struggles to gain a big data advantage seem so intractable? It’s partly due to the inadequacies of the approaches many organisations take when pursuing big data platforms. These approaches often include:

  • Home-grown (or do-it-yourself) solutions that, despite giving organisations complete control and customisation, can often take as long as a year to fully implement and invariably lead to cluster sprawl and management headaches. Furthermore, the myth of the less expensive DIY solution is quickly shattered once total cost of ownership is considered (more on that in a future article).
  • Big data appliances that enable fast time to value but are designed with a single-use focus in mind and offer only minimal flexibility for emerging technology. This can present serious problems in a fast-paced analytics ecosystem and raises questions about long-term investment protection.
  • Cloud solutions that eliminate upfront capital costs but often turn out to be more expensive after only a few years, not to mention the loss of data control that companies using the cloud must accept. This means companies have to consider carefully, and often re-evaluate, which analytics can safely and most efficiently be left in the cloud.

Despite the many challenges that companies seeking their big data advantage must face, the hurdles listed above can be overcome. Against this background, we need solutions that can quickly make mountains of data understandable and that can be applied successfully in a highly scalable environment. In addition, correspondingly large computing power is required, which conventional computing architectures usually cannot deliver.

And because many of these challenges stem from today’s increasingly sophisticated and iterative analytics environment, at Cray we believe an agile analytics environment is the best approach for organisations to take.

I’ll dive into that a little more in the next article and talk about key factors needed for successfully seizing your big data advantage.

Amy Hodler, Senior Analytics Product Marketing Manager, Cray
