It's Hadoop, stupid: How one data platform is conquering the world

ITProPortal is touring California's Silicon Valley, getting the latest from up-and-coming startups.

We spoke to Ghislain Mazars, CEO of Ubeeko, about what's new in the world of Hadoop and enterprise big data analytics, and what you need to know going into the future.

The concept at the heart of Hadoop is the same as Bill Clinton's old campaign slogan: "It's the economy, stupid!"

Data volumes are growing every year, driven by the explosion of connected devices and sensor data from the Internet of Things. Because of this, IT budgets are growing too, as enterprises start to see the benefit of applying real, large-scale data analytics.

When Facebook opened its newest data centre, it revealed that it used no original equipment manufacturers (OEMs): all of its servers were cheap commodity machines sourced directly from Taiwan. The new age is about using very cheap hardware and layering an open source software solution over the top to manage the huge pool of data. The idea is to have a lake of data, and to apply different kinds of processing to that lake.

It's really the end customers driving the growth of Hadoop, not the usual IT vendors. Using the power of Hadoop and MapReduce, we've cut massive batch calculations from hours to minutes. A lot of companies have made huge investments in their legacy data centres and hardware, so today they're using Hadoop to complement their traditional IT for analytic workloads and batch processing.
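To make that batch-processing model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. It is illustrative only: the class names are the usual textbook ones, and the input and output paths come from the command line rather than from anything discussed in the interview.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel across the cluster, one task per input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1) for every token
            }
        }
    }

    // Reduce phase: sums the counts for each word after the shuffle.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map tasks each read a chunk of the input straight from the node it is stored on, and the framework shuffles the intermediate pairs to the reducers. That data-local, massively parallel layout is what turns hours of computation into minutes.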

Hadoop has caused a huge shift in the way we architect our IT projects. If you speak to any IT managers now, they'll tell you they're just coming to the end of the big project they've been working on for five years: VMware virtualisation. But with Hadoop coming into mainstream enterprise customers, we realise that VMware was just one way of doing things. We can pack as many applications as possible onto one server, or we can take a distributed model and spread them out across many servers. So it's been a big challenge for Hadoop to say to IT managers: "forget about VMware – that's not the only way of doing things."

The basics of Hadoop are about economics, and the way it's used today is more to do with data analytics. I'm sure the two will merge eventually.

Now, with YARN and the advent of Hadoop 2.0, the nature of Hadoop has really changed – to the extent that we have to redefine what it is. What you have now is a new platform that is very hard to pin down.

If you're using Hadoop with MapR or SAS, is that still Hadoop? What remains the same across all deployments is HDFS, a robust and scalable data store, plus YARN, which handles cluster resource management and shared services.
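To give a feel for what "a robust and scalable data store" looks like from the application side, here is a short, hypothetical sketch using Hadoop's Java FileSystem API. The NameNode address and file path are invented for illustration; in a real deployment they would come from the cluster's configuration files.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in practice this is set in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        // The same client code works whether the file fits on one disk
        // or is replicated in blocks across hundreds of nodes.
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     fs.open(new Path("/data/events/2014/day01.log"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The point is that HDFS hides the distribution and fault tolerance underneath a plain file-system interface, which is why so many different engines can share the same data lake.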

Now you also have Spark, something of a "new kid on the block", which does in-memory processing and real-time data analysis. Spark is a bit of a Swiss army knife: you can run MapReduce-style jobs on it, but you can also do pretty much everything else too.
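For contrast with the MapReduce sketch earlier, here is roughly the same word count using Spark's Java API; the paths and application name are placeholders. The cache() call is what "in-memory processing" means in practice: the dataset is pinned in cluster memory, so iterative or interactive jobs avoid re-reading it from disk each time.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("spark word count")
                .setMaster("local[*]"); // local run for illustration; a cluster would use YARN
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // cache() keeps the RDD in memory across operations -- the key
            // difference from MapReduce, which writes to disk between stages.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/2014/*.log").cache();

            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.saveAsTextFile("hdfs:///output/wordcounts");
        }
    }
}
```

Note how the whole pipeline is a few chained transformations rather than separate map and reduce classes; that conciseness, plus the in-memory model, is much of Spark's appeal.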

Hadoop, at its heart, is a platform. It's not in every enterprise yet, but it has all the characteristics to make it the number one enterprise data analytics app going into the future.

I think this is just the beginning for Hadoop's use cases. So what's next for Hadoop? Basically, it's about world domination. Hortonworks predicts that 80 per cent of the world's data will be processed through Hadoop by 2015. Hadoop is here, and it's not going away.