Open source is the fastest way to innovate big data

ITProPortal is touring California's Silicon Valley, getting the latest from up-and-coming startups.

We spoke to Herb Cunitz, president of Hortonworks, about how Hadoop came to dominate the enterprise analytics space, and why it's still the leader in bringing the new age of big data into the enterprise.

This market is moving so quickly in terms of what's happening. And if you wind back, Hadoop started as a way of having a scale-out architecture. It allowed you to take your storage out onto the web. And around 2011, the team at Yahoo! decided that Hadoop was a great technology and realised that it had potential far outside the four walls of Yahoo!

So they decided to take it out to be a general data processing platform, and they founded Hortonworks. We take Apache Hadoop, which is an open source data architecture, and turn it into an enterprise-class data platform, completely in the open.

Currently the largest Hadoop clusters out there are Yahoo!, with 32,000 nodes, and then eBay with 8,000. In Europe it's Spotify with 1,000 nodes. These are all Hadoop early adopters, and they all work with Hortonworks.

Most customers will already have their relational databases, their data centres and so on – the traditional transactional systems. But what we've seen since 2012 and 2013 is the rise of this other type of data, which is unstructured data. From cell phone logs to GPS data, 85 per cent of data being created every year is unstructured, not transactional data. Hadoop, in our view, fits well into this environment. It doesn't replace the old architecture, it augments it – and allows the existing tools you have to access that data too.

So if you want to do batch analytics, or interact with the data – that is, make a query in seconds – and actually have real-time data coming in, that's where Hadoop finds its place in the business.

And everything is open source. That means we ship it in open source, we deliver it in open source. What some organisations saw was the opportunity to bolt on some proprietary innovations, like security or the ability to have multiple clusters every time you want a new kind of workload. But we strive to eliminate vendor lock-in, because open source is the single fastest way to innovate.

No decision makers want to be locked in to proprietary software. The way we make money is through support. That's to help customers design environments, get help, get up to speed, and manage the end-to-end lifecycle of products.

Think of us like a software-as-a-service platform. We operate fundamentally like a software company.