Newcomers swell the SQL-on-Hadoop ranks

This article was originally published on Technology.Info.
As part of our continuing strategy for growth, ITProPortal has joined forces with Technology.Info to help us bring you the very best coverage we possibly can.

It’s one of the hottest areas in big data science, and the choice of tools for running SQL queries on Hadoop platforms is getting bigger all the time.

Steve Shine, CEO at business intelligence company Actian, is convinced his company is onto a winner. The company recently announced the Hadoop Edition of its analytic software, combining high-performance SQL with the company’s visual dataflow framework, running entirely natively in Hadoop.

According to Shine, this addresses a huge need among would-be Hadoop users, who have so far held back on investing in the open-source big data framework because of the relatively high cost of hiring skilled MapReduce engineers. (For more on how the Hadoop market is developing, see our interview with Mike Olson of Cloudera, which will be published shortly.)

“SQL skills are more abundant and more affordable. Most companies with IT teams have them. So for them, it’s a chance to get value from big data with the skills they already have, rather than going out to market for a skillset that is short on supply and tends to get quickly snapped up by technology vendors and consultancy firms.”

But Actian is far from alone in spotting the market opportunity here. The company’s launch of its Hadoop Edition came just one week after IT giant HP upgraded the SQL-on-Hadoop functionality it introduced in late 2014 with its Vertica database. VMware/EMC spin-off Pivotal and MPP (massively parallel processing) database company InfiniDB have also introduced SQL-on-Hadoop products. And among the Hadoop distribution companies, Cloudera is pushing its Impala offering, while Hortonworks is arguably the most active contributor to the open-source Hive effort.

In fact, despite the recent proliferation of tools for running SQL queries against big data stores on Hadoop, Hive remains the most widely used query tool with Hadoop - and executives at Hortonworks claim that the company’s recent Stinger project has done much to improve its overall performance.

That said, there is still a glaring gap in the market for tools that can offer the full range of SQL functionality on Hadoop that data scientists can already achieve using traditional relational databases. In other words, there is still much work to do, according to Mike Gualtieri, an analyst with Forrester Research.

But levels of interest in SQL-on-Hadoop are high, he adds. While Hadoop is generally positioned as an environment in which unstructured data can be analysed, many companies have begun their Hadoop experiments with structured data.

“Hadoop can handle both,” he says. “That’s what’s so interesting about the platform. And, over time, most organisations will do both, but for now, I advise firms to start with structured data and then move on to unstructured.”

After all, he adds, plenty of organisations have vast treasure troves of structured data at their disposal, much of which goes unanalysed today. And as we’ll see in the next article, the Internet of Things trend seems set to fill those stores even further. Gualtieri believes that most companies only ever analyse around 12% of the data they hold, leaving the rest, which is potentially valuable, “on the cutting room floor.”

Often, that’s because the vast databases and data warehouses needed to collect - and use - the remaining 88% would be prohibitively expensive to buy and maintain. Hadoop, by contrast, provides a low-cost way to gather vast volumes of data from different data sources on commodity hardware. In other words, Hadoop presents an opportunity to bring it all together in one place - but in order to analyse structured data, most companies are still more comfortable using tools and approaches with which they are already familiar.

This is where Actian, Vertica, Pivotal and others could help - by supporting the queries that companies already run against their structured data, but doing it in a more scalable, less pricey environment. Or, as Shine puts it, “We’re making Hadoop more accessible to a wider range of companies - and, frankly, that’s long overdue. We’re making Hadoop industrial-strength, to tackle more of the analysis needs that customers have today.”
