A recent Gartner report places big data at the "peak of inflated expectations" on its hype cycle of emerging technologies. While big data is enjoying the height of its buzz, cloud computing appears to have slumped into the "trough of disillusionment," according to Gartner.
The graph strikingly shows big data still rising, as the report indicates, while interest in cloud computing has peaked and fallen.
This places decision makers in the difficult position of either investing in big data and risking resources on an emerging trend, or ignoring the trend and potentially being left behind.
The example of cloud computing demonstrates that hype can, and often does, precede a paradigm shift in an industry. Time has shown that while betting on the cloud was risky, it ultimately gave a first-mover advantage to the risk-takers.
Big data and the hype cycle
Gartner places a number of emerging technology buzz phrases on this curve, showing how far along the hype cycle each has come on its way to the plateau of productive implementation.
However, the nature of these hype cycles is that there are no guarantees: an emerging technology inherently lacks the experts, prototypical examples and established patterns needed to guide adoption and investment.
We can observe the outcome of this uncertainty by looking at the early adopters. On the one hand, a NewVantage Partners survey (focused on financial companies) shows that more companies plan to invest in big data – 68 per cent in 2013, growing to 88 per cent by 2016 – and that investments are rising: 19 per cent of major companies spent more than $10 million (£6.2 million) on big data in 2013, a figure set to rise to 50 per cent by 2016.
This is a C-level driven trend with an increasing focus on balancing integration, growth of data and speed of analytics – the famous three Vs of big data: variety, volume and velocity. The hope is to advance fact-driven decision-making, as well as the speed with which decisions are made.
The worrying lack of goals in big data implementation
A quick search on Google Trends comparing the frequency of the search terms 'big data' and 'cloud computing' is telling.
On the other hand, a Gartner report this September, while also confirming increasing investment in big data, reveals an unsettling lack of goals and skills. This is unsurprising given the novelty of the topic. The five most widespread challenges are how to extract value, define a strategy, obtain skills and capabilities, and integrate data sources, as well as questions surrounding infrastructure and architecture.
In brief, many companies invest in big data without fully understanding the desired outcomes or developing the technology and skills needed to achieve positive results. These skills are in high demand, and the likelihood of hiring technology experts with enough experience to fill the gaps is slim.
You simply can't find a Hadoop expert with a decade of experience, since the technology has not been around for that long. This leads to a shortage and inflated prices for staff. For example, Hadoop-related positions command average salaries in the US of up to $156,000 (£96,900).
Additionally, costs for the accompanying infrastructure can be high and usually necessitate expensive Hadoop and NoSQL enterprise support services. These are sunk costs without an immediate or guaranteed long-term return.
A company can easily spend millions of dollars and years to set up a team and architecture to evaluate big data and develop strategies and products, only to find that no discernible value has been added. The trough of disillusionment is appropriately named.
This isn't to say that evaluating big data shouldn't be a priority for any data-driven company, whatever its size. Big data is here to stay. The data deluge from social media, the Internet of things and the generally falling cost of collecting and storing data has to be managed, and it also presents an opportunity.
Companies that can extract value from these vast but information-poor data sources, learn from them and correlate them with their business goals will set themselves apart from the competition. They will have to experiment with and explore their own and third-party data sources to unearth value. The ones doing this in a flexible, inexpensive manner, avoiding the non-value-adding diversion of architecture and technology work, will have an advantage.
Big data as a service: The welcome alternative
The good news is that there is an alternative to exposing yourself to the costly, uncertain strategy of buying into the hype with significant capital investment. In the last year the market has reacted to the shortage of talent and the need to focus on business rather than technology. Big data as a service (BDaaS) offerings around Hadoop, built by the few experienced experts who have worked with Internet-scale data, have emerged to mitigate risk by providing big data infrastructure and technology on a pay-as-you-go basis.
This significant shift outsources three of the five most widespread big data challenges and at the same time removes the capital expenditure exposure. Three core features of big data are suddenly unlocked for budgets of any size.
- Firstly, the service approach can be used to query large and varied data sources for ad hoc or regular analytics and insight with Apache Hive, which provides a SQL-like (structured query language) interface to Hadoop that is transparent to the end user. Business users experienced with SQL, a very common skill, can leverage Hadoop and explore big data immediately.
- Secondly, these queries, combined with connectors to existing database systems and other Hadoop technologies such as Apache Pig, can implement regular ETL (extract, transform, load) data pipelines. This is often essential for further analysis or for secondary systems and products, and it usually integrates with, or eases the load on, expensive data warehouse systems.
- Thirdly, advanced interfaces to the underlying Hadoop system give data scientists and engineers the opportunity to utilise the on-demand cloud computing cluster technology for highly specialised tasks such as machine learning, sentiment analysis, predictive analytics or domain-specific work.
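The first two points can be made concrete with a short HiveQL sketch. The table and column names below are hypothetical illustrations, not drawn from any particular deployment:

```sql
-- Ad hoc analytics in familiar SQL: top countries by page views on a given day.
-- Hive compiles this into Hadoop jobs behind the scenes; the user writes only SQL.
SELECT country, COUNT(*) AS views
FROM raw_page_views
WHERE view_date = '2013-11-01'
GROUP BY country
ORDER BY views DESC
LIMIT 20;

-- A simple ETL step: aggregate the raw logs into a partitioned summary table
-- that downstream tools or a data warehouse can read cheaply.
INSERT OVERWRITE TABLE daily_views PARTITION (view_date = '2013-11-01')
SELECT country, COUNT(*) AS views
FROM raw_page_views
WHERE view_date = '2013-11-01'
GROUP BY country;
```

Because the interface is plain SQL, an analyst can run the first query directly, while the second can be scheduled as a recurring pipeline step without writing any MapReduce code.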
The fundamental value of the big data as a service solution is that its capacity and cost scale with your demand, and that it requires no specialist expertise to get started. Practically, this means that if a business isn't using the service, it isn't incurring costs. When it is using the service, it can employ as many resources as needed, starting with small exploratory clusters and small data sets that cost a few dollars.
However, when required, the underlying cloud-based technology can support Internet-scale data storage and processing with thousands of servers and petabytes of data. This new avenue empowers companies to start out in big data without financial exposure or the need to acquire scarce skills, and with the confidence that they can scale to production once their big data strategy and products have been established.
Ashish Thusoo is the CEO and co-founder of Qubole, a pioneering big data startup. Ashish is also the co-creator of Apache Hive and served as the project's founding vice president at the Apache Software Foundation. Before starting Qubole, Ashish ran the data infrastructure team at Facebook.