Google Compute Engine: The ultimate solution for Hadoop in the cloud?

The cloud and Hadoop are arguably two of the most talked-about IT initiatives of the past couple of years. Both technologies have matured to the point where the conversation is no longer about what they are, but about how best to take advantage of them. In the beginning, Apache Hadoop was seen mainly as a software framework for storing massive volumes of data cost-effectively. Now the focus has shifted to data analytics and processing, and to how new insights and real-time computing can give businesses a significant competitive advantage.

In the world of cloud computing, businesses have come to trust cloud services more and more as security issues have been addressed. With both Hadoop and cloud services maturing, enterprises have started to see value in combining the two, using the cloud's scalability, speed and cost-effectiveness to delve deep into big data analytics.

For those looking for a way to run Hadoop in the cloud, Google Compute Engine (GCE) is a strong answer. Simply put, it is an infrastructure-as-a-service (IaaS) offering that lets businesses run their computing workloads on Linux virtual machines hosted on the Google Cloud Platform. Let's take a look at some of the main advantages of running Hadoop in the cloud on GCE.

Record-setting speed

The MapR distribution of Hadoop on Google Compute Engine set a world record for MinuteSort by sorting 15 billion 100-byte records (1.5 TB of data) in just 60 seconds. For businesses, this means that MapR on Google Compute Engine is not only a fast Hadoop solution but also one where clusters can be launched and adjusted quickly, which helps reduce expenses and improve ROI.
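To put the record in perspective, the figures quoted above can be turned into a data volume and an aggregate throughput. The short Python sketch below is only back-of-the-envelope arithmetic on the numbers in this article; the variable names are illustrative, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
# Figures taken from the article: 15 billion records of 100 bytes each, sorted in 60 seconds.
RECORDS = 15_000_000_000
RECORD_BYTES = 100
SECONDS = 60

# Total data volume, using decimal units (assumption: 1 TB = 10**12 bytes).
total_bytes = RECORDS * RECORD_BYTES
total_tb = total_bytes / 10**12

# Aggregate sort throughput across the whole cluster, in GB per second.
throughput_gb_s = total_bytes / SECONDS / 10**9

print(f"Data sorted: {total_tb} TB")
print(f"Aggregate throughput: {throughput_gb_s} GB/s")
```

This is the throughput of the cluster as a whole, not of any single virtual machine.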

Cost savings

These days, companies need to track enormous amounts of disparate data in order to make better business decisions. Managing all of this information with a relational database management system (RDBMS) can be cost-prohibitive. In contrast, the cost-effectiveness and scalability of Hadoop make it an affordable alternative to these traditional database solutions.

In addition, companies can now choose to run Hadoop in the cloud and take advantage of hardware resources that are already in place. Companies pay only for what they use, in minute-level billing increments. This flexibility can make cloud computing a more cost-effective option than buying and maintaining dedicated hardware.
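The effect of billing granularity can be illustrated with a small calculation. The Python sketch below is hypothetical: the function name `cluster_cost`, the node count, the runtime and the hourly rate are all made-up values for illustration, not actual Google Compute Engine prices, and it ignores any minimum charge a provider may apply.

```python
import math


def cluster_cost(nodes, runtime_minutes, hourly_rate, increment_minutes=1):
    """Cost of a cluster whose runtime is rounded up to whole billing increments."""
    billed_increments = math.ceil(runtime_minutes / increment_minutes)
    billed_minutes = billed_increments * increment_minutes
    return nodes * billed_minutes * hourly_rate / 60


# Hypothetical job: 100 nodes running for 75 minutes at an assumed $0.10 per node-hour.
per_minute = cluster_cost(nodes=100, runtime_minutes=75, hourly_rate=0.10)
per_hour = cluster_cost(nodes=100, runtime_minutes=75, hourly_rate=0.10,
                        increment_minutes=60)

print(f"Billed per minute: ${per_minute:.2f}")
print(f"Billed per hour:   ${per_hour:.2f}")
```

Under these assumed numbers, the 75-minute job is billed for 75 minutes with per-minute increments but for a full 120 minutes with hourly increments, so short or irregular workloads benefit most from fine-grained billing.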

For example, the MinuteSort record noted above, achieved with MapR running on Google Compute Engine, cost $20.33 (£12.36) for the MapR deployment, versus $10,000,000 (£6.1 million) for the previous Hadoop record, which required custom hardware.

Start analysis immediately

Another benefit of Google Compute Engine is that provisioning clusters is fast and easy. Users can get up and running quickly with intuitive tools such as the browser-based Google Cloud Console tool, which lets them manage resources through an easy-to-use graphical user interface.

Flexibility

Additionally, Google Compute Engine gives businesses the flexibility to allocate resources as demand requires. Businesses need to provision only the number of clusters they need, and they can easily scale up or down depending on their workloads.

Businesses that are serious about big data processing and analytics should consider the benefits of using Hadoop in the cloud. The speed, flexibility and cost savings that Google Compute Engine offers with MapR make the technology one of the most exciting and powerful solutions available to businesses to date.

Michele Nemschoff is vice president of corporate marketing at big data platform solutions firm MapR Technologies.