A closer look at startup Map-D: Going real-time with big data using GPUs

We all know that with enough expensive servers, big companies can crunch through massive amounts of data. In some cases, like trending search reports, dedicated computing resources can even make large-scale analysis happen in real time.

Now, startup Map-D has harnessed the power of GPUs to allow the real-time analysis and visualisation of huge datasets with a much smaller hardware investment. Winner of this year's Emerging Companies Summit at Nvidia's GPU Technology Conference, Map-D wowed the judges and attendees (they got my vote) with a compelling demo that allows hundreds of simultaneous users to analyse tweets worldwide. Even as a canned demo it would have been cool, but the good news is that the system is live and public, so you can play with it yourself.

An in-memory database built around the GPU

Described in simplest terms, Map-D starts out as an in-memory, SQL-compatible database. Its genius comes in a radically new architecture that allows it to use both CPUs and GPUs, with high-performance GPU memory serving as a cache for the most frequently used data.

CPU memory is then used as a larger next-level cache. Map-D also organises its data by column rather than by row, so a query only has to touch the columns it actually needs, which makes more effective use of the available memory than a traditional row-based layout.
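To make the idea concrete, here is a minimal sketch of a column store sitting behind a two-level cache. It is not Map-D's code: the `Tier` and `TieredColumnCache` classes, the least-recently-used eviction policy, the tiny capacities and the sample columns are all assumptions made for illustration, and ordinary Python lists stand in for GPU and CPU memory.

```python
from collections import OrderedDict

# Hypothetical toy table: each column is stored as its own list,
# so a query can load one column without touching the others.
COLUMNS = {
    "user_id": [1, 2, 3, 4],
    "retweets": [10, 0, 7, 3],
    "lang": ["en", "fr", "en", "de"],
}


class Tier:
    """A fixed-capacity LRU cache of whole columns (stands in for one memory tier)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.columns = OrderedDict()

    def get(self, key):
        if key in self.columns:
            self.columns.move_to_end(key)  # mark as most recently used
            return self.columns[key]
        return None

    def put(self, key, values):
        """Store a column and return the evicted (key, values) pair, if any."""
        self.columns[key] = values
        self.columns.move_to_end(key)
        if len(self.columns) > self.capacity:
            return self.columns.popitem(last=False)  # evict least recently used
        return None


class TieredColumnCache:
    """Small fast tier ("gpu") backed by a larger slow tier ("cpu") and storage."""

    def __init__(self, storage, fast_capacity=1, slow_capacity=2):
        self.storage = storage
        self.fast = Tier("gpu", fast_capacity)
        self.slow = Tier("cpu", slow_capacity)
        self.log = []  # records (column, where it was found) for each access

    def column(self, name):
        values = self.fast.get(name)
        if values is not None:
            self.log.append((name, "gpu"))
            return values
        values = self.slow.get(name)
        if values is not None:
            self.log.append((name, "cpu"))
            del self.slow.columns[name]  # promote: the column moves up a tier
        else:
            self.log.append((name, "storage"))
            values = self.storage[name]
        evicted = self.fast.put(name, values)
        if evicted is not None:
            self.slow.put(*evicted)  # demote whatever fell out of the fast tier
        return values


cache = TieredColumnCache(COLUMNS)
total_retweets = sum(cache.column("retweets"))  # reads one column only
```

Summing `retweets` never reads `user_id` or `lang`, which is the memory saving a column layout offers. Repeated queries on the same column are served from the fast tier, while a column pushed out of it drops to the slower tier rather than being discarded.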

Map-D's distributed architecture even allows it to scale across multiple nodes for extremely large databases, as well as allowing the real-time insertion of new data. This real-time updating is likely one of the reasons that companies – including Facebook and PayPal – have expressed interest in evaluating Map-D's product for use in creating real-time analytic systems.

The tweet visualisation screenshot below links to the live demo (click on the image to run the actual demo), so you can experience some of the power and flexibility of Map-D for yourself.

Note that the tweets in the demo are from a historical dataset and are not being updated in real time.

High performance through integration

A big part of Map-D's amazing visualisation performance is its integration of database, analytics, and visualisation into a single package. Because all three functions are integrated, the data can stay in memory, even on the GPU, while it is queried, analysed, and visualised. Traditional approaches using separate applications typically require moving the data between applications, and often in and out of memory, which of course slows things down.
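The difference can be illustrated with a toy pipeline. The sketch below is an assumption-laden stand-in, not Map-D's implementation: the sample data, the function names and the use of JSON round trips to mimic handing data from one application to another are all hypothetical.

```python
import json

# Hypothetical in-memory columns for a handful of tweets.
TWEETS = {
    "lat": [42.36, 51.51, 42.37, 35.68],
    "lon": [-71.06, -0.13, -71.11, 139.69],
    "retweets": [12, 3, 8, 20],
}


def run_query(columns, min_retweets):
    """Database stage: indices of the rows that pass the filter."""
    return [i for i, r in enumerate(columns["retweets"]) if r >= min_retweets]


def run_analytics(columns, rows):
    """Analytics stage: count tweets per grid cell (coordinates truncated to whole degrees)."""
    counts = {}
    for i in rows:
        cell = (int(columns["lat"][i]), int(columns["lon"][i]))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


def render(counts):
    """Visualisation stage: a text stand-in for drawing a map."""
    return [f"{cell}: {'#' * n}" for cell, n in sorted(counts.items())]


def integrated(columns, min_retweets):
    """All three stages share the same in-memory columns; nothing is copied out."""
    rows = run_query(columns, min_retweets)
    return render(run_analytics(columns, rows))


def separate_apps(columns, min_retweets):
    """Each stage exports its result and the next stage re-imports it."""
    handoffs = 0
    rows = run_query(columns, min_retweets)
    # Handoff 1: export the matching rows for a separate analytics tool.
    subset = {k: [v[i] for i in rows] for k, v in columns.items()}
    exported = json.loads(json.dumps(subset))
    handoffs += 1
    counts = run_analytics(exported, range(len(rows)))
    # Handoff 2: export the aggregated counts for a separate plotting tool.
    payload = json.loads(json.dumps([[list(c), n] for c, n in counts.items()]))
    handoffs += 1
    return render({tuple(c): n for c, n in payload}), handoffs


fused_output = integrated(TWEETS, 5)
split_output, handoff_count = separate_apps(TWEETS, 5)
```

Both versions produce the same picture, but the separate-application version serialises and re-parses the data twice along the way. On real datasets, those round trips are the overhead an integrated design avoids.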

Next steps: A supercomputer in your pocket

Looking to the future, Map-D also claims that its architecture is well suited to running on the increasingly powerful SoCs found in mobile devices. Right now it may be hard to imagine having enough data on your mobile device to need to run analytics on it. However, as memory continues to become denser and less expensive, it is only a matter of time before our mobile devices have their own big data requirements, especially for processing-heavy mobile applications like medical diagnosis and image recognition. Instead of being tied to the cloud, someday those data-and-compute-intensive applications may truly be able to go mobile.