The Big Data A-Z: Part two

The world of Big Data is full of buzzwords and jargon, but do you know what these terms really mean and how they apply to your business?

In this four-part A-Z series, we are translating the most widely used terms so that you have the knowledge and understanding to implement big data strategies effectively within your business. This guide will help you sort your RAM from your sheep.

In part one, we looked at A-F and revealed the not-so-big mysteries behind terms such as Big Data and Cloud Analytics. In this second installment, we’ll shed light on some of the most used terms from G to L:

G is for: Geospatial Analytics

Geospatial analytics is the analysis of data that has a geographic or spatial aspect. By accounting for both time and space, it allows patterns and trends to be recognised in a geographic context and supports analytical models that react to changes in spatial conditions.

Say what? You can analyse data by location.

Did you know? GPS, the source of most location data, consists of 32 satellites orbiting at a height of 12,540 miles. Most GPS receivers can also use Russia’s GLONASS system, which has an additional 24 satellites orbiting the globe.
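
To make the idea of analysing data by location a little more concrete, here is a minimal Python sketch. It uses the haversine formula to estimate the distance between two latitude/longitude points and then counts orders by their nearest store. The store and order coordinates are made-up illustrative values, and the function names are our own rather than part of any particular product.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical data: store locations and order locations as (lat, lon)
stores = {"London": (51.5074, -0.1278), "Manchester": (53.4808, -2.2426)}
orders = [(51.45, -0.97), (53.41, -2.99), (51.75, -1.26), (53.80, -1.55)]


def orders_per_nearest_store(stores, orders):
    """Count how many orders fall closest to each store."""
    counts = {name: 0 for name in stores}
    for lat, lon in orders:
        nearest = min(stores, key=lambda n: haversine_km(lat, lon, *stores[n]))
        counts[nearest] += 1
    return counts


counts = orders_per_nearest_store(stores, orders)
```

Real geospatial tools add spatial indexes, map projections and time dimensions on top of this, but the core question is the same: what is happening, and where?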

H is for: Hadoop

Hadoop is an open-source framework written in Java for the distributed storage and processing of very large data sets across clusters of computers built from off-the-shelf hardware. The name is often used to describe the wider ecosystem of related modules and has almost become synonymous with the term “Big Data.”

Say what? It is the big data elephant that has become a buzzword over the last few years.

Did you know? Doug Cutting, one of the two creators of Hadoop, named it after his son’s stuffed toy elephant. Hadoop is based on a research paper Google released in late 2004 describing the MapReduce model they were using internally at the time for their search engine.
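
The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is not Hadoop's API, and it runs on a single machine; it only illustrates the three conceptual steps (map, shuffle, reduce) using the classic word-count example. The function names and sample documents are our own.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


documents = ["big data is big", "data is everywhere"]
counts = word_count(documents)
```

In a real cluster, the map and reduce steps run in parallel on many machines, each working on the slice of data stored locally. That is where the scalability comes from.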

I is for: In-Memory

In-memory refers to using a computer’s random access memory (RAM) as opposed to its hard disk drives or flash memory storage. RAM is orders of magnitude faster than a hard disk drive, so software that can run in-memory, without having to wait to load data from disk, can run many times faster than software that relies on disk.

Say what? Running in-memory means it’s incredibly fast, and when you are trying to analyse large data volumes, in-memory is the way to go.

Did you know? At a fundamental level, RAM is made up of bits of storage. Each bit can be stored using two NOR logic gates wired together, a circuit referred to as a “set/reset latch” or “flip-flop”. This type of storage is referred to as volatile because the memory requires power to retain its contents.

Not to be confused with the other kind of ram - a male sheep.
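
For the curious, the set/reset latch described above can be modelled in a few lines of Python. This is a simplified logical simulation rather than a description of real memory hardware: it wires two NOR gates back to back and lets the outputs settle. The class and method names are our own, and the invalid case where set and reset are both 1 is deliberately left out.

```python
def nor(a, b):
    """NOR gate: the output is 1 only when both inputs are 0."""
    return int(not (a or b))


class SRLatch:
    """One bit of storage built from two cross-coupled NOR gates."""

    def __init__(self):
        self.q, self.q_bar = 0, 1  # start out storing a 0

    def step(self, s, r):
        """Apply set (s) and reset (r) inputs and return the stored bit."""
        # Each gate feeds the other, so re-evaluate until the outputs settle.
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = SRLatch()
stored_after_set = latch.step(1, 0)    # set: the latch now stores 1
stored_after_hold = latch.step(0, 0)   # hold: the value is remembered
stored_after_reset = latch.step(0, 1)  # reset: the latch now stores 0
```

The "hold" step is the interesting one: with both inputs at 0, the feedback between the two gates keeps the previous value alive, which is what makes the circuit a memory cell.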

J is for: Java

Java is a general-purpose, high-level programming language developed by Sun Microsystems (now owned by Oracle). It was designed from the ground up to be an object-oriented language (almost everything is an object), with syntax similar to C++ but with some of the difficult aspects of C and C++, such as manual memory management, taken care of. It was also designed to be platform independent: source code compiles to ‘bytecode’ instructions for the Java Virtual Machine (JVM), and any platform with a JVM can run them.

Say what? Java: write once, run anywhere.

Did you know? Java was originally called Oak but was later renamed Java (after Java coffee).

K is for: Key-Value Store

A key-value store is a data storage model designed for storing, retrieving, and managing associative arrays - otherwise known as dictionaries or hash tables. Being far simpler than a relational database (RDBMS), it can be extremely fast and scale well, and it also has the advantage of being schema-less. Compared with an RDBMS, however, key-value stores are lacking in functionality, and queries such as joins are either very slow or not possible. Similar to key-value stores are document stores (where the value is a document).

Say what? As simple as it gets with data storage.

Did you know? Key-value stores have been core to Microsoft Windows since version 3.1: the operating system stores all its configuration values in a large key-value store called the Windows Registry. Unix-based operating systems, by contrast, store system-wide configuration files in the file system under the /etc directory.
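
As a rough illustration of the key-value model, here is a toy in-process store built on a Python dictionary. It is a sketch, not a real database: the class, the key naming scheme ("customer:1", "order:100") and the sample records are all our own inventions. It shows both the appeal (simple, schema-less lookups by key) and the limitation (anything resembling a join has to be done by hand in application code).

```python
class KeyValueStore:
    """A toy key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, prefix):
        """Return every entry whose key starts with the given prefix."""
        return {k: v for k, v in self._data.items() if k.startswith(prefix)}


store = KeyValueStore()
store.put("customer:1", {"name": "Ada", "city": "London"})
store.put("customer:2", {"name": "Grace"})  # schema-less: no city needed
store.put("order:100", {"customer": "customer:1", "total": 42.50})


def order_with_customer(store, order_key):
    """A hand-written 'join': two separate lookups stitched together."""
    order = store.get(order_key)
    customer = store.get(order["customer"])
    return {**order, "customer_name": customer["name"]}


enriched = order_with_customer(store, "order:100")
```

A relational database would express the last function as a single join and optimise it for you; with a key-value store, that work moves into your own code.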

L is for: Logical Data Warehouse

A Logical Data Warehouse (LDW) addresses the problems associated with consolidating critical data scattered across silos. It is an architectural layer that sits on top of the usual data warehouse stores (silos) of persisted data and provides several mechanisms for viewing data without relocating and transforming it ahead of view time. It has the advantage of providing data that is fresher and not limited to the format used in the traditional data warehouse, although it does require the resources of a powerful analytic engine.

Say what? A logical layer that allows you to view data in your data warehouse and elsewhere across your business without moving and transforming data ahead of view time.

Did you know? Gartner invented the term in 2011 when considering technical architectures of the future that were designed to incorporate and organise all the data held within an organisation.
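
The idea of a logical layer can be sketched very loosely in Python. In this simplified, hypothetical example, one data set already lives in the "warehouse" (an in-memory list) while another sits elsewhere in its native format (a CSV export). The view function combines them only at the moment it is queried, without copying or transforming either source in advance. All names and data here are made up for illustration; a real LDW relies on a far more capable analytic engine.

```python
import csv
import io

# Hypothetical source 1: rows already persisted in the data warehouse
WAREHOUSE_SALES = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
]

# Hypothetical source 2: a CSV export that lives outside the warehouse
CRM_CSV = "customer_id,name\n1,Ada\n2,Grace\n"


def read_crm():
    """Read the CRM source in its native format, at query time."""
    for row in csv.DictReader(io.StringIO(CRM_CSV)):
        yield int(row["customer_id"]), row["name"]


def sales_by_customer():
    """Logical view: combine both sources only when the view is queried."""
    names = dict(read_crm())
    totals = {}
    for sale in WAREHOUSE_SALES:
        name = names[sale["customer_id"]]
        totals[name] = totals.get(name, 0.0) + sale["amount"]
    return totals


totals = sales_by_customer()
```

Because nothing is copied ahead of time, the view always reflects the current state of both sources, at the cost of doing the combining work on every query.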

Next week, we’ll cover M-R in part three. Stay tuned.

Sean Jackson, CMO, EXASOL