The Big Data A-Z: Part one

Big Data, data analytics, in-memory, Hadoop, NoSQL. Who’s interested in a game of buzzword bingo?

All these terms relate to the world of big data, and just as data has grown in volume, so too has the number of technical terms, whose meaning can easily be lost in translation. So what do all these terms actually mean? Do you really know the difference between ETL, ELT and BLT? Isn’t one of them lunch?

For any form of data analytics or big data to work effectively it is important that everyone can speak or at least understand the language. After all, you’re unlikely to gain buy-in from senior managers if you scare them with a barrage of technical terms that they don’t understand fully. Speaking at cross-purposes has been a barrier to entry for many, resulting in organisations being unable to choose the right tool for their analytic needs.

So in a four-part series, we’ll be shedding light on some of the most used terms, from analytics to zettabytes. In this first instalment, here’s A-F to help quash those Big Data mysteries and blank expressions:

A is for Analytic Database

A database management system that is optimised for business analytics applications and services. It is specifically designed to support business intelligence (BI) and analytic applications, typically as part of a data warehouse or data mart.

But what does that really mean? Transactional databases are for day-to-day operations, whereas analytic databases support blue-sky, “what if?” thinking and allow you to unearth insights and information from large volumes of data.
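To make the contrast concrete, here is a minimal sketch of the two query styles. It uses SQLite from the Python standard library purely for illustration; SQLite is not an analytic database, and the `sales` table, its columns and its figures are all made up for this example.

```python
import sqlite3

# Hypothetical sales table held in an in-memory SQLite database.
# The table, columns and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 80.0), (3, "North", 200.0), (4, "South", 50.0)],
)


def lookup_order(order_id):
    """Transactional-style query: fetch one known row by its key."""
    return conn.execute(
        "SELECT order_id, region, amount FROM sales WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def revenue_by_region():
    """Analytic-style query: scan every row and aggregate the results."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


print(lookup_order(3))
print(revenue_by_region())
```

The first query touches a single row, which is the day-to-day pattern. The second reads the whole table to answer a question about the business, which is the kind of workload an analytic database is built to run quickly over far larger volumes.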

B is for Big Data

Everyone, and we mean everyone, has their own definition of “Big Data” - although any definition will usually include “the three Vs” of Volume, Variety and Velocity.

Say what? Big Data is data that can’t be easily analysed using traditional techniques because there’s a lot of it, it’s strange-looking and/or it’s coming at you too quickly.

Did you know? Just about every word in the English language beginning with “V” has been used to describe the “bigness” of Big Data (volume, variety, velocity, veracity, validity, value, variability, venue, vocabulary, vagueness…).

Interestingly, however, Gartner recently published its 2015 Hype Cycle for Emerging Technologies report, in which the term Big Data no longer makes an appearance, superseded now by the “Internet of Things.”

C is for Cloud Analytics

With cloud analytics, the data analytics process is provided through a public or private cloud. This is typically done through a Software as a Service (SaaS) model, or alternatively by hosting a data warehouse in the cloud (Platform as a Service, or PaaS) on which you can run your BI, analytic and reporting software.

Say what? Analyse your data on somebody else’s computers using a pay-as-you-go type model. Only pay for the analytics you actually do. Rent computing power and licences as opposed to purchasing them.

Did you know? The first public cloud as we know it now was launched by Amazon in 2006. Its Elastic Compute Cloud (EC2) allowed companies and individuals to rent capacity according to their requirements. Now cloud analytics can be run on other cloud infrastructures too, such as Bigstep and Microsoft Azure.

D is for Data Visualisation

A term describing a way to help people understand data by using a visual context. Patterns, trends and correlations can be seen that might not be obvious when looking at the raw data.
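As a rough illustration of the idea, the sketch below turns a handful of numbers into a text-only bar chart. The monthly figures and the `text_bar_chart` helper are invented for this example; a real visualisation tool would of course draw proper graphics.

```python
# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [("Jan", 12), ("Feb", 15), ("Mar", 19), ("Apr", 24), ("May", 31)]


def text_bar_chart(rows, symbol="#"):
    """Return one line per row: the label, a bar of `value` symbols, then the value."""
    return "\n".join(f"{label} {symbol * value} {value}" for label, value in rows)


print(text_bar_chart(monthly_sales))
```

Even in this crude form, the lengthening bars make the upward trend easier to spot than the list of raw numbers does.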

Say what? A picture is worth a thousand words - it works for data too. Leading solutions that allow you to visualise data include Tableau, Yellowfin, SiSense and Logi Analytics.

Did you know? It is often claimed that the brain processes visual information 60,000 times faster than text.

E is for ETL vs ELT

Extract, Transform and Load, or ETL, is the traditional way of loading data into a data warehouse, where the data is copied to a staging area, transformed into the correct format and loaded into the warehouse.

Extract, Load and Transform, or ELT, is a different methodology where instead of transforming the data before it’s written, it is transformed in place in the target system. This leverages the power of the target data engine or appliance and reduces load times.
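The difference in ordering can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the raw rows, table names and clean-up rules are all hypothetical.

```python
import sqlite3

# Hypothetical raw source records: inconsistent casing, stray whitespace,
# and amounts stored as text. Invented for illustration only.
RAW_ROWS = [("north ", " 120.50"), ("SOUTH", "80"), (" North", "200.25")]


def run_etl(rows):
    """ETL: transform in application code first, then load only the clean result."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    cleaned = [(region.strip().lower(), float(amount)) for region, amount in rows]
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return warehouse


def run_elt(rows):
    """ELT: load the raw data untouched, then transform it with SQL in the target."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
    warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    warehouse.execute(
        "CREATE TABLE sales AS "
        "SELECT LOWER(TRIM(region)) AS region, "
        "CAST(TRIM(amount) AS REAL) AS amount "
        "FROM raw_sales"
    )
    return warehouse


def totals(warehouse):
    """Summarise the cleaned sales table by region."""
    return warehouse.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


print(totals(run_etl(RAW_ROWS)))
print(totals(run_elt(RAW_ROWS)))
```

Both routes end with the same cleaned `sales` table. What differs is where the transformation work happens: in the loading program for ETL, or inside the target engine for ELT.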

Say what? It’s all about loading your data - do you do it old school and do all the work first, or do you load it quick and dirty and let a powerful data analytics engine do the grunt work when you need the data?

Did you know? ELT is not to be confused with BLT, which is a bacon, lettuce and tomato sandwich.

F is for Flexible Deployment

Flexible deployment refers to a new generation of software that gives you complete flexibility in how and where it runs. It can run on your own servers, as an appliance (a combination of hardware and software delivered preconfigured to work straight away) or in the cloud.

Say what? Use software in the way that fits you and your IT infrastructure best.

Did you know? Many companies have a hybrid model where some systems are deployed on-premises and others in the cloud.

Next week, we’ll cover G-L. Watch this space.

Sean Jackson, CMO, EXASOL