The Big Data A-Z: Part four

Big Data is the ultimate buzzword, and it comes with a whole host of terms and lingo of its own. Just like the data itself, that lingo keeps growing.

Definitions of these terms can be murky and confusing. But if you ever hope to reap the full rewards of what Big Data has to offer, it is crucial to understand what someone actually means when they talk about implementing web analytics or flexible deployment.

For any form of data analytics or big data project to work effectively, everyone needs to speak, or at least understand, the language. That shared understanding is what wins support from management and across the business. Most importantly, it gives you the knowledge to create a clear data strategy and to implement the right analytical tool for your business.

In this fourth and final installment of our A-Z series, we’re looking at R-Z and will be revealing why X marks the spot and why YARN isn’t just a ball of string that cats like to play with:

S is for: Startup Program


Programs designed to help young startup companies. For example, EXASOL’s startup program is aimed at young, innovative companies in the area of big data and analytics.

Say what? Startups can now use the fastest in-memory analytic database in the cloud for an attractive price.

Did you know? The use of the term start-up became popular during the dot-com bubble in 1997-2000.

T is for: TPC-H Benchmark

A transaction processing and database benchmark specific to decision support, i.e. analytics, run and managed by the Transaction Processing Performance Council.

Say what? A benchmark. For analytics.

Did you know? EXASOL holds the number one position in the TPC-H benchmark for both raw performance and price-performance on data volumes ranging from 300GB through to 100TB.

U is for: UDFs

User Defined Functions, or UDFs, are functions that perform specific tasks within a larger system. Often used in SQL databases, UDFs provide a mechanism for extending the functionality of the database server by adding a function that can be evaluated in SQL statements.

Say what? UDFs allow you to do what you can’t do with SQL alone.

Did you know? EXASOL supports UDFs written in four languages: Java, Lua, Python and R.
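
To make the idea concrete, here is a minimal sketch of a UDF written in Python. It uses the SQLite database that ships with Python's standard library, purely because it runs anywhere; it is not EXASOL's UDF syntax. The `domain_of` function, the `customers` table and the sample e-mail addresses are all made up for illustration.

```python
import sqlite3


def domain_of(email):
    """Return the lower-cased domain part of an e-mail address, or None."""
    if email is None or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it.
# Arguments: SQL function name, number of parameters, Python callable.
conn.create_function("domain_of", 1, domain_of)

# Hypothetical sample data.
conn.execute("CREATE TABLE customers (email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("ann@example.com",), ("bob@Example.com",), ("cy@example.org",)],
)

# The UDF is now evaluated inside an ordinary SQL statement.
rows = conn.execute(
    "SELECT domain_of(email) AS domain, COUNT(*) "
    "FROM customers GROUP BY domain ORDER BY domain"
).fetchall()
print(rows)
```

The database does the grouping and counting, while the Python function supplies the one piece of logic that plain SQL did not have to hand.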

V is for: Virtual Machine


A virtual machine (VM) is an operating system or application environment that is installed on software which imitates dedicated hardware. The end user has the same experience on a virtual machine as they would have on dedicated hardware.

Say what? It looks like a computer, it acts like a computer, it has all the functionality of a computer, but it’s not a computer. It’s a virtual computer.

Did you know? A virtual machine is a fundamental part of many modern cloud infrastructures. The other key component is the hypervisor, a piece of software that creates and runs the virtual machines.

W is for: Web Analytics

Web Analytics is the measurement, compilation, analysis and reporting of web data, such as analysing web server logs and monitoring website visitor behaviour (click analytics). It is used as a tool for business and market research as well as assessing and improving the effectiveness of a website.

Say what? Analyse how your websites are performing and how customers and visitors interact with them.

Did you know? A lot of web tracking uses cookies: small pieces of data sent from a website and stored in a user’s web browser to remember state and session information for the website. They were first implemented in Mosaic Netscape in October 1994. Since May 2011, an EU Directive has required websites to give users the right to refuse the use of cookies.
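
As a rough illustration of the web server log analysis mentioned above, the sketch below counts successful page requests and distinct client addresses. It assumes the log is in the widely used Common Log Format; the sample lines, the `summarise` function and the regular expression are invented for this example. Counting distinct IP addresses is only a crude stand-in for counting visitors.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format.
SAMPLE_LOG = """\
203.0.113.5 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
203.0.113.5 - - [10/Oct/2015:13:56:01 +0000] "GET /pricing.html HTTP/1.1" 200 1804
198.51.100.7 - - [10/Oct/2015:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2015:13:57:40 +0000] "GET /missing HTTP/1.1" 404 512
"""

# host ident user [time] "method path protocol" status size
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarise(log_text):
    """Return (hits per successfully served page, set of client addresses)."""
    page_hits = Counter()
    visitors = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        visitors.add(match["host"])
        if match["status"] == "200":
            page_hits[match["path"]] += 1
    return page_hits, visitors


page_hits, visitors = summarise(SAMPLE_LOG)
print(page_hits.most_common())  # most requested pages first
print(len(visitors))            # number of distinct client addresses
```

Real web analytics tools go much further (sessions, referrers, click paths), but the starting point is the same: turn raw log lines into counts you can reason about.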

X is for: X Marks The Spot

For people looking to turn data into immense value, the X in EXASOL marks the spot. Typically seen on a treasure map, X marks the spot where the treasure is found. In EXASOL’s case, X represents where users can find a fast in-memory, column-oriented, relational database management system.

Say what? EXASOL - the fastest in-memory analytic database.

Did you know? Since 2008, EXASOL has led the well-respected international TPC-H benchmark for analytical scenarios across the categories 100 GB to 100 TB. The high-speed database company is acknowledged by Gartner in its “Magic Quadrant for Data Warehouse Database Management Systems” reports.

Y is for: YARN

Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. It is a resource-management platform responsible for managing the computing resources in a cluster and scheduling users’ applications onto them. It allows Hadoop to do more than just MapReduce data processing jobs.

Say what? Manage your Hadoop cluster better. With YARN, Hadoop becomes more than just a one-trick elephant.

Did you know? YARN wasn’t part of the first versions of Hadoop; it was developed separately as a way of extending the functionality of Hadoop. It was announced in 2011 but only became production-ready with the official Hadoop v2.x release in October 2013.

Z is for: Zettabyte

A zettabyte is a unit of digital information that is 1,000,000,000,000,000,000,000 bytes or a trillion gigabytes. Each byte is made up of eight bits, each bit being a 1 or a 0.

Say what? A zettabyte is a huge amount of data, equivalent to roughly 36,000 years’ worth of HD TV content.

Did you know? It is estimated that annual global internet traffic will reach the one zettabyte mark by the end of 2015.
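
The unit arithmetic in the definition above is easy to check for yourself. The short sketch below assumes the decimal (SI) meaning of the prefixes, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant names are just labels chosen for this example.

```python
# Decimal (SI) units, expressed in bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21
BITS_PER_BYTE = 8

# How many gigabytes fit into one zettabyte?
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# How many individual 1s and 0s is that?
bits_per_zettabyte = ZETTABYTE * BITS_PER_BYTE

print(f"{ZETTABYTE:,} bytes")
print(f"{gigabytes_per_zettabyte:,} gigabytes")
print(f"{bits_per_zettabyte:,} bits")
```

The middle line comes out as a 1 followed by twelve zeros, which is the "trillion gigabytes" from the definition.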

That concludes our A-Z guide. Hopefully all that big data jargon is now easier to navigate. If you missed any of the previous installments, check them out at the following links:

Sean Jackson, CMO, EXASOL