What is big data?

Data is constantly referred to as "the new oil", while politicians compare tech giants to the US oil companies that rose to power over a century ago.

This "new oil" isn't being sucked up from the ground. Instead, it's being harvested in large volumes from people using online services, tools and applications.

There's so much data, in fact, that without the right tools to store and process it, organisations can struggle to make sense of it. This huge array of information is collectively termed 'big data'.

You only have to think of all the times you fill out an online form, sign up for a digital service, or complete a questionnaire to have an idea of the volumes being generated every day. Add to this the vast quantities of data generated by web-connected devices, social media and sensors all over the world, and you have an unimaginably large amount of information to contend with.

Big data is incredibly valuable to businesses: if they can collect and store it properly, and analyse it effectively, they can extract information and insights that help them make important decisions.

Elements of big data

Before taking any steps towards implementing a big data analytics programme, it's important to understand the fundamental characteristics that set big data apart from the data a company would traditionally find in its data stores.

Although there's some disagreement over what exactly constitutes big data, most experts agree on five core elements: volume, velocity, variety, value and veracity.

Volume: This is the defining component of big data. In the past, employees generated the majority of an organisation's data; today most of it comes from systems, networks, social media and IoT devices, producing a massive amount of data that needs analysing.
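
To make that concrete, here's a minimal Python sketch of one common tactic for coping with volume: reading a large file in fixed-size batches so it never has to fit in memory all at once. The file name and batch size are illustrative assumptions, not a real pipeline.

```python
# Minimal sketch: analyse a large log file in batches rather than loading
# it all into memory. "events.log" and the batch size are hypothetical.
from itertools import islice

def process_batch(lines):
    # Placeholder analysis step: here we simply count events per batch.
    return len(lines)

def analyse_large_file(path, batch_size=10_000):
    total = 0
    with open(path) as f:
        while True:
            batch = list(islice(f, batch_size))  # next chunk of lines
            if not batch:
                break
            total += process_batch(batch)
    return total

if __name__ == "__main__":
    print(analyse_large_file("events.log"))
```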

Velocity: With information pouring in from so many different sources, the pace at which data flows into an organisation matters enormously. The flow is huge and continuous, taking in emails, text messages and social media posts arriving every minute of every day. Valuable business decisions increasingly depend on this real-time data, which needs to be processed and analysed as it arrives. Doing so requires highly available systems with failover capabilities that can cope with the data pipeline.
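
The snippet below sketches what that looks like in practice: events are analysed as they arrive, keeping a rolling view of the most recent data rather than waiting for a complete batch. The stream here is simulated and stands in for whatever live feed, such as a message queue, a real system would consume.

```python
# Minimal sketch of velocity: analyse events as they flow in.
# The stream is simulated; a real system would read from a live feed.
import random
import time
from collections import deque

def event_stream(n=20):
    # Hypothetical feed of messages arriving over time.
    for i in range(n):
        time.sleep(0.05)  # simulate gaps between arrivals
        yield {"id": i, "value": random.randint(1, 100)}

window = deque(maxlen=5)  # keep only the most recent events

for event in event_stream():
    window.append(event["value"])
    avg = sum(window) / len(window)
    print(f"event {event['id']}: rolling average = {avg:.1f}")
```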


Variety: Data varies widely in type and source, and comes in two broad forms: structured and unstructured. Structured data normally comes from a database, so it's well organised and clearly defined. Unstructured data comes from elsewhere, including social media sites such as Facebook or Twitter, and is generally more chaotic, taking in formats such as photos, videos and audio files. Because unstructured data is so varied, it can be difficult to process, analyse and store, and making sense of this chaotic portion is a core part of what big data tools are built to do.
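
The contrast is easy to see in code. In this minimal sketch (the sample records are invented), the structured rows can be queried by field name straight away, while useful details have to be extracted from the unstructured post first.

```python
# Minimal sketch contrasting structured and unstructured data.
# All sample records are invented for illustration.
import csv
import io
import re

# Structured: fixed schema, e.g. a database export; fields are known upfront.
structured = io.StringIO("user_id,country,signup_date\n42,UK,2021-06-01\n")
for row in csv.DictReader(structured):
    print(row["user_id"], row["country"])

# Unstructured: free text with no schema; useful details must be extracted.
post = "Loving the new phone! Ordered on 2021-06-03, arrived in 2 days #happy"
print(re.findall(r"\d{4}-\d{2}-\d{2}", post))  # dates mentioned in the text
print(re.findall(r"#\w+", post))               # hashtags
```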

Value: You might have a huge quantity of data to work with, but it won't matter unless you're smart enough with it to understand how much value it can add. Once the other Vs are taken care of, the question to ask is whether the insights you draw from analysis will actually be worthwhile for your business or organisation. Data that isn't used intelligently ends up providing little value at all.

Veracity: With so much data flowing in at such volume, variety and velocity, it can be challenging to assess its quality, and the quality of any analysis depends heavily on the quality of the data behind it. When launching a big data project, it's wise to make sure the data is clean and that processes are in place to stop unwanted information building up and undermining your analysis, and with it your results.
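
One of the simplest such processes is a validation step at the point of ingestion, as in this minimal sketch; the field names and rules are illustrative assumptions rather than a recommended schema.

```python
# Minimal sketch of a veracity check: reject malformed records before they
# reach the analysis. Field names and validation rules are hypothetical.
records = [
    {"user_id": "42", "age": "34", "country": "UK"},
    {"user_id": "", "age": "unknown", "country": "UK"},  # missing id, bad age
    {"user_id": "43", "age": "29", "country": ""},       # missing country
]

def is_valid(rec):
    return bool(rec["user_id"]) and rec["age"].isdigit() and bool(rec["country"])

clean = [r for r in records if is_valid(r)]
rejected = [r for r in records if not is_valid(r)]
print(f"{len(clean)} clean record(s), {len(rejected)} rejected")
```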
