Data warehouses are at the core of many information driven business initiatives. They’re used to store large volumes of historical data which can then be used in business intelligence projects to create knowledge-based reporting.
But a data warehouse isn’t just a single, huge repository of data, they come in a number of different types. What they all have in common is that they’re separate from day-to-day operational systems. Data is uploaded to a warehouse after it has served its primary purpose, so that it can be used for statistical analysis and decision support, once out of date for this purpose it can then move on to be archived.
The history of data warehousing goes back to the 1980s when IBM researchers first came up with the concept of a ‘business data warehouse’ in an article on the architecture required for business information systems. Commercial software aimed at building data warehouses began to appear in the early 1990s and The Data Warehousing Institute (TDWI) to promote the use of the technology was founded in 1995.
Prior to the introduction of data warehousing, decision support systems often involved a large degree of duplication. Different departments of a business would have their own analytics systems using the same data but not have access to a central pool of information from which they could all draw.
How it works
A typical data warehousing setup will be made up of a number of elements or layers. It will be supplied with data by source systems throughout the organisation. The raw information will undergo some sort of preparation process to ensure that it’s properly formatted, accurate and ready for use.
To be effective the warehouse needs to have metadata, data quality, and governance processes in place. This allows IT staff to manage the information, the metadata, sometimes referred to as a data dictionary, also stores information about the data’s source, its format and how it’s used.
The data warehouse itself can use different architectures for storing information depending on the type of data and how it will be used. The most common are online analytical processing (OLAP), and online transaction processing (OLTP).
OLAP is used where there’s a low number of transactions, but the queries carried out are complex and may involve aggregating data. Typical uses for this are in data mining. OLTP on the other hand is used where there are a large number of short query transactions.
As big data becomes more common, predictive analysis is being used in many data warehousing environments. This uses complex mathematical models to find patterns in the data, these can then be used in the prediction of future outcomes.
A number of different models can be employed to structure within the warehouse. The logical model for example uses attributes such as dates, time, product IDs, etc as database keys. A physical model will combine this information with a traditional tabular database format. A conceptual model, by contrast, looks at the data’s relationships. In practice most warehouses will use a combination of these models o define and access their contents.
At the user end of the process tools and applications allow business units to access data from the warehouse for a variety of user tasks. Typically these will be things like market research, business intelligence and decision support. Though these tasks are not part of the core warehousing system, they can’t work without the warehouse’s ability to deliver the information via an up to date data dictionary.
Because the information held in a data warehouse is a copy of that used in live production systems it can be used and manipulated in ways that simply wouldn’t be possible with the original source data. This provides businesses with a number of key benefits.
Data warehousing makes it possible to integrate data from a number of different sources into one central database. This means it can be queried as a single source, so data from different parts of the enterprise can be brought together and eliminating the silos that can mean information which could be useful to one department remaining isolated in another. The net result is a centralised view of the entire business.
It also delivers the ability to maintain historical data after it’s expired or been archived from source transaction systems. This can be restructured so it makes more sense to business users without the risk of compromising the source. Restructuring can be used to improve query performance too, again without having an effect on production systems. A further advantage here is that because query data is in one place the job of data scientists in organising and ensuring the quality of the data is made easier.
Since a data warehouse is able to integrate information from many different systems it can prove useful for merged businesses where divisions may be running legacy IT. With a data warehouse this allows an overview of the organisation that might not otherwise be possible, which in turn leads to improved decision making.
Business systems like customer relationship management can benefit from data warehousing as the information can add value to the application without any need for it to be re-entered.
Data warehousing is now a well established technology and many businesses are reaping its benefits. But as ever in the world of IT nothing stands still. Increasing trends towards the cloud, evolving storage technologies and the use of big data are going to see changes to the warehouse model and may even see it superseded in some places.
The biggest challenge comes from big data technologies, in particular Hadoop. This uses a ‘data lake’ architecture which can offer savings in terms of hardware and software when compared to traditional data warehousing models.
We’re also likely to see increasing moves towards automation with less need for human intervention in managing and verifying the integrity of the information. This should make it possible to build data warehouses in real time, meaning that the data available for analysis is more up to date and will lead to more accurate decision making.
Given the existing investment many organisations have in data warehousing it’s unlikely to go away. There’s no doubt that it will change, processing in the cloud will unlock the potential to analyse large amounts of data faster and this will bring the benefits of data warehousing to smaller organisations.