Big data is big business. According to Statista, the big data market is expected to grow to an eye-watering $42bn in 2019. Mining data can yield a number of interesting insights, but the proliferation of increasingly large and complex datasets has made it difficult for companies to harness data effectively for their own benefit.
Technological innovations in cloud-based data warehousing have addressed these challenges and transformed data storage into something that is both efficient and cost-effective. Forrester estimates that 50 per cent of enterprises will adopt a cloud-first strategy to harness the power of big data, so it is unsurprising that a growing number of companies are moving away from traditional forms of data storage in favour of a cloud-based setup. With all this change, organisations still have a great many unanswered questions about how to see the full benefits of migrating to a cloud data warehouse. Here are the top five questions, and the answers you need to know.
1. How does a cloud-based data warehouse help organisations cope with big data?
Big data comes with some natural complications. There is a lot of it, and the volume keeps growing. It’s often not in a neat relational format, instead coming in semi-structured JSON, Avro, or Parquet files that are generated from IoT or web-based technologies. The last problem is more nuanced, but it’s very important: the more data you have, the more people will want to use it.
Traditional on-premises databases, along with many cloud-washed databases, are fundamentally unable to deal with these problems. When data grows, as it always does, their architecture forces the user to add nodes to the cluster to increase storage. This is both manual and costly. Many databases don’t have affordances for semi-structured data, either. Or if they do support semi-structured data, it’s on a limited basis and the data needs to be transformed into a relational format before it’s loaded. Last, and certainly not least, single-cluster databases (which include nearly every database available today) are hampered in their ability to serve multiple users at once, particularly when queries involve enormous amounts of data. That’s hard to avoid when you’re working with big data, as you might imagine.
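To make that transformation step concrete, here is a minimal Python sketch of what flattening a semi-structured record into relational-style columns can look like. The sample record, the `flatten` helper and the dotted column names are all hypothetical, chosen purely for illustration; real loading pipelines are considerably more involved.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dictionaries into one flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent name as a prefix.
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


# Hypothetical IoT reading, invented for this example.
raw = '{"device": {"id": "sensor-7", "location": {"city": "Leeds"}}, "temperature_c": 21.5}'

row = flatten(json.loads(raw))
print(row)
```

Every new nested field in the source data means another column to plan for, which is why this kind of pre-load work becomes a burden as data volumes and formats multiply.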
A data warehouse that is built for the cloud has several advantages. It can use cloud storage like Amazon S3 to expand flexibly to any amount of data, both structured and semi-structured. It also enables unlimited numbers of cloud compute clusters to serve every use case and every concurrent user as the need arises. The cloud can also afford an enormous amount of pricing flexibility, allowing people to pay as they go. With staggering data volumes the new norm, that is not a trivial consideration.
2. How does a cloud-based data warehouse help end-users?
End-users want something very simple: the ability to answer questions. To do that, end-users need access to the data warehouse on their schedule, as well as compatibility with the data tools and query languages they know and love, such as SQL. A built-for-the-cloud data warehouse can help end-users by enabling the flexible compute capacity needed to answer their questions when they think of them. But it’s only truly useful when the end-users are able to combine that with the SQL and BI tools that they are already familiar with.
3. What are the financial benefits of using a cloud-based data warehouse?
While “being in the cloud” doesn’t instantaneously result in financial benefits, a data warehouse engineered for the cloud can yield some significant advantages. It can adapt to usage on a per-second basis, flexibly matching the exact usage pattern of the organisation with the “right” amount of capacity. This approach avoids the massive “overbuy” that many organisations struggle with when they resize their data warehouse but must license an oversized instance to prepare for data growth in the future.
Additionally, all of the compute consumption and storage costs can be billed on a pay-as-you-go basis. Companies can therefore dynamically grow and contract their system throughout the year in accordance with changes to business requirements. This also removes the frustrating maintenance fees and capacity planning exercises that traditional databases mandate.
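As a rough illustration of the arithmetic, the Python sketch below compares an always-on, fixed-size system with one that is billed per second of actual use. Every figure in it (the hourly rate, the length of the month, the busy hours) is a hypothetical assumption made up for the example, as is the simplification that both options share the same hourly rate; it is not vendor pricing.

```python
def fixed_capacity_cost(hourly_rate, hours_in_period):
    """Cost of capacity that is paid for whether or not it is used."""
    return hourly_rate * hours_in_period


def pay_as_you_go_cost(hourly_rate, busy_seconds):
    """Cost when compute is metered per second of actual use."""
    return hourly_rate * busy_seconds / 3600


# Hypothetical figures, for illustration only.
HOURLY_RATE = 4.00             # currency units per hour of compute
HOURS_IN_MONTH = 30 * 24       # a 30-day month, always on
BUSY_SECONDS = 22 * 6 * 3600   # six busy hours on each of 22 working days

fixed = fixed_capacity_cost(HOURLY_RATE, HOURS_IN_MONTH)
metered = pay_as_you_go_cost(HOURLY_RATE, BUSY_SECONDS)

print(f"Always-on capacity: {fixed:.2f}")
print(f"Per-second billing: {metered:.2f}")
```

The gap between the two figures is the idle capacity that a fixed-size system pays for, and it widens further when that system has also been oversized in anticipation of future growth.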
4. How does a cloud-based data warehouse help agility?
Increasingly, the types of questions that need to be answered with data are coming at unplanned times, from a much broader set of people within the organisation. Everyone has a stake in the data that’s stored within the corporate database, and they all want access to it. The most powerful component of any built-for-the-cloud data warehouse is the ability to marshal unlimited amounts of resources at any time. By enabling unlimited compute clusters for different use cases, along with flexible on-demand scalability, anyone with a question can get an answer at any time. Traditional databases, whether in the cloud or not, cannot be expanded or contracted easily, and are fundamentally incapable of adding clusters to serve broader organisational demands for data.
Another important component of a built-for-the-cloud data warehouse is the lack of maintenance. Indexing, tuning, tweaking, management and complexity all drain an enormous amount of time from the database team. Any technology that truly leverages the cloud will vastly reduce these agility-sapping problems, and enable the data team to focus on delivering data and analytics as quickly as possible.
5. How does a cloud-based data warehouse help disaster recovery and business continuity?
The cloud can make a significant impact for organisations that need true business continuity and disaster recovery (BCDR). But it’s important to note that not all data warehouses in the cloud can leverage the benefits that are built into cloud IaaS vendor products.
The first cloud innovation that helps with BCDR is the Availability Zone (AZ). A built-for-the-cloud data warehouse would be automatically configured within a specific region across multiple availability zones. If a single AZ failed, the service and operations of the cloud data warehouse would switch immediately and transparently to the other AZs.
The second innovation is the region itself. For example, AWS has dozens of available regions worldwide. With replication across those regions, a cloud data warehouse would be able to fail over between regions, enabling an even more hardened solution for BCDR. Again, many traditional single-cluster data warehouses cannot take advantage of these innovations because by definition they have to reside on a single cluster in a single VPC (Virtual Private Cloud).
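The two layers of protection described above can be sketched as a simple selection rule: prefer a healthy AZ in the home region, and only move to another region when the whole home region is unavailable. The Python below is a toy model under that assumption, not how any particular vendor implements failover; the `pick_location` function and the region and zone labels in the example are purely illustrative.

```python
def pick_location(deployment, healthy, home_region):
    """Return a (region, zone) pair to serve from, or None if nothing is healthy.

    deployment: dict mapping region name -> list of zone names
    healthy: set of (region, zone) pairs currently passing health checks
    """
    # First layer: any healthy zone inside the home region.
    for zone in deployment.get(home_region, []):
        if (home_region, zone) in healthy:
            return home_region, zone
    # Second layer: fail over to a healthy zone in another region.
    for region, zones in deployment.items():
        if region == home_region:
            continue
        for zone in zones:
            if (region, zone) in healthy:
                return region, zone
    return None


# Illustrative deployment across two regions.
deployment = {
    "eu-west-1": ["eu-west-1a", "eu-west-1b"],
    "eu-central-1": ["eu-central-1a"],
}

# One zone in the home region has failed; the other takes over.
healthy = {("eu-west-1", "eu-west-1b"), ("eu-central-1", "eu-central-1a")}
print(pick_location(deployment, healthy, "eu-west-1"))
```

A data warehouse tied to one cluster has only a single entry in this picture, which is why it cannot benefit from either layer.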
The benefits of a cloud data warehouse are immense and far outweigh those of the traditional big data platforms that some organisations are still using. With growing technology innovations including 5G, smart cities, and edge computing, data volumes are only set to explode. Deriving value and insights from this data will require organisations to make the shift to a cloud data warehouse sooner rather than later.
Thibaut Ceyrolle, VP of EMEA, Snowflake Computing