The fundamental issues with Amazon Redshift

(Image credit: Everything Possible / Shutterstock)

Back in 2013, when Amazon Redshift first launched, there were not many other options for cloud data warehouses on the market. By combining parallel processing and columnar storage with cloud infrastructure and self-service provisioning, Amazon Redshift made it easy for anyone to perform large-scale analytics in the cloud. However, Redshift has since become outdated and can no longer meet the needs of enterprises embracing modern analytics.

Redshift was initially built on ParAccel Analytic Database (PADB), itself built on Postgres 8.0.2, which was released in 2005. Today, Redshift continues to run on what is now a 15-year-old database. As a result, it is unable to provide the same benefits as newer, more advanced cloud data warehouses. It’s no surprise Amazon customers are beginning to take note and question the lack of upgrades to the system.

The first problem is the expanding role of analytics. Back in 2013, transactional databases were separate from analytical data warehouses: the former were used for customer-facing applications while the latter were used for internal reporting and data analysis. A transactional database logs and stores individual transactions so that they can be retrieved or reproduced later. In contrast, data warehouses are ideal for analysing large quantities of data, allowing users to manage supply chains effectively and make predictions based on empirical evidence. The two stored data in different formats (row vs. columnar), had different architectures (replicated vs. distributed) and were managed by different database administrators.
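The row vs. columnar distinction can be illustrated with a minimal Python sketch. The order records and the function names here are hypothetical, invented purely for illustration; real databases implement these layouts on disk in far more sophisticated ways.

```python
# Hypothetical order records, used only for illustration.
orders = [
    {"order_id": 1, "customer": "ana", "amount": 20.0},
    {"order_id": 2, "customer": "ben", "amount": 35.5},
    {"order_id": 3, "customer": "ana", "amount": 12.5},
]

# Row-oriented layout: each record is kept together, so reading or
# updating one whole order touches a single entry.
row_store = {order["order_id"]: order for order in orders}

# Column-oriented layout: each field is kept as its own array, so an
# aggregate over one field scans only that array.
column_store = {
    "order_id": [o["order_id"] for o in orders],
    "customer": [o["customer"] for o in orders],
    "amount": [o["amount"] for o in orders],
}


def lookup_order(order_id):
    """Transactional-style access: read one complete record."""
    return row_store[order_id]


def total_amount():
    """Analytical-style access: aggregate a single column."""
    return sum(column_store["amount"])
```

Looking up one order favours the row layout, while summing every amount favours the column layout, which is roughly why the two kinds of system were traditionally built and run separately.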

Both structures have their advantages, and when Amazon brought data warehouses into the cloud with Redshift (as well as moving databases into the cloud with Aurora and RDS), it was really onto something. Suddenly, huge volumes of data no longer needed to sit on premises on dedicated hardware, saving significant costs and processing power.

Building a cloud data warehouse

Fast forward seven years, and analytics is fast becoming a competitive differentiator within customer-facing applications. Analytics can no longer be restricted to separate data warehouses, leading to the need for databases with support for hybrid transactional/analytical processing (HTAP). With HTAP, customer-facing applications can enrich transactions with real-time analytics in order to provide a more insightful customer experience. Transactional data about the individual can be cross-referenced with similar records and the subsequent analysis will provide information about what similar people go on to buy, for example.
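As a rough sketch of the idea, the Python example below records a purchase and, against the same data, immediately runs an analytical query for what similar customers went on to buy. It uses the standard library's sqlite3 module purely as a stand-in; SQLite is not an HTAP database, and the table, customer names and function name are hypothetical.

```python
import sqlite3


def record_purchase_and_recommend(conn, customer, product):
    """Insert a purchase, then return products bought by other customers
    who also bought `product`, most frequent first."""
    with conn:  # commits the insert as a single transaction
        conn.execute(
            "INSERT INTO purchases (customer, product) VALUES (?, ?)",
            (customer, product),
        )
    rows = conn.execute(
        """
        SELECT other.product, COUNT(*) AS times_bought
        FROM purchases AS same
        JOIN purchases AS other
          ON other.customer = same.customer
        WHERE same.product = ?
          AND same.customer <> ?
          AND other.product <> ?
        GROUP BY other.product
        ORDER BY times_bought DESC, other.product
        """,
        (product, customer, product),
    ).fetchall()
    return [name for name, _ in rows]


# Hypothetical purchase history, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("ana", "tent"), ("ana", "stove"), ("ben", "tent"),
     ("ben", "stove"), ("ben", "lantern"), ("cy", "boots")],
)
conn.commit()

suggestions = record_purchase_and_recommend(conn, "dee", "tent")
```

The point of the sketch is only that the write and the analysis happen against one store, with no export to a separate warehouse in between.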

Now, a new generation of database services is not only bringing transactional and analytical databases to the cloud, but hybrid transactional/analytical databases as well. In effect, they combine what once required two different database services (e.g., RDS/Aurora + Redshift). The result is not only more cost effective, but simpler and far more practical.

These competitors aren’t just offering new, business-changing services either. In addition to the latest database features, they’re always up to date with current bug fixes, security patches and improvements, provide the same world-class customer support on-premises customers receive, and use a cloud-native storage architecture that truly separates compute from storage and offers unlimited capacity; all areas where Amazon falls short with Redshift. Competitors are also offering a higher quality service at a much lower price.

The second problem is building a cloud data warehouse using a third-party product. It is very difficult to fork or otherwise maintain a separate version of another company’s product, as Amazon has done with Redshift. As a result, Amazon has not been able to sustain the level of technological innovation expected of it and cannot take advantage of the latest Postgres features.

Consumers want consolidation

A similar issue for Amazon is supporting customers with technical bugs, as it didn’t create the code behind the product and doesn’t maintain the latest version of Postgres. A lot of improvements made to Postgres over the last 15 years will be missed, including bug fixes from the Postgres community that could potentially prevent critical downtime.

This kind of problem doesn’t exist when customers can go straight to a native DBaaS provider with issues via live chat and an engineer can fix the issue there and then, something that Amazon may well envy. Through software updates, bugs are systematically fixed and the likelihood of them occurring again is slim to none. Compare this to products like Redshift that may face recurring bugs for their entire existence.

The advantage of the cloud is that, in theory, compute and storage are completely independent of each other and storage is virtually unlimited. With Redshift, however, if you want more storage you will have to purchase more compute power. Since Snowflake pioneered the movement to separate compute from storage, it has become significantly cheaper for providers to manage virtually unlimited object storage. Redshift pre-dates this, however, so more storage means more computing power; a vicious cycle to be caught up in.
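A back-of-the-envelope sketch in Python shows why coupling storage to compute matters. All figures are hypothetical and are not actual Redshift node specifications; the function names are likewise invented for illustration.

```python
import math


def nodes_when_coupled(data_tb, compute_nodes_needed, storage_per_node_tb):
    """Storage lives on the compute nodes, so the cluster must be large
    enough for whichever requirement is bigger."""
    nodes_for_storage = math.ceil(data_tb / storage_per_node_tb)
    return max(compute_nodes_needed, nodes_for_storage)


def nodes_when_decoupled(data_tb, compute_nodes_needed):
    """Data sits in separate object storage, so the node count follows
    the compute requirement alone (data_tb does not affect it)."""
    return compute_nodes_needed


# Hypothetical workload: 40 TB of data, queries that need 4 nodes,
# and 2 TB of local storage per node.
coupled = nodes_when_coupled(40, 4, 2)
decoupled = nodes_when_decoupled(40, 4)
```

Under these assumed numbers the coupled cluster needs 20 nodes simply to hold the data, while the decoupled one needs only the 4 that the queries require.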

Furthermore, the Amazon pricing structure is fundamentally frightening. A single product like Redshift can cost more than a competing DBaaS provider’s entire suite of services. The number of services tailored to each part of data management can become staggering and very costly indeed.

What consumers really want is consolidation. They don’t want different vendors to provide different technologies for different processes, when it can all be done by one. Integration is key and the competition between vendors is only going to lead to higher quality, streamlined solutions. Technology isn’t built on the premise of specialisation any more. Just look at the smartphone; when was the last time you picked up a stopwatch, calculator or even a camera?

Enterprise customers are beginning to discover a world beyond Amazon, where new services are changing the experiences they can offer customers, at only a fraction of the price. Scalable solutions can suit the needs of any organisation’s size and the benefits of new industry technologies are reducing operational costs massively. The industry is shifting rapidly and Amazon must do more to keep up, before the rest of its customer base realises that the grass really is greener on the other side.

Shane Johnson, Senior Director of Product Marketing, MariaDB Corporation