Why taking on Amazon without the right data engine is a fool’s errand

You may have come across the term ‘data layer’ before, but chances are you saw it in the rather constrained sense that Google and web developers use. For the CIO, a data layer is a much more abstract and powerful idea, though its importance isn’t yet as widely understood in the enterprise as it needs to be.

Let’s see why that needs to change. I need to start with a bit of a history lesson to give context (though that’s far from the main point I want to get across today, rest assured!). Back in the good old days, we just had a database and an application. Over time, applications got more sophisticated, so you now have an application layer, which is where the different parts of applications live. By the same logic, some organizations now have a data layer: the place where all sorts of data is stored.

That might be a database, and very often it is. But it might be something else, like an event stream, a log file or cloud block storage. In any case, from a business perspective, these days you want to present at the application layer a lot of things users find useful but which look very different at the back end: making a banking transaction, seeing your last statement, viewing your transaction history. There’s a very wide range of digital services you now want to offer.

The problem of consistency

Most people by now are doing this through a microservices architecture, where each of those things is a separate microservice. The best-known example of this is the front page of Amazon; you may or may not know, but it has an estimated 500 different microservices giving you bits of information: one is giving you your next best offer, another your recent spend, and so on.

The problem is that your microservices, wonderful as they are, all have different requirements. Some of them must be consistent and accurate, like your bank transactions. Some of them exist to provide information, so they may be analytical: what’s this customer’s average spend this month? Some of them may serve unstructured data, e.g., ‘I want to see this month’s bill’. Some of them might be eventually consistent, like messages, chat boxes and so on. Finally, some of them may need to find a single piece of information very quickly, e.g., search.

Whatever they do, each of these microservices needs to be able to do two things: respond, and deliver content. That's what the application logic wants from the system, and to do that you need to have an appropriate data storage mechanism. And that's where the data layer comes in.
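To make the idea concrete, here is a minimal sketch of a data layer that records what each microservice needs from its storage and picks a category of data engine accordingly. The service names, the `Need` categories and the engine labels are all hypothetical, chosen only to mirror the kinds of requirement described above; a real data layer would weigh far more than a single attribute.

```python
from dataclasses import dataclass
from enum import Enum


class Need(Enum):
    """Hypothetical categories for what a microservice needs from storage."""
    STRONG_CONSISTENCY = "strong consistency"      # e.g. bank transactions
    ANALYTICAL = "analytical"                      # e.g. average spend this month
    UNSTRUCTURED = "unstructured"                  # e.g. this month's bill
    EVENTUAL_CONSISTENCY = "eventual consistency"  # e.g. messages, chat
    FAST_LOOKUP = "fast lookup"                    # e.g. search


# Assumed mapping from a need to a *category* of engine, not a product.
ENGINE_FOR_NEED = {
    Need.STRONG_CONSISTENCY: "transactional SQL database",
    Need.ANALYTICAL: "analytical store",
    Need.UNSTRUCTURED: "object store",
    Need.EVENTUAL_CONSISTENCY: "eventually consistent NoSQL store",
    Need.FAST_LOOKUP: "search index",
}


@dataclass(frozen=True)
class Microservice:
    name: str
    need: Need


def plan_data_layer(services):
    """Return {service name: engine category} for each microservice."""
    return {s.name: ENGINE_FOR_NEED[s.need] for s in services}


# Illustrative services only; the names are made up for this sketch.
SERVICES = [
    Microservice("make-payment", Need.STRONG_CONSISTENCY),
    Microservice("average-spend", Need.ANALYTICAL),
    Microservice("view-bill", Need.UNSTRUCTURED),
    Microservice("chat", Need.EVENTUAL_CONSISTENCY),
    Microservice("product-search", Need.FAST_LOOKUP),
]

if __name__ == "__main__":
    for name, engine in plan_data_layer(SERVICES).items():
        print(f"{name}: {engine}")
```

Five services end up on five different kinds of engine, which is the argument of this article in miniature.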

The Netflixes and the Amazons cottoned on to the power of microservices, and what they could do for a digital business, very early on. The enterprise space has been moving in the same direction since it realized how effective it could be to abstract data from multiple sources, beyond the relational database alone, away from the application and presentation layers.

Fact: no one data engine can satisfy all your new digital business needs

So, we've established that there is such a thing as a data layer. What's the problem, then? The problem is having the right database to do the right data layer work. For something like storing your statement, sure, you can use something like Amazon Simple Storage Service (Amazon S3); for your messages, you can use something like Cassandra, which is, to use the industry jargon, ‘eventually consistent’. That's fine. But the work that actually generates your revenue is transactions, and those don’t really fit into this nice and neat data layer world, as they still tend to reside in older, monolithic relational databases (RDBMS).

Databases of that kind aren’t really suitable for the cloud-native world. They don't scale, and they can't always be open, so while the rest of your cloud-based, super-modern architecture and microservices are on point, the thing that actually puts money in the bank is locked up in a now archaic way of working with data.

This is a critical failing, because websites do stop and they do go down, but people now expect your service to be always on, 24x7, wherever they are in the world. And if it doesn’t work, they are literally only ever one click away from doing whatever it is they want with a rival: if I can't pay by PayPal, I will instantly move to paying by something else, or just abandon the purchase altogether.

That’s not just business you’ve lost in the moment; those customers may never come back. It’s not like you’re the only shop in the village now, after all. Just as bad is a database that doesn't scale when things get very busy, e.g., for your Black Friday campaign.

Distributed SQL: why you really need to know about it

The software industry knows this is a problem, so a kind of solution emerged: the NoSQL database movement, whose databases are typically eventually consistent and work for most things online, apart from, er, slightly important things like paying or transferring money. For these, not just eventual but transactional consistency is absolutely critical: you don't want your bank to take money from your account multiple times when you pay for something once, or to fail to balance your credits and debits.
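As a small illustration of what transactional consistency buys you, the sketch below uses Python's built-in `sqlite3` module. SQLite is a single-node database, not a distributed SQL engine, so this shows only the guarantee itself: a payment either happens completely, exactly once, or not at all. The table layout, the account names and the `pay` helper are assumptions made up for this example.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    # The primary key on payment_id rejects a repeated payment.
    conn.execute(
        "CREATE TABLE payments ("
        " payment_id TEXT PRIMARY KEY,"
        " amount INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("customer", 100), ("shop", 0)],
    )
    conn.commit()
    return conn


def pay(conn: sqlite3.Connection, payment_id: str, amount: int) -> bool:
    """Move `amount` from customer to shop as one all-or-nothing step."""
    try:
        # `with conn` commits if the block succeeds, rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO payments VALUES (?, ?)", (payment_id, amount)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = 'shop'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = 'customer'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate payment_id or overdrawn account: nothing was applied.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


if __name__ == "__main__":
    ledger = open_ledger()
    print(pay(ledger, "order-1", 60), balances(ledger))
    print(pay(ledger, "order-1", 60), balances(ledger))  # retried payment
    print(pay(ledger, "order-2", 60), balances(ledger))  # too little money
```

The retried payment is refused rather than charged twice, and the overdrawn payment leaves no trace even though the shop was credited before the failure. An eventually consistent store does not, by itself, promise either outcome.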

Clearly, for each microservice it is important to have the right storage solution at the back so that each useful little service can deliver the right thing. And for anything transactional, the conditions are pretty strict: it has to be properly consistent, available at a mission-critical level, distributed, and secure, because it has to stay up in a way that none of your other services has to.

At this stage, the history lesson has reached its business point, and the reason why the data layer matters so much to the corporate IT team: the reality is that the days when you could just have one database, an Oracle or a Microsoft SQL Server, that did everything are gone. And for the specific transactional microservices work you want to do on a global scale in the cloud, the distributed SQL database has emerged as the best option for any CIO who wants to move to a Netflix- or Amazon-level website architecture but hasn’t been able to do so because of this gap between monolithic and NoSQL databases.

In other words: if you want to offer a global web shop, you have to be able to manage global transactional data, and at global scale. Distributed SQL is not the only game in town for managing global data, but it is for managing global transactional data. And if you’re not interested in moving your core transactional systems to a distributed SQL database, then I’m very much afraid to say your competitors are, and probably already have. As a result, they are very shortly going to have the availability, the uptime and the resilience to deliver a proper global business website, and if you’re not keeping up you are going to lose ground very quickly, and will really struggle to win it back.

The conclusions I think most IT leaders will draw here are that, if they don’t have them yet, a dedicated data layer and a data layer team to drive it make a lot of sense; and that if the business is serious about transactions in the cloud, at scale, looking at multiple data engines and finding the right one for each job is simply unavoidable. Yes, it was simpler in the good old days, but these days are much more exciting, if you’re properly equipped for them.

David Walker, Field CTO, EMEA, Yugabyte