Multi-cloud and distributed computing – managing your data headache

Companies are committing their IT strategies to cloud. More and more IT budgets are moving over to cloud services and operational expenditure rather than capital expenses. At the same time, IT teams are being asked to step up and lead digital initiatives that can change the direction of their companies for the better.

According to Forrester’s 2019 Predictions, around 25 per cent of CIOs will make the move up to full leadership roles that bring together technology investment planning, data management and operational responsibility into a unified whole. The challenge associated with this is that all the issues addressed by data and digital transformation are intertwined with other problems across a company, from managing customer experience through to meeting sales and growth goals. For some CIOs, the chance to address these problems will be a natural next step. For others, this will be outside their – and their company’s – comfort zone.

Bridging these gaps will rely on data. However, even as companies commit more to cloud, they aren’t addressing the real problems that exist around data and the cloud.

From more cloud to multi-cloud

Companies are increasing their spend on cloud services to become more flexible and deliver greater growth. IDC has raised its prediction for spending on cloud infrastructure worldwide to $65.2 billion in 2018, to be followed by year-over-year growth of 37.2 per cent for the next few years. This increase in infrastructure spend covers compute, storage, data and application services.

For companies using cloud to support their new services, areas like compute and storage can scale up quickly. However, the database element is much more complex. Data from applications has to be stored and used for analysis, and databases remain the most appropriate way to manage this data over time. These database implementations are now moving to the cloud as well – according to Market Research Future, the global cloud database market is expected to reach $21.66 billion annually, and should grow at 46.78 per cent to 2023.

This research estimate covers a range of different cloud database deployment options, from fully managed services through to cloud database platform purchases. However, this range of options does not necessarily deliver the level of autonomy that many companies are looking for. It is harder to run databases across multiple services or in hybrid deployments. 

Many companies and software development teams are adopting container-based technologies to provide them with some degree of autonomy for their applications that are built to run in the cloud. Containers can be run on any compatible public cloud service or on internal cloud services, so these applications are not tied to any specific provider. However, most database services won’t support this same degree of independence. To understand why this is, it’s worth going into a little bit of database design theory.

 

When you shift your applications over to the cloud, you can decentralise those applications and run them in a distributed environment. Running in the cloud – either in a hybrid cloud, or in a multi-cloud model – will split the application across different sites that will then have to talk to each other and deal with any new data created. Managing that data over time involves choosing how to balance consistency, availability and partition tolerance, or CAP for short.

Any distributed data store will have to keep and manage data over time. Under the CAP theorem, a store can fully guarantee only two of the three properties at once. Because network partitions between sites cannot be ruled out in a distributed deployment, the practical choice is whether to favour consistency or availability when a partition occurs, and that choice depends on your application requirements. This in turn leads to trade-offs around application performance, consistency and availability. However, it should be noted that the lower-priority property is still delivered, just not necessarily at the level that some applications may require.
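To make the trade-off concrete, here is a toy sketch of a single value replicated across two sites while the link between them is down. All names (`Replica`, `write`, the `cloud_a` and `cloud_b` sites) are hypothetical, and the "CP" mode is deliberately simplified to require every replica; real systems usually work with majorities rather than all copies.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One copy of a single value, held at one site."""
    name: str
    value: str = "v0"


class UnavailableError(Exception):
    """Raised when a write is refused in order to protect consistency."""


def write(replicas, reachable, new_value, mode):
    """Apply a write while only the replicas named in `reachable` respond.

    mode="CP": refuse the write unless every replica can be updated,
               so the copies never disagree (consistency over availability).
    mode="AP": accept the write on the reachable replicas and let the
               others lag behind (availability over consistency).
    Returns the value held by each replica after the call.
    """
    targets = [r for r in replicas if r.name in reachable]
    if mode == "CP" and len(targets) < len(replicas):
        raise UnavailableError("partition: cannot reach every replica")
    for replica in targets:
        replica.value = new_value
    return [r.value for r in replicas]
```

With only `cloud_a` reachable, the "AP" call succeeds but leaves the two sites holding different values until they can reconcile, while the "CP" call raises `UnavailableError` and leaves both copies untouched.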

For architects designing applications to run in the cloud, distributed computing approaches have developed further. Any service will have to take the same approach to CAP regardless of whether it is running on internal private cloud or on a public cloud service. The database itself will have to be compatible with and available on multiple cloud services and run in the same way across all of them. However, most public cloud database services are tied to the cloud provider, or can only run in a hybrid environment.

Running databases in the cloud will involve thinking through and addressing these concerns upfront. For example, tunable consistency and eventual consistency are options to meet data consistency requirements when application throughput and performance are more important. Conversely, for applications that require real-time data consistency and specific transaction orders, performance may be less of a requirement. Equally, are you willing to trade multi-cloud support for data autonomy?
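One common way to reason about tunable consistency in quorum-replicated stores is the overlap rule: if the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) exceeds the number of copies (N), every read set shares at least one replica with the most recent write set. The sketch below illustrates that arithmetic only; the function names are hypothetical and it is not tied to any particular database's API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int,
                         write_acks: int,
                         read_acks: int) -> bool:
    """Return True when R + W > N, so reads and writes must share a replica."""
    for count in (write_acks, read_acks):
        if not 1 <= count <= replication_factor:
            raise ValueError("acknowledgements must be between 1 and N")
    return write_acks + read_acks > replication_factor
```

With three copies, quorum writes and quorum reads (2 + 2 > 3) overlap, whereas writing to one replica and reading from one (1 + 1) does not. The second setting favours throughput and relies on the copies converging later.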

Deploying applications in containers can offer more flexibility and independence for companies, but this has to be done in concert with a strategy for data management as well. Without looking at this approach in tandem, it will be difficult to take advantage of multi-cloud to its fullest potential.

Multi-cloud and data autonomy – the role for open source

For companies that want to retain control over their data, adopting multi-cloud is a key element. According to research by Gartner, an estimated 70 per cent of all enterprises want to follow this method. So how can companies achieve the same level of independence around their data as they can get for their applications using containers?

Over the past decade, open source databases have developed to meet some of the new requirements around running applications at scale. These new databases are proving popular for cloud application deployments. Sumo Logic’s report on modern application deployments shows that NoSQL databases are already more popular than traditional relational databases for cloud deployments. These databases are developed to store and handle huge volumes of data; each one has its own approach and qualities that can help developers meet their needs.

As part of looking at multi-cloud and NoSQL options, it is important to stress that any service chosen has to be capable of running in a fully distributed fashion where there is no single lead node. Instead, all the nodes involved in a deployment should be able to carry out instructions and the cluster nodes then organise themselves to create full records on new transactions. This support for fully distributed computing is essential to make multi-cloud deployments work successfully; without this level of independence, the application will not be able to run across multiple cloud services and supply the levels of availability that are required. By keeping this level of independence, the application can carry on running even as services are migrated from one cloud service to another.
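One way a cluster can work without a lead node is for every node to compute, from the same membership list, which nodes own a given record. The sketch below uses a generic consistent-hashing ring to show the idea. It is an illustration under simplifying assumptions (one token per node, MD5 as the hash, hypothetical node names), not the partitioner or token allocation that any specific database uses.

```python
import hashlib
from bisect import bisect_right


def token(text: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


class Ring:
    """A hash ring that any node can rebuild from the membership list."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = min(replication_factor, len(nodes))
        # Sorting by token makes the layout independent of input order.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, key: str):
        """Walk clockwise from the key's token and take the next owners."""
        start = bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]
```

Because `replicas_for` depends only on the key and the membership list, whichever node receives a request can work out where the data lives without consulting a central coordinator.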

Of the potential offerings on the market, Apache Cassandra™ is currently the only option that can run in a real multi-cloud or hybrid cloud deployment. Cassandra has been developed to run across multiple locations and cloud services independently, and it can automatically distribute data across different data centres and geographies.
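In Cassandra, replication across data centres is declared per keyspace using the `NetworkTopologyStrategy` replication class, with a replica count for each data centre. The helper below is a minimal sketch that only builds the CQL text; the function name, the keyspace name and the data-centre names are hypothetical, and in a real cluster the data-centre names must match those reported by the cluster's own configuration.

```python
def keyspace_cql(name: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with per-data-centre replica counts.

    `replicas_per_dc` maps a data-centre name to the number of copies
    to keep there, for example {"aws_eu": 3, "gcp_us": 3}.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data centre is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in replicas_per_dc.items():
        if count < 1:
            raise ValueError("replica count must be at least 1")
        options.append("'" + dc + "': " + str(count))
    body = ", ".join(options)
    return ("CREATE KEYSPACE IF NOT EXISTS " + name
            + " WITH replication = {" + body + "};")
```

Calling `keyspace_cql("shop", {"aws_eu": 3, "gcp_us": 3})` yields a statement that asks for three copies of every row in each of the two named data centres, which could sit with different cloud providers.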

This ability to run across multiple locations – and to do so without needing code to be rewritten, or to be linked to a specific cloud provider – should help companies run their applications and take advantage of multi-cloud. Alongside this, however, it is also critical to look at support and performance optimisation. While open source offerings might be suitable for non-critical applications or testing, production deployments may require additional expertise, operational simplicity and support to scale up and meet the stringent demands on performance that today’s customers expect.

For enterprises that are looking at scale, expertise in design and operational improvements can be essential to build that framework for supporting these applications. Alongside this, looking at security best practices for implementations can be necessary, especially for new applications that process huge volumes of customer data. While the open source versions of these projects provide some of this functionality, looking at versions with enterprise-class support and service can fill the gaps for production applications. Together, these steps ensure that these new mission-critical applications can run in multi-cloud deployments, perform optimally and deliver great customer experiences.

Considering cloud and data together

Cloud computing continues to grow as more companies move some or all of their workloads over. What cloud computing offers is the ability to experiment, see success and scale. However, this ease of deployment should not lock that application into one approach over time. Instead, ownership over your data and how it is processed, managed and stored over time should be a critical consideration for architects.

To avoid this problem, distributed data support has to be designed into any new applications from the start. Using open source platforms like Cassandra, you can take advantage of multi-cloud without being beholden to any single cloud provider.

Evanna Kearins, Vice President, Global Field Marketing, DataStax
Image Credit: TZIDO SUN / Shutterstock