For many companies, the current pressures caused by Covid-19 have exacerbated the challenges that they have faced around software development. Digital transformation went from being a growth strategy to one of survival. Almost overnight, online commerce exploded to levels only seen during holiday seasons.
This will be a forcing function for even more change, as companies either struggle to adopt new technologies or want to keep ahead of the competition. New approaches based on cloud-native IT will help.
More agile, more data … more problems?
To make the moves required, developers are looking at how they can make use of cloud-native methodologies. However, this is more than simply migrating existing applications onto a cloud platform and adding more infrastructure. It takes new architectures built around software containers, and an understanding of the role orchestration tools can play in automating your applications from build to production. It also means working out how to use APIs effectively between containers, and how data is affected by dynamic changes to your application infrastructure.
Kubernetes is now the preferred method to orchestrate containers and manage applications based on this approach. Kubernetes can set up application workloads, keep them running and scale them as demand changes. However, while Kubernetes can orchestrate applications, it does not deal with the problem of data management. All the information created by applications still has to be managed.
Traditionally, to succeed with databases like Apache Cassandra, users have had to understand the entire software stack from the operating system up. They also had to be consistent and follow strict runbooks for operations and deployment. This approach not only requires in-depth knowledge of how the database works, it also requires manual intervention over time to handle scaling up. As we become more cloud native, this approach is being challenged.
Making data as easy to orchestrate as applications
Managing cloud-native application data alongside Kubernetes requires some planning. One approach is to have a database instance per service that sits outside the Kubernetes cluster. This takes your data infrastructure out of your control plane and creates extra work for those who now have to manage two environments. Not ideal.
The better approach is to distribute data physically alongside application components and keep it inside the same control plane. This ensures that each application service can read and write its data effectively, while the organisation can manage data and application as one. More importantly, this approach should be able to scale across multiple services or clouds, just as software container images do.
In order to run Kubernetes together with a database like Apache Cassandra, you will need to use a Cassandra Operator within your Kubernetes cluster. This allows Cassandra nodes to run inside of your existing Kubernetes cluster as a service. Operators provide an interface between Kubernetes and more complex processes like Cassandra to allow them to be managed together. Starting and stopping a Cassandra cluster, scaling it and dealing with failures are handled via the Kubernetes Operator in a way that Cassandra understands.
Participating gracefully in a Kubernetes environment means providing insight into the cluster state. In practice, this means that some operations that were previously database internals, such as automated retries or establishing gossip links to track internal cluster state, are raised up to the API layer. Kubernetes can then make decisions based on the health of the whole cluster and act on them - for example, if more nodes are needed, they can be launched automatically to take up the slack. All of this is observable through the available metrics.
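To make the idea concrete, here is a minimal Python sketch of the kind of reconciliation decision an operator makes: compare the desired cluster size with the node health reported through the API layer, then list the actions needed to close the gap. It is a toy model, not the code of any real Cassandra operator. The NodeStatus type, the reconcile function, the node names and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    """Observed state of one database node (hypothetical shape)."""
    name: str
    ready: bool  # health flag, assumed to be exposed at the API layer


def reconcile(desired_size: int, nodes: List[NodeStatus]) -> List[str]:
    """Return the actions that move the observed cluster toward desired_size."""
    actions: List[str] = []
    # Nodes within the desired size are kept; anything beyond it is surplus.
    kept, extras = nodes[:desired_size], nodes[desired_size:]
    # Deal with failures: replace kept nodes that report unhealthy.
    for node in kept:
        if not node.ready:
            actions.append(f"replace {node.name}")
    # Scale down: remove surplus nodes, highest index first.
    for node in reversed(extras):
        actions.append(f"decommission {node.name}")
    # Scale up: launch new nodes until the desired size is reached.
    for index in range(len(nodes), desired_size):
        actions.append(f"add node-{index}")
    return actions


if __name__ == "__main__":
    observed = [NodeStatus("node-0", True), NodeStatus("node-1", False)]
    for action in reconcile(3, observed):
        print(action)
```

A real operator runs a loop like this continuously against the Kubernetes API and also has to respect database-specific rules, such as changing one node at a time. The sketch only shows the compare-and-act shape of the decision.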
Thinking Stateful around data
Typically, container instances inside Kubernetes are stateless - they are created as they are needed and then removed, rather than being stored over time. Their storage needs are considered ephemeral. However, data management is different. For a database like Cassandra, nodes need to persist data, and therefore have to be treated as stateful services. They should be deployed using StatefulSets and PersistentVolumes, which guarantee that each data volume is reattached to the same node after any restart.
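As a rough illustration of what this looks like, the Python sketch below builds a StatefulSet manifest with a volumeClaimTemplates entry, so that each replica gets its own persistent volume claim, and prints it as JSON. The function name, the resource name, the replica count, the storage size and the cassandra:4.1 image tag are assumptions chosen for the example. In practice a Cassandra operator would normally generate and manage an equivalent resource for you.

```python
import json


def build_cassandra_statefulset(name="cassandra", replicas=3, storage="10Gi"):
    """Build a StatefulSet manifest as a plain dict (illustrative values only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Assumes a headless Service with this name exists for stable DNS.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": "cassandra:4.1",  # assumed image tag
                        "ports": [{"containerPort": 9042, "name": "cql"}],
                        # Mount the per-replica volume at the data directory.
                        "volumeMounts": [{
                            "name": "data",
                            "mountPath": "/var/lib/cassandra",
                        }],
                    }],
                },
            },
            # One PersistentVolumeClaim per replica, reattached to the same
            # replica identity whenever that pod restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_cassandra_statefulset(), indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, which accepts JSON as well as YAML. A production deployment would also need settings this sketch leaves out, such as seed configuration and resource limits.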
This use of automation based on Kubernetes can make life easier for developers and operators. Existing services can be made more efficient and upgraded more easily, while new services can be added to meet customer demand. Alongside running Kubernetes and databases together, you can also consider how these can provide a Database as a Service or DBaaS function for internal developers.
For teams that aren’t yet familiar with setting up and running Kubernetes or would rather not invest the time, DBaaS options that use these technologies together can provide this on demand from the cloud. DBaaS can take away some of the management overhead and make it easier to focus on how to work with data, rather than how to manage the database instances manually.
Supporting your approach to data across your business
Moving to cloud native applications and data is essential for companies that want to implement changes faster and deliver what customers want, and it is a critical element of digital transformation. From a developer perspective, linking the ‘big picture’ approach with the methods required to keep systems running can be challenging, particularly when scaling out databases requires some experience. Previous processes and organisational silos can be major problems that get in the way of these changes. Transforming into a data-driven operation requires fewer barriers.
For any team looking at how to support their company today, the pressure is on to keep up with customer demand and deliver services more efficiently. The adoption of microservices has definitely helped in this process, as it makes it easier to break up applications and improve them quickly compared to older monolithic applications. However, the increasing complexity that can come with this approach can make it harder to scale out your services and support your data.
To make this process easier, it can help to look at distributed database designs like Apache Cassandra as part of cloud native applications with Kubernetes. At the same time, the growth of Database as a Service options around Cassandra has made it easier to adopt and run distributed database designs too.
Patrick McFadin, Vice President Developer Relations, DataStax