The Covid-19 pandemic has laid bare the importance of robust core infrastructure and operations. Within days of lockdown measures being announced, swathes of the employed population were working from home. Today, companies with thousands of employees have those staff logging in externally to the same critical business applications. Meanwhile, consumers have become increasingly reliant on the uninterrupted delivery of internet services and technology to manage their daily lives.
Setting aside the sobering human cost, this situation has been both a boon and a bane for businesses. Many platforms and services are experiencing unprecedented levels of traffic. For some, it is a windfall period, with more business taking place on digital platforms than expected. Others, meanwhile, are seeing a downturn, which requires difficult decision-making to minimise business impact.
Whether you are facing a surge in traffic or going through a digital drought, you cannot afford to ignore the bottlenecks created by improperly tuned databases, misconfigured systems, misunderstood infrastructure, and inefficient IT systems.
Understanding your responsibilities in the cloud
Cloud providers have launched services that have commoditised databases and firmly established Database-as-a-Service (DBaaS) as a common model. At the same time, multi-cloud is becoming the norm across many industries, offering an approach intended to save money and mitigate risk. Technology leaders are seeking value and dependability in a climate that is challenging and complex to navigate.
Companies that embrace the cloud and its native technologies often struggle with costs, downtime, and operational issues. The common issue is that the expectation of being “fully managed” turns out to mean “mostly managed”, or only “operationally managed.” A cloud provider will take on the burden of running and managing a database infrastructure for you, but they are not responsible for how your database performs. The cloud provider can offer assurances of database service availability, but not of database service performance. Sadly, poor database performance creates a cascading effect that travels up the application stack and has a direct negative impact on the user and customer experience. This is one of the challenging characteristics of the “shared responsibility” model that catches many companies by surprise.
Businesses are given tools, but it is up to their people to use those tools correctly. There simply isn’t a substitute for experienced employees, such as database administrators (DBAs), who can constantly assess how to optimise your systems and adapt to change. Some businesses try to absorb these functions into roles such as full-stack engineers and site reliability engineers, but nothing truly replaces dedicated staff with focus and expertise.
Flattening a different curve
Businesses experiencing high traffic are tempted into scaling by credit card and moving their cloud infrastructure up the sizing tiers to keep pace with demand. In fact, this is often the cloud provider’s recommended approach. The focus here is on scaling without regard for tuning.
Using this approach, we’ve found surprising leaps in costs. In a recent survey, we asked how many stakeholders had needed to upgrade the size of their DBaaS by six times or more in the last two years. The result was 41 per cent. It’s important to remember that each instance jump in cloud infrastructure effectively doubles the cost of your compute resources. At that rate, you’re only five upgrades away from paying over 30 times more on your cloud bill.
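The compounding effect of tier upgrades is easy to sketch. Assuming, as above, that each instance-size jump roughly doubles compute cost (an approximation of typical cloud sizing tiers, not an exact price list, and with a hypothetical starting bill):

```python
# Sketch: cost growth if each instance-size upgrade roughly doubles compute cost.
# The doubling factor and the starting bill are illustrative assumptions.
base_monthly_cost = 1_000  # hypothetical starting compute bill in pounds

for upgrades in range(6):
    multiplier = 2 ** upgrades
    print(f"{upgrades} upgrades -> {multiplier}x -> £{base_monthly_cost * multiplier:,}")

# After five upgrades the multiplier is 2**5 = 32, i.e. over 30 times
# the original compute cost.
```

The point is not the exact figures but the shape of the curve: doubling per tier turns a handful of reactive upgrades into an order-of-magnitude bill increase.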
It’s therefore important to recognise that DBaaS options provide a service, but don’t focus on optimisation or performance improvements. Your instance will run, but it might not run well. Tuning and optimising your cloud resources and the underlying infrastructure to reduce workload impact can have a dramatic effect. Tuning and auditing your DBaaS environments can conservatively improve performance by 25 per cent, freeing capacity, reducing bottlenecks, and often shrinking the size of the database itself by up to 10 per cent. In some cases, organisations have cut their cloud spend in half.
Backup methods are another potential source of cost reduction. Using Amazon RDS as an example, backups are generally limited to Elastic Block Store (EBS) snapshots. Taking back control of your backup strategy and leveraging native MySQL backup facilities instead can save thousands in cloud costs. These measures can enable companies to significantly delay pulling out the credit card for expensive cloud upgrades, and will reduce cloud bills.
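As a minimal sketch of what a self-managed alternative might look like, the snippet below uses MySQL’s native mysqldump tool and ships the result to object storage. The host, user, and bucket names are hypothetical placeholders, and for large datasets a physical backup tool such as Percona XtraBackup may be a better fit:

```shell
#!/bin/sh
# Sketch: logical backup of a MySQL instance using native tooling,
# instead of relying solely on EBS snapshots.
# HOST, USER and the S3 bucket are hypothetical placeholders.
HOST=mydb.example.rds.amazonaws.com
USER=backup_user
STAMP=$(date +%Y%m%d-%H%M%S)

# --single-transaction takes a consistent snapshot of InnoDB tables
# without locking the database for the duration of the dump.
mysqldump --host="$HOST" --user="$USER" \
  --single-transaction --routines --triggers --all-databases \
  | gzip > "backup-$STAMP.sql.gz"

# Ship the compressed dump to cheaper object storage.
aws s3 cp "backup-$STAMP.sql.gz" "s3://my-backup-bucket/mysql/"
```

Because you control retention, compression, and storage class, this kind of approach lets you trade snapshot convenience for lower, more predictable storage costs.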
Tune for spikes
Each application workload has its own unique fingerprint. Instead of linear growth in application and database usage, you are more likely to see patterned spikes in your workload. The reason behind a spike is not always easy to identify. As mentioned earlier, the first reaction for many businesses experiencing a spike is to scale by credit card, stepping their cloud infrastructure up a tier. As an alternative, getting in front of a spike, instead of reacting to it, can save tens or hundreds of thousands of pounds.
Staying ahead of problems and avoiding significant costs and slow performance requires solid processes, and adequate resources, to ensure proactive monitoring and competent tuning of database and application workloads. Seemingly benign database queries can severely damage a company’s ability to service its clients. At one company, we found a single database query that was being executed 25,000 times per page view, severely impacting the customer experience at the application level. Because each query took less than a second, none was flagged as a problem. But small things matter: together, they created a bottleneck that impacted the productivity of thousands of employees and clients for weeks.
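The pattern described above is the classic “N+1 query” problem: each individual query is fast, but the query count scales with the number of rows rendered. A minimal, hypothetical sketch, using Python’s built-in sqlite3 in place of a production database and an invented users/orders schema, shows how per-row queries multiply and how one set-based query collapses them:

```python
import sqlite3

# Hypothetical schema standing in for the real application's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, 10.0) for i in range(500)])

# Anti-pattern: one query per user. Each is quick on its own, but the
# total query count grows with every row shown on the page.
queries = 0
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, _name in users:
    conn.execute("SELECT total FROM orders WHERE user_id = ?",
                 (user_id,)).fetchall()
    queries += 1
print("per-row queries issued:", queries)  # 100 here; 25,000 in the case above

# Fix: a single joined, set-based query replaces all the per-row lookups.
rows = conn.execute("""
    SELECT u.name, COUNT(o.id), COALESCE(SUM(o.total), 0)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
""").fetchall()
print("joined query rows:", len(rows))
```

This is exactly the kind of issue that proactive query monitoring surfaces: aggregate query counts per request, not just per-query latency, reveal the bottleneck.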
Save by right-sizing
Elasticity is a key selling point of the cloud model, making it easy for an organisation to focus on core capabilities and growth. Elasticity is the characteristic that allows for dynamic increases and incremental decreases in capacity based on demand. The problem is that many businesses fail to scale down in a timely manner. The speed with which you are able to right-size can mean a cost saving of up to a third on hosting.
This is achieved via a number of measures, such as regular and frequent system audits and the elimination of resources not in use. Reviewing the quantity of data you store, along with your choices of storage class and location, can also slash costs, as can shrinking your cloud footprint by matching instance sizes to your workloads. As demand decreases, consolidating workloads becomes important too, since separating workloads by service type may no longer be necessary. And again, proper tuning increases capacity by levelling performance spikes and decreasing average workload demand, opening the door to workload consolidation and resource elimination.
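A right-sizing audit can start as simply as comparing observed peak utilisation against provisioned capacity. A minimal sketch, with hypothetical instance names, metrics, and an assumed threshold:

```python
# Sketch: flag instances whose observed peak CPU utilisation suggests they
# could move down a size tier. All names and figures are hypothetical.
instances = [
    {"name": "db-orders",    "vcpus": 16, "peak_cpu_pct": 22},
    {"name": "db-reporting", "vcpus": 8,  "peak_cpu_pct": 78},
    {"name": "db-sessions",  "vcpus": 4,  "peak_cpu_pct": 9},
]

DOWNSIZE_THRESHOLD = 40  # assumption: sustained peaks below this leave ample headroom

candidates = [i["name"] for i in instances
              if i["peak_cpu_pct"] < DOWNSIZE_THRESHOLD]
print("downsize candidates:", candidates)  # ['db-orders', 'db-sessions']
```

In practice the inputs would come from your monitoring platform and cover memory, I/O, and connection counts as well, but the principle is the same: measure first, then shrink.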
Now is also the time to shop around and assess the cost of your current toolbox. Taking time to consider alternative services or components for your specific needs is a good way to tap into different pricing structures that could save you money. A careful toolkit audit can deliver significant savings: smaller services can be consolidated, unnecessary ones discarded, and overprovisioning reduced quickly and easily.
The economic impact of the Covid-19 outbreak on business operations means that all companies face uncertainty. Business leaders need to be looking for new and innovative ways to overcome their unique challenges while remaining as cost-effective as possible. The answer is not spending more, but spending right.
Brian Walters, Director, Solutions Engineering, Percona