Skip to main content

Dealing with more data - tuning and scaling your database for your application

digital business intelligence growth
(Image credit: Image Credit: Shutterstock/Sergey Nivens)

Companies create more data every day, and that data has to be stored and managed over time. Databases fill this role, organizing the data that our applications create so we can call on it later. However, as these databases continue to grow over time, performance will suffer. So how can we stop performance issues, and how much time and effort should we put in to achieve this?

There are multiple ways that we can approach this problem. It’s important to pick the right approach at the right time.

To scale or to optimize

Today, developers want to scale their applications as easily as possible. By using services like the cloud, this is simple - you can just choose a larger instance and pay more money. However, this ‘scale by credit card’ approach ignores some of the best ways to improve performance. You can do this by tuning your database instance, looking at your queries, and optimizing your current approach.

The first option involves looking at your database and seeing whether you can tune your existing implementation for better performance. By analyzing your performance level, you can carry out small changes to make a drastic improvement to your current setup. 

For example, the open-source database MySQL can be tuned to take advantage of the hardware or virtual machine that it is running on. Many database settings are heavily dependent on the workload type produced by the application

This includes looking at the buffer size (innodb_buffer_pool_size) and RAM that the physical or virtual machine has, and setting this at between 50 and 75 percent, as well as setting the log file size (innodb_log_file_size) to cover around an hour’s worth of transactions at between 128 megabytes and two gigabytes. You can also avoid doublewrite buffering on flush if you use local drives (innodb_flush_method).

Even these minor changes can improve your database performance significantly. There are other settings and ways you can change which can also improve performance, but if that is not enough, you can look at scaling by adding more resources. While this may involve getting out your credit card, taking the right approach here can cost less than you think. Scaling can be either vertical or horizontal. 

Scaling database reads and writes

Vertical scaling means adding more resources to a single machine so that the node can handle higher workloads, such as adding more RAM or more CPU cores, faster storage using drives with better I/O capacity or different storage technologies for different partitions. Vertical scaling is an example of how to achieve performance goals based on direct expense, as you grow your database by moving to bigger and more powerful machines.

Horizontal scaling is a different approach and is based on adding and removing nodes to a cluster. This is used by distributed databases to handle workloads that would be larger than a single machine. Horizontal scaling is problematic for many mainstream database management systems as they were initially designed to be used on single nodes rather than across multiple machines and clusters have to be maintained by having several full redundant copies of the same data stored across the environment.

As part of any decisions you make around scaling, it is also important to understand how each database handles read and write operations. Database reads and writes are different kinds of transactions, and you can take separate approaches in order to scale them appropriately. For example, you may want to scale your capacity for reads in order to keep up with temporary load, while your number of writes will stay the same. Rather than scaling the whole database, you can look at each side separately. 

Typically, write requests are not significantly scalable in a relational database as you still need to write data to each node in the cluster. The only way to scale writes horizontally is to use database shards, where data is spread across multiple database instances and there is a component that redirects queries to where the data for that query is stored.

Sharding can be implemented in several ways. Specific database management systems can support sharding natively, or alternatively, third-party extensions may provide functionality to allow data to be sharded and managed. A good example of a database with native sharding is MongoDB. Another alternative is to build sharding into your application by separating portions of data and using different endpoints/connections. Each of these approaches make it possible to scale up the amount of data that you write.

Alongside sharding for data writes, you can also think about data reads. These can be handled with different caching techniques, or by increasing the number of replicas of data that you create and using read/write splitting to separate transactions across server instances.

While it is possible to automate the scaling process in response to a specific situation, it is better to scale before it is needed. The performance impact on the application from scaling at the wrong time can exacerbate the problem, so it pays off - in all senses of that word - to be prepared in advance.

Thinking about the whole approach to data

Scaling data is not as easy as it sounds. For those unfamiliar with database management and administration, simply adding more power seems like an acceptable response, but this leads to wasted cost and more inefficiency over time. Learning more about databases can make it easier to plan ahead and improve performance as a whole.

To make the most of your application data, looking at optimization, automation and scaling as part of a bigger approach can help keep you ahead of your goals. While you may not want to dive into managing your database instances yourself, you can understand how to automate things in the right way and get the best advice on how things should be carried out in practice – ultimately enhancing your approach to data and improving operations.

Mykola Marzhan, Director of Server Engineering, Percona

Mykola Marzhan is Director of Server Engineering at Percona. He has been reducing time-to-market in software companies for more than ten years. Most of his career he has been focused on the development of monitoring, update and deployment systems.