Guesswork is often the enemy of those responsible for data centre design, operations, and optimisation. Unknown variables lead to speculation, which inhibits predictability and often compromises success.
In the world of storage, many mysteries still remain, unfortunately, with block sizes being one of the most prominent. While the concept of a block size is fairly simple, its impact on both storage performance and cost is profound. Yet, surprisingly, many enterprises lack the proper tools for measuring block sizes, let alone understanding them and using this information to optimise data centre design.
Let’s step through this topic in more detail to better understand what a block is and why it is so important to your storage and application environment.
What is block size?
Without diving deeper than necessary, a block is simply a chunk of data. In the context of storage I/O, it is a unit in a data stream: a read or a write from a single I/O operation. Block size refers to the payload size of a single unit. Some of the confusion about what a block is stems from overlap in industry nomenclature. Commonly used terms such as block sizes, cluster sizes, pages and latency may come up in disparate conversations, but what is being referred to, how it is measured, and by whom often vary. In discussions of file systems, storage media characteristics, hypervisors, or operating systems, these terms are used interchangeably, but they do not have a universal meaning.
Most who are responsible for data centre design and operation know the term as an asterisk on a performance specification sheet of a storage system, or a configuration setting in a synthetic I/O generator. Performance specifications on a storage system are often the result of a synthetic test using the most favourable block size (often 4KB or smaller) for an array to maximise the number of IOPS that the array can service. Synthetic I/O generators typically allow one to set this, but users often have no idea what the distribution of block sizes is across their workloads, or if it is even possible to simulate that with synthetic I/O. The reality is that many applications draw a unique mix of block sizes at any given time, depending on the activity.
The difficulty with understanding the impact of block sizes always comes back to one key issue: the inability to view them and interpret their impact. This is quite surprising considering how many performance issues related to storage are ultimately tied to block sizes. Understanding such an important element of storage shouldn’t be so difficult.
Why does block size matter?
As mentioned earlier, a block is how much storage payload is sent in a single unit. The physics become obvious when you think about the size of a 4KB payload versus a 256KB payload (or even a 512KB payload).
Throughput is the product of IOPS and the block size of each I/O being sent or received. Since a 256KB block carries 64 times as much data as a 4KB block, size directly affects throughput. In addition, the size and quantity of blocks affect bandwidth on the fabric and the amount of processing required on the servers, network and storage environments. All of these items have a big impact on application performance.
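The relationship between IOPS, block size and throughput can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the figure of 10,000 IOPS are assumptions chosen for the example, and it treats every I/O in the stream as the same size.

```python
def throughput_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Throughput in MB/s for a stream of same-sized I/Os (1 MB = 1024 KB)."""
    return iops * block_size_kb / 1024


# Same IOPS, different block sizes (illustrative numbers, not measurements).
for size_kb in (4, 64, 256):
    mb_s = throughput_mb_per_s(10_000, size_kb)
    print(f"{size_kb:>3} KB at 10,000 IOPS -> {mb_s:,.1f} MB/s")
```

At an identical IOPS figure, the 256KB stream moves 64 times the data of the 4KB stream, which is why an IOPS number quoted without a block size says little about the load placed on the fabric.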
This variability in performance is more prominent with Flash than with traditional spinning disk, and thus should be carefully observed when procuring an All Flash Array or other device using solid-state storage. Reads are relatively easy for Flash, but the methods used for writing to NAND Flash can prevent writes from matching read performance, especially when the writes use large blocks. Even a very small number of large-block writes can trigger all sorts of activity on the Flash devices that keeps effective performance from behaving as it does with smaller block I/O. This volatility in performance is a surprise to just about everyone when they first see it.
Block size can impact storage performance regardless of the type of storage architecture used. Whether it is a traditional SAN infrastructure or a distributed storage solution used in a hyperconverged environment, the same factors and challenges remain. Storage systems may be optimised for block sizes that do not necessarily align with your workloads. This could be the result of design assumptions of the storage system, or limits of its architecture. The ability of storage solutions to cope with certain workload patterns varies greatly as well. The difference between a good storage system and a poor one often comes down to its ability to handle large block I/O. Insight into this information should be a part of the procurement, design and operation of any environment.
The applications that generate blocks
What makes the topic of block sizes so interesting are the operating systems, the applications, and the workloads that generate them. The processes of the OS and the applications that are running in them often dictate the block sizes.
Contrary to what many might think, there is often a wide mix of block sizes in use at any given time on a single VM, and it can change dramatically by the second. These changes have a profound impact on the ability of the VM, and the infrastructure it lives on, to deliver the I/O in a timely manner. It’s not enough to know that perhaps 30 per cent of the blocks are 64KB in size. One must understand how they are distributed over time, and how latencies or other attributes of those blocks of various sizes relate to each other.
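To make the idea of a distribution over time concrete, here is a minimal Python sketch. The I/O records are invented purely for illustration, and the helper names `block_mix_by_window` and `mean_latency_by_size` are hypothetical; a real analysis would read records from whatever tracing source is available.

```python
from collections import Counter, defaultdict

# Hypothetical I/O records: (timestamp in seconds, block size in KB, latency in ms).
io_records = [
    (0.2, 4, 0.4), (0.7, 4, 0.5), (0.9, 64, 1.8),
    (1.1, 256, 9.5), (1.3, 256, 11.0), (1.8, 4, 3.2),
    (2.4, 4, 0.4), (2.6, 8, 0.6),
]


def block_mix_by_window(records, window_s=1.0):
    """Group I/Os into fixed time windows and count each block size per window."""
    windows = defaultdict(Counter)
    for ts, size_kb, _latency in records:
        windows[int(ts // window_s)][size_kb] += 1
    return dict(windows)


def mean_latency_by_size(records):
    """Average latency (ms) for each block size across the whole trace."""
    totals = defaultdict(lambda: [0.0, 0])  # size -> [latency sum, I/O count]
    for _ts, size_kb, latency in records:
        totals[size_kb][0] += latency
        totals[size_kb][1] += 1
    return {size: total / count for size, (total, count) in totals.items()}


for window, mix in sorted(block_mix_by_window(io_records).items()):
    print(f"second {window}: {dict(mix)}")
print(mean_latency_by_size(io_records))
```

In this made-up trace, a whole-trace summary would report that a quarter of the I/Os are 256KB, yet the per-window view shows they all arrive within one second, the same second in which a small 4KB I/O sees unusually high latency. That kind of relationship is invisible in a single aggregate percentage.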
Traditional methods lack visibility
The traditional methods for viewing block sizes have been limited. They provide an incomplete picture of their impact – whether it be across the data centre, or against a single workload. Below is a breakdown of some common methods for measuring block sizes, and a description as to why they are lacking:
Kernel statistics courtesy of vscsiStats
This utility is a part of ESXi, and can be executed via the command line of an ESXi host. The utility provides a summary of block sizes for a given period of time, but suffers from a few significant problems.
- Not ideal for anything but a very short snippet of time, against a specific VMDK
- Cannot present data in real-time. It is essentially a post-processing tool
- Not intended to show data over time. vscsiStats will show a sum total of I/O metrics, but only for a single sample period. It has no way to track this over time. One must script this to create results for more than a single period of time
- No context. It treats that workload (actually, just the VMDK) in isolation. It is missing the context necessary to properly interpret the data
- No way to visually understand the data. This requires the use of other tools to help visualise the data
The result, especially at scale, is a very labour-intensive exercise that is an incomplete solution. It is extremely rare that an administrator runs through this exercise on even a single VM to understand its I/O characteristics.
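The scripting effort mentioned in the list above might look something like the following Python sketch. It assumes that block-size histograms have already been captured at regular intervals and that each snapshot holds cumulative counts per bucket since collection began. Both the data layout and the `interval_histograms` helper are assumptions made for illustration; this is not a description of the tool's actual output format.

```python
def interval_histograms(snapshots):
    """Turn cumulative histogram snapshots into per-interval counts.

    snapshots: list of (timestamp_s, {bucket_upper_bound_bytes: cumulative_count}).
    Returns a list of (start_s, end_s, {bucket: count_within_interval}).
    """
    result = []
    for (t0, prev), (t1, curr) in zip(snapshots, snapshots[1:]):
        # Subtract the previous cumulative count to isolate this interval.
        delta = {bucket: curr[bucket] - prev.get(bucket, 0) for bucket in curr}
        result.append((t0, t1, delta))
    return result


# Hypothetical snapshots taken 60 seconds apart (invented numbers).
samples = [
    (0,   {4096: 0,    65536: 0,   262144: 0}),
    (60,  {4096: 900,  65536: 40,  262144: 2}),
    (120, {4096: 1000, 65536: 300, 262144: 150}),
]

for start, end, counts in interval_histograms(samples):
    print(f"{start:>3}s to {end:>3}s: {counts}")
```

Even with a helper like this, the result still covers a single VMDK in isolation and needs a separate tool to visualise it, which is the point the list above makes.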
A storage array may offer a vendor-specific ‘value add’ feature that presents some simplified summary of data with regard to block sizes, but this too is an incomplete solution:
- Not VM aware. Since most intelligence is lost the moment storage I/O leaves a host HBA, a storage array would have no idea what block sizes were associated with a VM, or what order they were delivered in
- Measuring at the wrong place. The array is simply the wrong place to measure the impact of block sizes. Think about all of the queues storage traffic must go through before the writes are committed to the storage, and reads are fetched. (Measuring at the array also assumes that no caching tiers exist outside of the storage system.) The desire would be to measure at a location that takes all of this into consideration: the hypervisor. Incidentally, this is often why an array can show great performance while the VM suffers from high observed latency. This speaks to the importance of measuring data at the correct location
- Unknown and possibly inconsistent method of measurement. Showing block size information is not a storage array’s primary mission, and the array does not necessarily use the same method of measurement as the place where the I/O originates (the VM, and the host it lives on). Therefore, how it is measured, and how often, is generally of low importance, and not disclosed
- Dependent on the storage array. If different types of storage are used in an environment, this doesn’t provide adequate coverage for all of the workloads
The Hypervisor is an ideal control plane to analyse the data. It focuses on the results of the VMs without being dependent on nuances of in-guest metrics or a feature of a storage solution. It is inherently the ideal position in the data centre for proper, holistic understanding of your environment.
The absence of block size in data centre design exercises
The flaw with many design exercises is that we assume we know what our assumptions are. Let’s consider typical inputs when it comes to storage design. These include factors such as:
- Peak IOPS and throughput
- Read/Write ratios
- RAID penalties
- Perhaps some physical latencies of components, if we wanted to get fancy
Most who have designed or managed environments have gone through some variation of this exercise, followed by a little math to come up with the correct blend of disks, RAID levels, and fabric to support the desired performance. Known figures are used when they are available, and the others might be filled in with assumptions. Yet block sizes, and everything they impact, are nowhere to be found. Why? A lack of visibility and understanding.
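The "little math" in that exercise, and what it leaves out, can be illustrated with a short Python sketch. The write penalties used are the commonly quoted rule-of-thumb values (2 for RAID 10, 4 for RAID 5, 6 for RAID 6). The function names, the 10,000 IOPS figure, the 70/30 read/write split and the block mix are all assumptions chosen for the example.

```python
# Commonly quoted rule-of-thumb write penalties per RAID level.
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops, read_pct, raid_level):
    """Classic sizing step: back-end IOPS after applying the RAID write penalty."""
    reads = frontend_iops * read_pct / 100
    writes = frontend_iops - reads
    return reads + writes * RAID_WRITE_PENALTY[raid_level]


def frontend_throughput_mb_s(frontend_iops, block_mix):
    """Throughput implied by a block-size mix.

    block_mix: {block_size_kb: fraction_of_ios}; fractions should sum to 1.
    """
    avg_kb = sum(size_kb * fraction for size_kb, fraction in block_mix.items())
    return frontend_iops * avg_kb / 1024


# The traditional inputs produce the same answer regardless of block size...
print(backend_iops(10_000, 70, "RAID5"))

# ...while the throughput behind that answer depends heavily on the mix.
print(frontend_throughput_mb_s(10_000, {4: 1.0}))
print(frontend_throughput_mb_s(10_000, {4: 0.7, 64: 0.3}))
```

Two workloads with identical peak IOPS and read/write ratios pass through the first function with the same result, yet the second function shows them placing very different demands on the fabric and the storage system once block sizes are considered.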
An infrastructure only exists because of the need to run services and applications on it. Let those applications and workloads help tell you what type of storage fits your environment best. Not the other way around.
Proper visibility into, and understanding of, the distribution of block sizes across an environment pays dividends throughout the entire lifecycle of that environment. Understanding and accommodating block sizes in the design, operation and optimisation phases of the VM lifecycle leads to more predictable application delivery in your environment, possibly with a more affordable price tag.
Pete Koehler, Technical Marketing Engineer at PernixData