Nursing critical IOPS and Throughput using Storage Spaces on Windows Server 2016

Hiya, have you ever had to deal with storage constraints? If you're a data professional, chances are you deal with them every day. For us folks in cloud environments, those constraints are both published and concrete.

Azure Managed Disk "Premium Storage" Costs as of 2018-07-27

If you're slick with numbers, you'll see that this isn't a linear chart - P30 offers the best balance of IOPS and throughput, while P20 offers a good alternative of high IOPS and throughput for the cost. However, when you look at the VM limits, they don't align with the disk limits.

Azure Virtual Machines "Memory Optimized" Limits as of 2018-07-27

I pay the most attention to the "Max uncached disk throughput" numbers, as they offer the highest throughput and present themselves as the most likely restriction relative to the per-disk limits above. However, looking at the VM SKUs, it's very easy to build a medium-size VM like the DS4v2 where I cannot create a single-disk volume that actually leverages the VM's potential.


For example, if I have a DS4v2 with a 25K IOPS maximum and a single P30 disk, I'm only ever able to nurse out 5K IOPS and 200MBps, assuming all of the other stars align in my favor. For some folks, this is totally okay and within the bounds of their workload. For SQL Server and other data platform products, this can become quite the hindrance. Even on-prem, if your host (physical or VM, makes no real difference here) is presented multiple disks, then you have a few things to consider:


  • Separate data and log workloads. This used to make a lot of sense back in the days of spinning disk; in 2018 with prolific flash storage and provisioned storage, the need to separate these objects is much less significant. YMMV, of course.
  • Leverage multiple data files and filegroups to organize your data across disks. Using multiple files and even filegroups has a ton of advantages for performance, recoverability, and control. However, more often than not, our performance-challenged database was configured by a vendor or someone other than us, and we've got a very large amount of data already persisted and depended upon. Not something we want to change willy-nilly.
  • Leverage a software-based solution to simulate a disk array. For Windows Server 2012 and above, we have Storage Spaces. For Linux, we have Logical Volume Manager (LVM).

Storage Spaces is a relatively new technology (not to be confused with Storage Spaces Direct, though similar) and can be thought of as the next evolution of Windows Dynamic Disks. Notably, Storage Spaces is far more configurable and supports a very large number of disks. Below are some important terms to know when thinking about a Storage Spaces configuration.

  • Columns: This is a count of disks applied per "stripe". Once implemented, the total disk count must be a multiple of the column count. If you wanted to have 9 disks active, you could not have 2 or 4 columns; you would need to use 3 (or 9), as the column count must divide evenly into the disk count (see the quick check after this list).
  • Interleave: This is the amount of data (in bytes) written to each column per stripe. There's a ton of mathematics to consider when selecting the optimal interleave; the default is 512 * 1024 bytes, or 524288.
  • Storage Pool: A logical grouping of physical disks to be used for Storage Spaces. You can have one or many pools, each pool can contain one or many disks, and each pool can have one or many virtual disks.
  • Physical Disk: A disk object presented from the storage controller(s) to Windows. These can be "virtual disks" presented from the hypervisor (VMware, Hyper-V, Azure, AWS, etc.) as well.
  • Virtual Disk: A virtual disk object belonging to a Storage Pool that is composed of one or more physical disk segments.
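
As a quick sanity check on the column rule: valid column counts are simply the divisors of your disk count. A throwaway snippet (illustrative, assuming 9 disks) lists them:

$diskCount = 9
1..$diskCount | Where-Object { $diskCount % $_ -eq 0 }    # -> 1, 3, 9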

For my environment, I need to be able to meet throughput requirements more than anything else, especially during backup routines. I run primarily "DS13v2" SKUs, akin to the DS4v2 listed above. With that, I have a high-water mark of 25K IOPS and 384MBps. In order to achieve my throughput maximums, I leverage 2x Azure Premium Disk "P30" units, with an aggregate capability of 10K IOPS and 400MBps. Your environment and workload may differ and benefit from higher IOPS, but the logic remains the same. I use those 2 disks in a Storage Spaces virtual disk to create one 2TB volume capable of servicing those requirements.

In order to standardize and automate my environments, I use PowerShell to create these disks.


  • Note that the Server Manager utility cannot create a "striped" (see: columnar) Storage Spaces volume - such disks are only "spanned", in that every single byte of the first disk is used before the next disk sees its first IO. For performance, this is undesirable. Server Manager can show you what exists, but it cannot show you columns, the best feature of Storage Spaces!



Below, I've provided my script. I've parameterized all of the configuration items interesting to me. Internally, I actually push those parameters to the command line and call them at runtime rather than storing them in the script. For the sake of this submission, I've left them in the script so you can see how I configure it. To run this script, you need to know and specify the following:


  • A made-up number for the pool. I defaulted it to 1, as I assume you don't have another pool to compete with. If you end up creating more than one, this script is built such that you'll need to change the number.
  • A made-up number for the virtual disk. Same as above, increment this number if you end up creating multiple virtual disks.
  • A desired drive letter.
  • The number of columns to use. You can either specify the total number of disks (common for a cloud scenario) or a smaller number that divides evenly into the disk count. If you think about this like a database, you have rows and columns: the columns define the disks in a stripe, and the rows are the number of stripes available. If you have 12 disks, defining 3 columns gives you 4 stripes; 6 columns gives you 2 stripes.
  • Interleave size. I don't recommend you change this without studying the value and its impact more intently. Your storage subsystem/provider can significantly change how much value this setting delivers.


There are a couple of major concessions in this design. I strongly recommend you test somewhere safe before running any element of this script in production. I've made an assumption that no other disks are present by running (Get-PhysicalDisk -CanPool $true). If you need to filter this, I recommend piping to Where-Object and filtering by size or some other metric.
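
Here's a minimal sketch of that provisioning flow. The pool and virtual disk names, the default drive letter and column count, and the GPT/NTFS/64KB formatting choices are illustrative assumptions built from the parameter list above, not fixed requirements - adjust them before use.

param (
    [int]    $PoolNumber      = 1,         # made-up number for the pool
    [int]    $VDiskNumber     = 1,         # made-up number for the virtual disk
    [string] $DriveLetter     = 'F',       # desired drive letter
    [int]    $NumberOfColumns = 2,         # must evenly divide the poolable disk count
    [uint64] $Interleave      = 524288     # bytes per column; see the caution above
)

# Assumption: every poolable disk belongs in this pool. If other candidate
# disks are present, filter here (e.g. ... | Where-Object Size -ge 1TB).
$disks = Get-PhysicalDisk -CanPool $true

# Create the pool on the local "Windows Storage" subsystem.
New-StoragePool -FriendlyName "StoragePool$PoolNumber" `
    -StorageSubSystemFriendlyName "Windows Storage*" `
    -PhysicalDisks $disks | Out-Null

# Create a striped virtual disk ("Simple" = striping with no resiliency).
New-VirtualDisk -StoragePoolFriendlyName "StoragePool$PoolNumber" `
    -FriendlyName "VirtualDisk$VDiskNumber" `
    -ResiliencySettingName Simple `
    -ProvisioningType Fixed `
    -NumberOfColumns $NumberOfColumns `
    -Interleave $Interleave `
    -UseMaximumSize | Out-Null

# Initialize, partition, and format (a 64KB allocation unit is a common
# choice for SQL Server data volumes).
Get-VirtualDisk -FriendlyName "VirtualDisk$VDiskNumber" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter $DriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -Confirm:$false

Pushed to the command line as described above (hypothetical script name), an invocation looks like:

.\New-StripedVolume.ps1 -PoolNumber 1 -VDiskNumber 1 -DriveLetter F -NumberOfColumns 2 -Interleave 524288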

I hope this helps. As always, I welcome commentary and can be reached @sqlmadhadr!


Edit:

Some other relevant commands to know:
Get-VirtualDisk | Format-List    # Shows important data like Interleave and NumberOfColumns
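
Two more standard cmdlets pair well with that for verification (the property picks are just illustrative):

Get-StoragePool | Format-List FriendlyName, Size, AllocatedSize    # pool capacity and usage
Get-PhysicalDisk | Format-Table FriendlyName, CanPool, Size        # which disks remain poolable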



