Running out of IOPS

Users couldn’t log in. It was Monday morning, and I managed a courseware system for a large institution—180K users. Most of them were getting timeouts when they tried to log in, and I had to fix it.

I traced the problem to the database's disks: they had become the bottleneck. It wasn't yet clear why. Just a couple of weeks earlier, performance had seemed fine, with plenty of margin.

Now, the database instance's disks were up against their IOPS limits, and queue times were long. We considered upgrading to a more expensive, provisioned-IOPS disk class, which would cost ten times as much. The disk array was fairly large, and we weren't quite ready to eat that expense, so I kept investigating.

Finally, I found the cause. Linux was prioritizing a RAID consistency check. It should have been a background task, but the default I/O scheduler was letting the check starve the database's workload.

I made one small change. The next time the check ran, nobody even noticed. The system was fast again, and we saved a bundle of money on our cloud hosting bill.

Setting I/O schedulers manually

You could just set the scheduler directly. It’s easy to do. Say you want to set the scheduler for /dev/sdc to Kyber:

echo 'kyber' > /sys/block/sdc/queue/scheduler

Done—until the machine needs to reboot. So, make a systemd unit to do that on startup.
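A minimal oneshot unit might look like the sketch below. The unit name is a placeholder, and it assumes the disk still shows up as /dev/sdc after the reboot, which is exactly the assumption that breaks next.

```ini
# /etc/systemd/system/set-io-scheduler.service  (hypothetical name)
[Unit]
Description=Set the Kyber I/O scheduler on /dev/sdc
# Wait until local block devices are up before touching sysfs.
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo kyber > /sys/block/sdc/queue/scheduler'

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable set-io-scheduler.service` and it runs once per boot.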

Except, the device names of your extra disks aren't stable between reboots. Your filesystems have UUIDs, but those might correspond to partitions instead of the disks. So, you look up the serial number for the disk and use that.
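For example, /dev/disk/by-id exposes serial-based names that stay stable across reboots. A minimal sketch of the lookup, assuming a hypothetical by-id entry (list /dev/disk/by-id/ to find your drive's real one):

```shell
# Hypothetical stable name; substitute your drive's entry.
DISK_ID="/dev/disk/by-id/scsi-SAcme_Disk_SERIAL123"

# Derive the sysfs scheduler path from a device node.
scheduler_path() {
    # $1 is a device node such as /dev/sdc
    printf '/sys/block/%s/queue/scheduler' "$(basename "$1")"
}

scheduler_path /dev/sdc
# To apply for real (as root), resolve the stable name to this
# boot's kernel name first:
#   echo kyber > "$(scheduler_path "$(readlink -f "$DISK_ID")")"
```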

That works, until a drive detaches and re-attaches, or something causes the SCSI bus to re-enumerate everything. Now you need to re-apply the setting without rebooting. Maybe a cron job?

Great. But that's clunky and easy to lose track of, and switching the scheduler five minutes after attach, instead of immediately, causes a stall while the in-flight I/O drains.

You could learn how to write the proper udev rules, make sure they interact with the OS-supplied rules correctly, and get them distributed to all the machines, with the right drive identifiers.
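A sketch of such a rule follows. The file name and serial number are placeholders; the real ID_SERIAL value for a drive comes from `udevadm info /dev/sdX`.

```
# /etc/udev/rules.d/60-io-scheduler.rules  (hypothetical file name)
# Re-apply the scheduler whenever this disk appears: at boot,
# after a re-attach, or after a SCSI bus rescan.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_SERIAL}=="Acme_Disk_SERIAL123", ATTR{queue/scheduler}="kyber"
```

Because the rule matches on `add|change` events, udev re-applies it immediately whenever the device reappears, with no reboot or cron delay.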

And then you’ll occasionally need to change your policy, and have to do it all again.

You really want a single tool that handles all the hassle for you, and gives you easy control.

That’s exactly what I/O Manager does.

I/O Manager makes it easy

I/O Manager lets you set which I/O scheduler should be used for each drive. As VMs and drives come and go, or attach and detach, it keeps each drive's preferred scheduler applied. You can deploy it from AWS Marketplace directly to your own AWS cloud environment.

Rules-based policy

It uses a simple rules engine that allows you to set each drive’s scheduler based on the attributes of the drive or the machine that it’s attached to.

A reactive engine

As drives and machines come and go, or as their attributes change, I/O Manager automatically updates the scheduler according to your rules.

The rules engine reacts live to changes in rules, instances, volumes, and attachments. Whenever an instance or volume is created or modified, a tag changes, or an instance restarts, I/O Manager re-applies the policy specified in your rules to the latest state information and propagates any needed scheduler changes, live.

This video gives a demonstration:

Identifies drives reliably

I/O Manager intelligently selects the best way to identify each attached volume, accounting for the capabilities of both AWS and the instance’s running kernel.

This ensures that your preferred I/O schedulers are consistently applied to each drive, regardless of volume capabilities, kernel versions, or system reboots.

Conflict-free editing

Multiple users can edit rules simultaneously. Each user gets their own scratch space to make changes in.

When changes are committed to the master rules, every other user's scratch space is updated so that their pending changes apply against the latest version of the rules.

Easy deployment

I/O Manager is available in AWS Marketplace and deploys directly to your own AWS cloud environment. This gives you easy management of your subscription, control of permissions boundaries, and simple single-point billing.

Cost-effective control

I/O Manager costs just 2% of the price of the volumes that you ask it to set a scheduler for, plus minimal AWS infrastructure costs for hosting the tool.

For example, if you have $1000/month of volumes, and set non-default scheduler policies for $800/month of those, I/O Manager will bill your account for $16/month. (Note that AWS bills you directly for the hosting infrastructure.)

I am eager to help you manage your AWS resources. If you have any difficulties using I/O Manager, you can file a support ticket and I will work with you directly to resolve the issue.

If you decide that I/O Manager doesn’t fit your needs, you can unsubscribe at any time.

Stop wasting money

Stop paying for expensive provisioned disks if you don’t need to, and don’t settle for mediocre performance. Set the right scheduler for each of your workloads to get the most out of your cloud VMs, and let I/O Manager make it easy for you.