Sustained IO on EBS == No Bueno

I have a small EC2 instance running with a 25GB EBS volume attached. It has a database on it that I need to rework by dropping indexes and creating new ones. This is on rather large (multi-GB, millions of rows) tables. After one DROP INDEX operation ran all day without finishing, I killed it and tried to see what was going on. Here are the results of the first 10 minutes of testing:

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 0.818328 seconds, 160 MB/s

This looks great. I’d love to get 160MB/s all the time. But wait! There’s more!

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=100000
dd: writing `/vol/128.txt': No space left on device
86729+0 records in
86728+0 records out
11367641088 bytes (11 GB) copied, 268.191 seconds, 42.4 MB/s

Ok, well… that’s completely miserable. Let’s try something in between.

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=10000 
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 15.4684 seconds, 84.7 MB/s
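
To put the three runs side by side, the throughput dd reports can be recomputed from the byte counts and elapsed times it printed. This is only a sketch: the `mbps` helper is a hypothetical name, not something from the runs above, and it assumes dd's convention of 1 MB = 1,000,000 bytes.

```shell
#!/bin/sh
# mbps BYTES SECONDS
# Hypothetical helper: print throughput in MB/s, using 1 MB = 1,000,000 bytes
# (the same convention dd uses in its summary line).
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Byte counts and timings copied from the dd output above.
mbps 131072000 0.818328      # 1,000 blocks
mbps 1310720000 15.4684      # 10,000 blocks
mbps 11367641088 268.191     # 86,728 blocks (volume filled up)
```

This prints 160.2, 84.7 and 42.4, which matches what dd reported, so the drop is in the raw numbers and not a rounding quirk.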

So throughput gets cut roughly in half when the number of 128k blocks is increased 10x (160 MB/s down to 84.7 MB/s), and in half again on the run that filled the volume (42.4 MB/s). This kinda sucks. I’ll keep plugging along, but if anyone has hints or clues, let me know. If this is the way it’s going to be, then this is no place to run a production, IO-intensive database (hundreds of thousands, maybe millions, of writes per day, on top of reads).
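
One thing that may be worth ruling out (it is not something the tests above checked): without a sync flag, dd can report completion while some of the data is still sitting in the page cache, which flatters short runs more than long ones. A rough sketch of the same test with the flush included in the timing follows. `write_test` is a hypothetical helper, `conv=fsync` is the GNU dd option that flushes the output file before dd exits, and the target path is whatever you pass in.

```shell
#!/bin/sh
# write_test FILE COUNT
# Hypothetical helper: write COUNT blocks of 128k of zeros to FILE and print
# dd's summary line. conv=fsync makes dd flush the file to the device before
# it exits, so the reported time includes the flush.
write_test() {
    wt_file=$1
    wt_count=$2
    LC_ALL=C dd if=/dev/zero of="$wt_file" bs=128k count="$wt_count" conv=fsync 2>&1 | tail -n 1
}

# Usage (assumed invocation): sh write_test.sh /vol/128.txt 1000 10000
# Does nothing unless a target file and at least one block count are given.
if [ "$#" -ge 2 ]; then
    target=$1
    shift
    for count in "$@"; do
        write_test "$target" "$count"
        rm -f "$target"
    done
fi
```

If the small run still comes out far ahead with the flush included, caching on the instance is not the explanation and the slowdown is more likely on the EBS side.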

  • Mark Callaghan

    Do you plan to run the sysbench fileio test for concurrent 16kb random reads? I am curious about sustained throughput for random IO. My assumption is that there will be a bit of variation as you have shown for sequential IO.

  • PaulM

    Hey mate,

    For the raw device numbers go and look at this article

    I am going to run sysbench myself shortly. Amazon does mention that large and extra large instances, with their better network, will perform better.

    Have Fun

  • m0j0

    Thanks, PaulM

    I just tried to launch a large instance a short while ago, but apparently you *must* have an x86_64 image to launch anything bigger than a ‘small’ instance. I don’t recall seeing that in the docs, which is really annoying. Perhaps I just missed it.

    I did see that article, but also didn’t notice the raw device numbers. I’ll have another look – thanks!

    @Mark — I didn’t plan to run *any* benchmarks, to be honest, but when you know how long something should take, and it takes exponentially longer than that, you get… ‘inspired’.

  • coldfire

    in my experimentation with a large EC2 instance and a 600GB EBS volume, i have seen the iowait hitting 50% when doing simple operations like extracting a tarball and/or uploading files sustained at 20Mb/s to the EBS volume.

    i’m getting terribly worried that EBS won’t be sufficient to serve a lot of multimedia content from, but we’ll see once i have everything in place and have a chance to run some benchmarks.

    also attempted to create a snapshot of the volume when i had only 300gb of storage. that is running so dreadfully slow that i wish there was a way to just kill it before it completes. seems like it would be quicker to just create another volume, attach it, and copy the data. of course, then you don’t have that point in time snapshot … but oh well.