Sunday, March 1, 2009

Beating the performance bogeys in the Intel SSDs

Intel changed the world forever with its new X25-M and X18-M SSDs.

But reports are circulating of major performance problems.

Everyone's puzzled, and Intel denies any performance problems at all.

What's the low-down?

It's simple, as far as I can tell.

First, let's review the kinds of problems being reported:

* Intel SSD is BLISTERINGLY fast when first installed, but after a few months, random write performance plummets.

* Testing the Intel SSD by hammering it with tiny random writes for even less than a few hours brings a brand-new SSD to its knees - write speeds lower than 10MB per second.

* Imaging a hard disk partition onto a brand-new Intel SSD results in that SSD's write speed dropping to just a fraction of the advertised speed. (Search the linked page for 'All we did here was write the OS files')

You're not going to like this, but if you partition the drive - e.g. in half - and only ever use one of the two partitions, you won't see these problems.

Or at least, that's my bet, not actually owning one quite yet (although I've placed an order).

Can you see why?

The Intel SSD is so mind-bogglingly brilliant because it combines random writes into sequential writes. Others have already blogged about that very well, so I won't detail it here.

The problem is that the SSD doesn't know when virtual sectors are "available" again - it only knows which virtual sectors the operating system has ever written to.

If the operating system has EVER written to a virtual sector, the SSD controller studiously preserves the content of that virtual sector ever after, EVEN IF THE OPERATING SYSTEM LATER THINKS IT HAS DELETED FILES AND THAT THEREFORE THE SECTOR IS UNUSED AND AVAILABLE FOR RE-USE.

Thus, if you use the wrong imaging tool and write an 80GB partition to your brand-new Intel SSD, BAM - as far as the SSD is concerned, you have 0% free space. Yup - your partition might have contained 60% free space, but if the imaging tool does a sector-by-sector copy of the source disk to the SSD, then the imaging tool will write to every virtual sector in the SSD, and the poor SSD controller will be left diligently trying to preserve the contents of empty sectors.
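
To make the "diligently preserving empty sectors" problem concrete, here's a toy model of a log-structured controller (my own illustration, not Intel's actual firmware). Notice how a sector-by-sector image marks every logical sector live, even the ones that only held the source disk's free space:

```python
# Toy model of a log-structured SSD controller (my own illustration,
# not Intel's firmware). Writes are appended sequentially to flash,
# however random the logical sector, and without any "this sector was
# deleted" hint from the OS, a logical sector stays live forever once
# it has been written even once.

class ToySSD:
    def __init__(self, logical_sectors, spare_sectors):
        self.physical_pages = logical_sectors + spare_sectors
        self.mapping = {}   # logical sector -> physical page
        self.log_head = 0   # next sequential flash page to write

    def write(self, sector):
        # Random logical writes become sequential physical writes.
        self.mapping[sector] = self.log_head
        self.log_head += 1

    def live(self):
        # Sectors the controller must preserve forever (no TRIM in 2009).
        return len(self.mapping)

ssd = ToySSD(logical_sectors=100, spare_sectors=8)

# A sector-by-sector imaging tool writes EVERY logical sector,
# including those that only held the source disk's free space:
for sector in range(100):
    ssd.write(sector)

print(f"{ssd.live()} of 100 logical sectors live")  # 100 -- "0% free"
# Only the 8 spare pages remain as breathing space for the controller.
```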

SO, the trick to giving the Intel SSD controller some "breathing space" in which to perform its magic is to ensure you NEVER COMPLETELY FILL the SSD.

Sad, but true.

Yup - that means that if you fill your 80GB SSD with 80GB of data (whether test data for random write tests, or 80GB worth of your data files), then read speed will stay great (it does slow down, but you're not likely to notice without a performance monitoring tool), but write speed will plummet. The poor SSD controller won't be able to turn random writes into sequential writes, 'coz there's no free space!

Actually, there is free space - a tiny bit. See, apparently, the Intel SSD comes with a few GB of spare storage space beyond the advertised capacity.

Without this spare storage space "up its sleeve", the Intel SSD could, in worst-case performance scenarios, have random write speeds as abysmally low as say the OCZ Core V2 (which I am suffering with).

So by keeping more storage space than it lets the computer directly use, the SSD controller always has a little bit of room to move. That's why, even when things go horribly pear-shaped, the Intel SSD can still maintain a random write speed of well over 1 megabyte per second (reports range up to around 20 megabytes per second) - which is WAY more than the competitors manage from the very outset! (For comparison, my OCZ Core V2 120GB has a random write speed for 4kb blocks of, I think, less than 100 kilobytes per second - i.e. less than 0.1 megabytes per second.)

So let's take the case of the hardcore random-write test. We write a special program that picks a sector at random from anywhere across the SSD's entire 80GB, writes just to that sector, picks another at random, writes to that, and keeps going as fast as the SSD can handle it, for hours on end.
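
Something like this sketch (my own illustration of the test, not any particular published benchmark). It's DESTRUCTIVE, and the device path is a placeholder:

```python
# Sketch of the torture test described above (my own illustration).
# DESTRUCTIVE: it overwrites raw sectors. "/dev/sdb" is a hypothetical
# device node -- point it at a drive you can afford to wipe.
import os
import random
import time

DEVICE = "/dev/sdb"        # hypothetical: the SSD under test
DRIVE_BYTES = 80 * 10**9   # 80GB advertised capacity
WRITE_SIZE = 4096          # 4kb random writes, as discussed above

# O_SYNC so each write actually reaches the drive, not just the OS cache.
fd = os.open(DEVICE, os.O_WRONLY | os.O_SYNC)
payload = os.urandom(WRITE_SIZE)
written = 0
start = time.time()

try:
    while True:
        # Pick any aligned sector across the entire drive and write it.
        offset = random.randrange(DRIVE_BYTES // WRITE_SIZE) * WRITE_SIZE
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, payload)
        written += WRITE_SIZE
        if written % (4 * 2**20) == 0:  # progress report every 4MiB
            rate = written / 2**20 / (time.time() - start)
            print(f"{written / 2**30:.2f} GiB written at {rate:.1f} MB/s")
except KeyboardInterrupt:
    os.close(fd)
```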

For the first 80GB-worth of random writes, it zooms along at the mind-boggling speed of 70MB per second or more.

But it quickly hits a wall. The SSD controller was attaining these astounding speeds by combining the random writes into sequential writes, but shortly after 80GB worth of writes, it runs out of unused sequential storage locations. (Remember that it has a bit of breathing room, but that gets used up quickly too...)

In this condition, the effort required to combine even just two random writes into one sequential write is multiplied. On the virgin drive, it could cram 2MB of random writes into each 2MB write block (or whatever write block size it actually uses - I pick 2MB for demonstration purposes, but the principle remains true whatever the actual write block size). But on the drive in this random-write-crammed-full condition, if there were no "breathing space" (additional storage space beyond what the host computer is aware of), then each 4 kilobyte (0.004 megabyte) random write would require an entire 2 megabyte write block to be written. That's a slowdown factor of 2 / 0.004 = 500! No wonder people report speed problems!
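
Putting numbers on that (a back-of-envelope calculation, using the same illustrative block size):

```python
# Back-of-envelope slowdown from the reasoning above, using the same
# illustrative 2MB write-block size (the real block size may differ).
block_bytes = 2 * 1024 * 1024   # whole block rewritten per random write
host_bytes = 4 * 1024           # what the host actually asked for

slowdown = block_bytes / host_bytes
print(slowdown)        # 512 with binary units; ~500 as estimated above

# At ~70MB/s of raw sequential flash throughput, worst-case useful
# random write speed collapses to roughly:
print(70 / slowdown)   # ~0.14 MB/s of host data per second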

Of course, Intel was smart to include the "breathing space" - it ensures that the worst-case random write speed never actually gets as bad as that.

SO - what's a man to do?

The reality is, this problem is not as bothersome as you might think - if you know what triggers it and how to avoid it.

The simplest thing to do - and what I plan to do as soon as mine arrives - is to give it even a bit more breathing space than comes built-in.

e.g. partition it into 70GB + 10GB, and never touch the 10GB. That gives the controller an extra 10GB of "breathing space". In a more extreme case, if you chopped the drive in half and kept 40GB in an untouched partition, giving the SSD controller a huge amount of "breathing space", you should find that worst-case random write performance never drops below roughly half of best-case random-write performance.
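
Here's a rough steady-state model of why that's so (my own back-of-envelope, not Intel's math). If live data fills only part of the physical flash, an average block is only partly full, and reclaiming a block costs rewriting only its live fraction:

```python
# Rough steady-state model (my own back-of-envelope, not Intel's math).
# If live data fills a fraction `u` of the physical flash, then an
# average block is `u` full, and reclaiming a block means rewriting
# that live fraction -- so each byte of host data costs roughly
# 1 / (1 - u) bytes of flash writes (the write amplification).

def worst_case_amplification(advertised_gb, untouched_gb, builtin_spare_gb=6):
    # builtin_spare_gb=6 assumes ~7.5% hidden spare on an 80GB drive.
    physical = advertised_gb + builtin_spare_gb
    live = advertised_gb - untouched_gb   # sectors the OS ever wrote
    u = live / physical                   # live fraction per block
    return 1 / (1 - u)

# Drive packed completely full: only the hidden spare helps.
print(worst_case_amplification(80, 0))    # ~14x

# Half the drive kept in a never-touched partition:
print(worst_case_amplification(80, 40))   # ~1.9x -- i.e. worst case stays
                                          # around half of best case
```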

So why is it that Intel claims they can't reproduce the reported problems? I bet they're not testing with a chokkers drive. Pack that drive full - every last sector - and you'll get random write speed issues double-quick time. But leave more "breathing space" than the default built-in, and things will improve.

Does it seem sad to lose a big chunk of an already-very-expensive-per-gigabyte SSD?

It sure does. But on the flipside, do you need all 80GB of super-speed goodness?

In my case, I reckon I can survive with, say, just 40GB or 60GB for my primary partition, tools, and source code, and keep my photo libraries and downloads and other huge files on a traditional spinning platter.

But do I even care that much? Suppose random write speed dropped to 10MB/s? Given that it has practically zero seek time, and given that developing for the Microsoft .NET Framework results in HUGE numbers of small relatively random writes, 10MB/s sustained on an SSD will probably still out-pace my current hard disk by a good margin.
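
Rough numbers (my assumption: a typical 2009-era 7200rpm disk manages on the order of 100 random 4kb writes per second once seek and rotational latency are paid):

```python
# Rough comparison, assuming a 2009-era 7200rpm disk manages on the
# order of 100 random 4kb writes per second (seek + rotational latency).
hdd_iops = 100
write_kb = 4
hdd_mb_s = hdd_iops * write_kb / 1024
print(f"HDD random 4kb writes: ~{hdd_mb_s:.2f} MB/s")  # ~0.39 MB/s

degraded_ssd_mb_s = 10  # the pessimistic figure above
print(f"Even a degraded SSD is ~{degraded_ssd_mb_s / hdd_mb_s:.0f}x faster")
```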

I think I'll set aside 20GB as unused additional "breathing room" initially, and then chip into that later if I can make more precise calculations about worst-case random write performance with different amounts of breathing room...

So talk to me - did it work for you?




UPDATE: If your Intel SSD has hit the doldrums (i.e. has lost its random write performance), check out the HDDErase tool and accompanying notes.

2 comments:

Dario Bertini said...

interesting...

i think you're right... but AFAIK, the SSD controller completely ignores how the OS sees/represents the data on the disk...

so i'm not sure that leaving space unused - even space that was never written to - actually corresponds to any spare blocks in the flash memory

Theune said...

Anand's take on your idea:

"Intel ships its X25-M with 7.5 - 8% more area than is actually reported to the OS. The more expensive enterprise version ships with the same amount of flash, but even more spare area. Random writes all over the drive are more likely in a server environment so Intel keeps more of the flash on the X25-E as spare area. You’re able to do this yourself if you own an X25-M; simply perform a secure erase and immediately partition the drive smaller than its actual capacity. The controller will use the unpartitioned space as spare area."

http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=9