Back to technology: Is it effective to place Oracle REDO on SSD?

In December 1993 I was asked to improve the throughput of an actual general ledger posting job running on Oracle, on hardware where solid state disk (SSD) was available (at high cost relative to “spinning rust”, that is, hard disk drives [HDD]). Ever since, I have been trying to explain the overall advantage of selectively placing different types of Oracle storage on SSD.

When FLASH SSD arrived on the scene, studies quickly appeared showing that writing to FLASH SSD is often not as fast as writing to disk drives dedicated to receiving those writes.

Today I’ll try to explain why I don’t care.

While writing to SSD gave some advantage in my tests (which used RAM-based SSD on a VAX), the write speed to online REDO was not a significant part of the advantage of placing online REDO on SSD.

As Kevin Closson has repeatedly and carefully explained, the write speed of online REDO is rarely the problem: http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/

There are two things about moving online REDO to SSD (even the relatively slower FLASH kind) that deliver a big performance and cost advantage most of the time:

1) Writes to online REDO are mostly small and frequent. On HDD this generates a constant stream of seeks to find the correct place to write. So either you dedicate a chunk of HDD to online REDO, or you degrade the performance of whatever HDD also holds the online REDO, because you pester it with constant seeks away from the other work it is supporting. The dedicated chunk is usually two, four, or eight whole trays. In actual big systems we stripe and duplex by tray. Many folks insist on both hardware duplexing and multiple members of each REDO log group on storage that fails separately. You might also need to ping-pong your REDO log groups so that REDO is written to different drives from the ones ARCH is reading.

2) Reads from online REDO are big drinks by ARCH, which demand bandwidth. On HDD that means you either dedicate a chunk of HDD (as above, usually an expensive chunk) to the online REDO, or you consume some of that HDD's read bandwidth that would otherwise be available to all the other Oracle readers whenever ARCH is running.
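The minimal Python sketch below makes the two points above concrete. It models how online REDO traffic eats into the capacity of HDD that it shares with other work. It is a deliberately crude model with three assumptions. Each small LGWR write is charged as one random I/O. ARCH's big reads are averaged over the fraction of time ARCH is running. The names (HddGroup, RedoLoad, capacity_left_for_other_work) and all numbers are hypothetical illustrations, not measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HddGroup:
    """Shared HDD capacity (hypothetical figures supplied by the caller)."""
    iops: float           # random I/Os per second the drives can sustain
    read_mb_per_s: float  # read bandwidth in MB/s


@dataclass
class RedoLoad:
    """Online REDO traffic landing on that HDD (hypothetical figures)."""
    lgwr_writes_per_s: float   # small, frequent LGWR writes
    arch_read_mb_per_s: float  # ARCH read rate while it is running
    arch_duty_cycle: float     # fraction of time ARCH is running, 0..1


def capacity_left_for_other_work(hdd: HddGroup, redo: RedoLoad) -> tuple[float, float]:
    """Return (IOPS, MB/s) left for non-REDO work when REDO shares the HDD.

    Crude assumptions: each LGWR write costs one random I/O (a seek away
    from other work), and ARCH's reads are averaged over its duty cycle.
    """
    if not 0.0 <= redo.arch_duty_cycle <= 1.0:
        raise ValueError("arch_duty_cycle must be between 0 and 1")
    iops_left = max(hdd.iops - redo.lgwr_writes_per_s, 0.0)
    avg_arch_mb_per_s = redo.arch_read_mb_per_s * redo.arch_duty_cycle
    read_left = max(hdd.read_mb_per_s - avg_arch_mb_per_s, 0.0)
    return iops_left, read_left


if __name__ == "__main__":
    hdd = HddGroup(iops=1200.0, read_mb_per_s=800.0)
    shared = RedoLoad(lgwr_writes_per_s=300.0, arch_read_mb_per_s=200.0, arch_duty_cycle=0.25)
    on_ssd = RedoLoad(lgwr_writes_per_s=0.0, arch_read_mb_per_s=0.0, arch_duty_cycle=0.0)
    print("REDO on shared HDD:", capacity_left_for_other_work(hdd, shared))
    print("REDO moved to SSD: ", capacity_left_for_other_work(hdd, on_ssd))
```

Take hypothetical drives sustaining 1,200 IOPS and 800 MB/s. With 300 LGWR writes per second, plus ARCH reading 200 MB/s a quarter of the time, 900 IOPS and 750 MB/s are left for everything else. With REDO moved off those drives, the full 1,200 IOPS and 800 MB/s go back to the other work. That recovered capacity is the "de-heating" effect.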

Normally the storage acreage required for online REDO is modest.

Thus, if overall performance is an issue, the cost comparison is between two options. The first is an SSD just big enough to do the job (perhaps times two or four for duplexing and multiple members, but never times eight, because seek isolation is irrelevant on SSD). The second is deploying the online REDO on isolated chunks of HDD.
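The cost comparison above is simple arithmetic, sketched here in Python with hypothetical inputs. The SSD side scales with the REDO capacity actually needed times the number of copies. The isolated-HDD side is paid in whole trays, no matter how little of their capacity the REDO uses. The function names, sizes, and prices are illustrative placeholders, not quotes. Substitute your own.

```python
"""Hypothetical cost comparison: online REDO on SSD versus isolated HDD trays.

All sizes and prices are illustrative placeholders, not real quotes.
"""


def online_redo_gb(groups: int, member_size_gb: float) -> float:
    """Capacity for one copy of all online REDO log groups."""
    return groups * member_size_gb


def ssd_option_cost(groups: int, member_size_gb: float, copies: int,
                    cost_per_gb: float) -> float:
    """SSD sized just big enough for the job, times the number of copies.

    copies is typically 2 or 4 (duplexing and multiple members). No extra
    ping-pong factor is applied, because seek isolation does not matter on SSD.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return online_redo_gb(groups, member_size_gb) * copies * cost_per_gb


def isolated_hdd_option_cost(trays: int, cost_per_tray: float) -> float:
    """Whole trays dedicated to online REDO (often 2, 4, or 8), regardless
    of how little of their capacity the REDO actually uses."""
    if trays < 1:
        raise ValueError("trays must be at least 1")
    return trays * cost_per_tray


if __name__ == "__main__":
    # Placeholder inputs: 4 groups of 2 GB members, 4 copies on SSD,
    # versus 8 dedicated HDD trays.
    ssd = ssd_option_cost(groups=4, member_size_gb=2.0, copies=4, cost_per_gb=10.0)
    hdd = isolated_hdd_option_cost(trays=8, cost_per_tray=5000.0)
    print(f"SSD option: {ssd:.0f}  isolated HDD option: {hdd:.0f}")
```

The placeholder inputs are four log groups of 2 GB members, kept as four copies on SSD at 10 cost units per GB. Against that, eight trays cost 5,000 units each. The SSD option comes to 320 units, against 40,000 for the isolated trays. Real prices will differ. The point is the shape of the comparison: one side scales with gigabytes needed, the other with trays dedicated.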

The central value of putting online REDO on SSD is to de-heat the rest of the disk farm.

You may be in a rare situation where writing to online REDO is your pacing resource, and it paces because of the write speed of the media (not available CPU or DIMM channel speed and availability). Unless you are, the somewhat slower writes to some kinds of SSD are of zero concern. That holds even against dedicated HDD presumably waiting to swallow each write at the correct seek location. (If you are in that situation, it is probably time to invest in a small amount of RAM-based SSD. Otherwise you are running a laboratory test that only drives REDO, which is interesting but not directly related to the production throughput of most real systems.)
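One rough way to check whether you are in that rare situation is to estimate how much of the time LGWR spends waiting on the media. The Python sketch below assumes, as a simplification, that LGWR's writes are serialized. Writes per second times average write latency then approximates the fraction of each second spent writing. The average latency might come from, for example, Oracle's log file parallel write wait event. The function names and the 0.8 threshold are hypothetical illustrations, not Oracle rules. A high result still leaves the CPU and memory-channel questions open.

```python
"""Rough check of whether media write latency could be pacing LGWR.

Hypothetical simplification: LGWR's writes are treated as serialized, so
writes_per_s * avg_write_ms / 1000 approximates the fraction of each second
spent waiting on the media. Real LGWR behavior is more complex.
"""


def lgwr_media_busy_fraction(writes_per_s: float, avg_write_ms: float) -> float:
    """Approximate fraction of wall-clock time spent waiting on REDO writes."""
    if writes_per_s < 0 or avg_write_ms < 0:
        raise ValueError("inputs must be non-negative")
    return writes_per_s * avg_write_ms / 1000.0


def media_write_speed_may_be_pacing(writes_per_s: float, avg_write_ms: float,
                                    threshold: float = 0.8) -> bool:
    """True when LGWR is busy on media writes most of the time.

    The 0.8 threshold is an arbitrary illustrative cut-off. Even when this
    returns True, confirm that CPU or memory-channel limits are not the
    real constraint before blaming the media.
    """
    return lgwr_media_busy_fraction(writes_per_s, avg_write_ms) >= threshold


if __name__ == "__main__":
    for ms in (0.5, 2.5):
        busy = lgwr_media_busy_fraction(400.0, ms)
        print(f"{ms} ms per write: busy {busy:.0%}, "
              f"may be pacing: {media_write_speed_may_be_pacing(400.0, ms)}")
```

For example, 400 writes per second at 0.5 ms each keeps LGWR busy on the media about 20% of the time, well short of pacing. At 2.5 ms each it would be busy essentially all the time.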

Let’s review: if your actual problem is the speed of writing to the online REDO log, or log file sync, you are not likely to solve it by moving online REDO to slower SSD. (The accompanying de-heating of the disk farm might have that net effect, but you could also achieve that by isolating online REDO on independently operating HDD.)

On the other hand, suppose you have a hot disk farm that is the pacing resource for your throughput. If you can remove a lot of the heat for a modest investment in SSD, that is an effective use of SSD.

It is a leap to conclude that the purpose of moving online REDO to SSD is to speed up log writing or log file sync. That leap makes a laboratory test showing slower writes to some kinds of SSD seem like proof that putting online REDO on SSD is wrong.

I hope today I have helped explain why it is often a good investment to place online REDO on SSD.


About rsiz

Father and Husband, Oracle Technology Scientist and Consultant, planning to end poverty for citizens and legal US residents. Lebanon, NH · http://www.rsiz.com · See my wife's puzzles at thingamajigsaw.com
This entry was posted in Oracle, Thinking Clearly.

4 Responses to Back to technology: Is it effective to place Oracle REDO on SSD?

  1. NancyL says:

    I remember that little throughput test. 😉

    • rsiz says:

      As I recall folks were pretty happy I got it fast enough as it was deployed but were even happier with the idea of compressing out local accounts by a factor of 10 in the corporate consolidation. So poof! went the somewhat expensive hardware fix to get it to run in one-third the time and in came the business logic solution to get it to run in one tenth the time, with no real loss of pertinent information. My recollection is that everyone was pretty satisfied with that solution.

  2. Mike Ault says:

    With the IBM AFA (all-flash array), the write penalty has been mitigated, with write speeds now better than read speeds. So redo logs, or any other data, are OK to place on them. In fact, many clients are moving to an AFA for the entire database.

  3. One thing I’ve found makes a significant difference to redo write speeds (in modern hardware!) is to set “_disk_sector_size_override” to true. Then use a redo block size that matches, or is a small sub-multiple of, the file system’s block size. In the concrete case I deal with most of the time, the redo blksz is 4K so that it matches the JFS2 blksz, for both SSD and spinning rust. After that, waits on redo write and log sync drop well below 1 ms. They stay there consistently, with very little regard to the balance of I/O and very little redo wastage. Unfortunately, one needs at least 11gR2 for all that. But as I’ve found, the hard way, 11.2.0.3 is actually quite stable for average DB requirements.
