
August 20, 2008

Ray Day Pay

Pardon?

Accent is a funny thing; listening to a NetApp presentation the other day, I swear I heard someone say RAY-DAY-PAY. Cloth ears perhaps, but it took a full second for me to get it; RAID-DP. Got it...

It's RAIDing Cats and Dogs

I've been revisiting a paper I wrote on a competitor's implementation of RAID-6, and it struck me that the industry uses a seemingly endless number of RAID variations: everything from RAID-0 through RAID-50 and all stops in between.

It also reminded me that NetApp has only two RAID configurations, as opposed to the myriad supported by other vendors: RAID-4 (single parity on a separate drive, not used that often, and I'm not going to discuss it here) and RAID-DP, a variety of RAID-6 (dual or double parity).

So what's different about NetApp's choice of RAID level, why did we choose this form of RAID, and does it matter? And where is RAID-DP the same as or different from RAID-6?

In this blog entry, I'm going to explain the basic mechanics of RAID-DP, and will come back to the details in a later blog.

Disks Fail

Obviously. They're electro-mechanical devices, and even SSDs (solid state disks) can fail, so RAID, the ability to protect against a disk failure, is essential. I'm not going to explore the whys and wherefores of drive reliability here; NetApp, with its huge historical database of information on disks and storage systems, does a lot of research and publication in this area.

Drive reliability is measured across huge numbers of disks, and disk failure happens. The more disks you have, and the bigger the disks, the greater the problem becomes. Imagine petabytes of storage: we have customers with multi-petabyte systems where, because of the sheer number of disks, disk failures are a regular occurrence. It's the cumulative effect of having tens of thousands of them.
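To put rough numbers on the scale effect, here is a minimal Python sketch. The 3% annualised failure rate (AFR) and the fleet sizes are hypothetical figures chosen purely for illustration, not NetApp data, and drives are assumed to fail independently at a constant rate. The point is simply that expected failures grow linearly with the number of drives.

```python
def expected_failures_per_year(num_drives: int, afr: float) -> float:
    """Expected drive failures per year: each drive contributes its AFR."""
    return num_drives * afr


def prob_any_failure(num_drives: int, afr: float, days: float) -> float:
    """Chance that at least one of num_drives fails within `days`.

    Assumes independent drives with a constant failure rate, so a single
    drive survives `days` with probability (1 - afr) ** (days / 365).
    """
    survive_all = (1 - afr) ** (num_drives * days / 365)
    return 1 - survive_all


# Hypothetical 3% AFR: a small system versus a multi-petabyte estate.
for drives in (100, 20_000):
    print(f"{drives:>6} drives: "
          f"{expected_failures_per_year(drives, 0.03):6.1f} failures/year, "
          f"P(at least one today) = {prob_any_failure(drives, 0.03, 1):.1%}")
```

Under these assumptions 20,000 drives works out at around 600 failures a year, between one and two a day, which is why a failure stops being an event and becomes routine.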

So if one R in RAID (where the R stands for Redundant) is a good idea, two must be better, right?

Statistics 101

Don't stop reading! It's really simple. If the chance of throwing a six on a die is 1 in 6, what's the chance of throwing two sixes with two dice? 1 in 36. Easy.

RAID-6 and other dual parity schemes decrease your chances of losing data by orders of magnitude. If the chance of a disk failure is 1 in a million hours (10^6), what's the chance of two disks failing together? Treat the two failures as independent, like the dice, and it's 1 in a trillion hours (10^12).
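The arithmetic behind both the dice and the disks is the multiplication rule for independent events, shown here as a tiny Python sketch using exact fractions. The function name is illustrative, and independence is the big simplifying assumption: it's one reason real-world disk failure maths is "not that straightforward".

```python
from fractions import Fraction


def both_happen(p_a: Fraction, p_b: Fraction) -> Fraction:
    """P(A and B) = P(A) * P(B), valid only if A and B are independent."""
    return p_a * p_b


six = Fraction(1, 6)
print(both_happen(six, six))  # 1/36

# The same naive sum for disks, using the simplified 1-in-a-million figure
# and assuming the two failures are independent.
one_disk = Fraction(1, 10**6)
print(both_happen(one_disk, one_disk))  # 1/1000000000000
```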

Yes, yes, I know it's not that straightforward, but you get the picture. If one set of parity is good, then two parity disks' worth are much, much better at protecting against data loss. But if the lifetime of disks is so great, why bother with more than RAID-5, or any of the other single-parity schemes?

The advantages of dual parity are in robust data protection; specifically, RAID-6 can sustain two simultaneous drive failures in any RAID group without loss of data.

RAID-5 Can Damage Your Wealth and Health

Yes, RAID-5 is dangerous; anyone running RAID-5 on large 1TB drives (and they're getting larger) is running a serious and measurable risk.

The likelihood of one drive failing and then, say, a single-bit media error turning up on another drive during the RAID reconstruct has increased to levels that make single-parity systems much more likely to suffer catastrophic data loss in everyday operation.
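The rebuild risk can be roughed out with a short Python sketch. The inputs are assumptions for illustration only: an unrecoverable read error (URE) rate of 1 in 10^14 bits read, a figure commonly quoted on desktop-class SATA drive spec sheets, and an eight-drive RAID-5 group of 1TB drives, so seven surviving drives have to be read end to end to rebuild the failed one. Errors are treated as independent per bit, which is itself a simplification.

```python
import math


def prob_read_error_during_rebuild(surviving_drives: int,
                                   drive_bytes: float,
                                   ure_per_bit: float) -> float:
    """Chance of at least one URE while reading every surviving drive in full.

    Assumes each bit read fails independently with probability ure_per_bit.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # P(no error) = (1 - rate) ** bits; log1p/expm1 keep precision for tiny rates.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Assumed: 7 surviving 1TB (10**12 byte) drives, URE rate 1 in 10**14 bits.
risk = prob_read_error_during_rebuild(7, 1e12, 1e-14)
print(f"{risk:.2f}")  # 0.43
```

Under those assumptions there's roughly a 43% chance of hitting at least one URE during the rebuild. In RAID-5 that error arrives when the affected stripe has no redundancy left; with dual parity, the second parity can still reconstruct it.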

Especially worrying is that these kinds of drives are being used for archive, backup and recovery, and DR purposes, and even for production data such as home directories.

RAID-5 has had its day, and much better, more robust protection is essential, not simply a nice-to-have.

RAID-6 Dual Parity Schemes

SNIA, the storage industry's trade organisation, describes them like this:

Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures. Several methods, including dual check data computations (parity and Reed Solomon), orthogonal dual parity check data and diagonal parity have been used to implement RAID Level 6.

SNIA doesn't differentiate between RAID-6 types, although their characteristics can be quite different in operation. 

Difference 1; Dedicated Parity Drives

In most implementations of RAID-6, as in RAID-5, the parity is spread across all the disks. In NetApp's version, the two parity drives are distinct and separate.
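The layout difference is easy to picture with a small Python sketch that prints, stripe by stripe, which disks in a six-disk group hold the two parity blocks. The rotation rule used for the rotating case is a hypothetical convention chosen for illustration; real products rotate parity in various ways.

```python
from __future__ import annotations


def rotating_parity(stripe: int, num_disks: int) -> tuple[int, int]:
    """(first parity disk, second parity disk) when parity rotates per stripe.

    Hypothetical convention: the first parity block moves one disk to the
    left on each stripe and the second sits just after it, wrapping around.
    """
    first = (num_disks - 1 - stripe) % num_disks
    return first, (first + 1) % num_disks


def dedicated_parity(stripe: int, num_disks: int) -> tuple[int, int]:
    """RAID-DP style: the same two disks hold parity for every stripe."""
    return num_disks - 2, num_disks - 1


def show(layout, num_disks: int = 6, stripes: int = 4) -> None:
    """Print one line per stripe: D = data, P/Q = first/second parity block."""
    for s in range(stripes):
        p, q = layout(s, num_disks)
        print(" ".join("P" if d == p else "Q" if d == q else "D"
                       for d in range(num_disks)))


show(rotating_parity)
print()
show(dedicated_parity)
```

The usual argument for rotating parity is that it spreads parity updates across every drive. With dedicated parity drives, how writes are gathered up before they reach the disks matters much more, which is where the Data ONTAP and WAFL difference comes in.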

Difference 2; Parity Calculations

Many implementations of RAID-6 use a technique called EVENODD (others use Reed-Solomon coding). NetApp use a different, slightly more compute-efficient technique called RDP, or Row-Diagonal Parity. Designed by NetApp, RDP has some other distinct advantages over EVENODD.
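To show how two parity blocks can rebuild any two failed disks with nothing but XOR, here is a minimal Python sketch of the row-diagonal construction as publicly described for RDP. Assumptions: p is prime, a stripe has p-1 rows, there are p-1 data disks plus a row-parity disk and a diagonal-parity disk, and each block is a small integer held in memory. The function names are hypothetical, recovery uses a simple "peeling" loop rather than any product's rebuild path, and none of this is Data ONTAP code.

```python
from __future__ import annotations

import random
from itertools import combinations


def _diagonal_parity(stripe: list[list[int]], p: int) -> None:
    """Fill column p with diagonal parity.

    Block (r, c) in columns 0..p-1 (data disks plus the row-parity disk)
    lies on diagonal (r + c) % p. Diagonal p-1 is deliberately not stored.
    """
    for d in range(p - 1):
        stripe[d][p] = 0
    for r in range(p - 1):
        for c in range(p):
            d = (r + c) % p
            if d != p - 1:
                stripe[d][p] ^= stripe[r][c]


def rdp_encode(data: list[list[int]], p: int) -> list[list[int]]:
    """Build a stripe of p-1 rows x (p+1) columns from (p-1) x (p-1) data blocks.

    Columns 0..p-2 are data, column p-1 is row parity, column p is diagonal parity.
    """
    assert len(data) == p - 1 and all(len(row) == p - 1 for row in data)
    stripe = [row[:] + [0, 0] for row in data]
    for row in stripe:
        for c in range(p - 1):
            row[p - 1] ^= row[c]  # row parity: XOR of the row's data blocks
    _diagonal_parity(stripe, p)
    return stripe


def rdp_recover(stripe: list[list[int | None]], lost: set[int], p: int) -> None:
    """Rebuild up to two lost columns in place by peeling: repeatedly solve any
    row, or any stored diagonal, that has exactly one missing block."""
    rows = p - 1
    known = [[c not in lost for c in range(p + 1)] for _ in range(rows)]
    progress = True
    while progress:
        progress = False
        # Each row's blocks in columns 0..p-1 XOR to zero.
        for r in range(rows):
            missing = [c for c in range(p) if not known[r][c]]
            if len(missing) == 1:
                c0, value = missing[0], 0
                for c in range(p):
                    if c != c0:
                        value ^= stripe[r][c]
                stripe[r][c0], known[r][c0] = value, True
                progress = True
        # Each stored diagonal XORs to its parity block (needs column p intact).
        if p not in lost:
            for d in range(p - 1):
                cells = [((d - c) % p, c) for c in range(p) if (d - c) % p != p - 1]
                missing = [(r, c) for r, c in cells if not known[r][c]]
                if len(missing) == 1:
                    r0, c0 = missing[0]
                    value = stripe[d][p]
                    for r, c in cells:
                        if (r, c) != (r0, c0):
                            value ^= stripe[r][c]
                    stripe[r0][c0], known[r0][c0] = value, True
                    progress = True
    if not all(all(row[:p]) for row in known):
        raise ValueError("could not rebuild: were more than two disks lost?")
    if p in lost:
        _diagonal_parity(stripe, p)  # diagonal parity is derived, so recompute it


# Demo: encode a random stripe (p = 5), then lose every possible pair of disks.
random.seed(0)
P = 5
original = rdp_encode([[random.getrandbits(8) for _ in range(P - 1)] for _ in range(P - 1)], P)
for pair in combinations(range(P + 1), 2):
    damaged = [[None if c in pair else v for c, v in enumerate(row)] for row in original]
    rdp_recover(damaged, set(pair), P)
    assert damaged == original
print("every two-disk failure rebuilt")
```

The trick is that the diagonals also run through the row-parity disk and one diagonal is never stored; with p prime, the chain of diagonal and row repairs is guaranteed to reach every missing block. Everything is plain XOR; the efficiency comparison with EVENODD is about how many XORs that takes, which this sketch doesn't measure.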

Difference 3; Data ONTAP and WAFL

The last difference is the way a NetApp system, running Data ONTAP and WAFL, uses RAID-DP to its advantage.

Ray Day Pay Positively Affects...

The details get a bit deep (but I'll keep them reasonably understandable). The different methods for calculating and storing parity, and the way that ONTAP and WAFL manage RAID groups, give a distinct advantage to our unique implementation of RAID-6.

RAID-DP is actually better than traditional RAID-5 or RAID-6 implementations. It's possible to get more out of RAID-DP: more performance, more capacity and more data protection. Provably so, and all together that means reduced costs.

More Ray Day Pay next time!



Comments

This reminds me. A while back I wrote a piece for Robin Harris' StorageMojo blog here:

http://storagemojo.com/2007/02/26/netapp-weighs-in-on-disks/

Apparently one of my prognostications about RAID5 has taken on a life of its own! :-)

http://tinyurl.com/5gkazh

Well, how's that for a meme?!?! http://en.wikipedia.org/wiki/Meme

Definitely gets a mention in Ray Day Pay Par Too.

Great article. Would be interesting to see some percentage data on the number of actual systems running this form of RAID-6 in production today. Must be unmatched in the industry!

This might be a good time to reflect on the old saying "All hardware eventually fails, all software eventually works".

Most errors that generate services calls these days are software or firmware issues, not flat out hardware failures. That said, there's no reason not to manage your hardware failure risk with a good raid setup.

Overall, good writeup. I used to have a bias against fixed parity drive raid 6 until a friend of mine at Netapp set me straight about the unique way you destage data to your disks.

Thanks, OSSG; part 2 may be a wee bit delayed as I'm off on holiday for a week. Lots of other advantages to RAID-DP I'll discuss then.


© NetApp, Inc.  |  "Safe Harbor" Statement