Pardon?
Accent is a funny thing; listening to a NetApp presentation the other day, I swear I heard someone say RAY-DAY-PAY. Cloth ears perhaps, but it took a full second for me to get it; RAID-DP. Got it...
It's RAIDing Cats and Dogs
I've been revisiting a paper I wrote on a competitor's implementation of RAID-6, and it struck me that there is a seemingly endless number of variations in the RAID systems employed by the industry; everything from RAID-0 through RAID-50 and all stops in between.
This has reminded me that NetApp has only two RAID configurations as opposed to the myriad RAID configurations supported by other vendors; RAID-4 (single parity on a separate drive, not used that often and I'm not going to discuss it here) and RAID-DP, a variety of RAID-6 (dual or double parity).
So what's different about NetApp's choice of RAID level, why did we choose this form of RAID, and does it matter? And where is RAID-DP the same as or different from RAID-6?
In this blog entry, I'm going to explain the basic mechanics of RAID-DP, and will come back to the details in a later blog.
Disks Fail
Obviously. They're electro-mechanical devices. Even SSDs (solid-state disks) can fail, so RAID, or the ability to protect against a disk failure, is essential. I'm not going to explore the whys and wherefores of drive reliability. NetApp, with its huge historical database of information on disks and storage systems, does a lot of research and publication in this area.
Drive reliability is measured over a huge number of disks, and disk failure happens. The more disks you have, and the bigger the disks, the greater the problem becomes. Imagine petabytes of storage; we have customers with multi-petabyte systems where disk failures, due to the sheer number of disks, are a regular occurrence. It's the cumulative effect of having tens of thousands of them.
So if one R in RAID (where the R stands for Redundant) is a good idea, two must be better, right?
Statistics 101
Don't stop reading! It's really simple. If the chance of throwing a six on a die is 1 in 6, what's the chance of throwing two sixes on two dice? 1 in 36. Easy.
RAID-6, like other dual parity schemes, decreases your chances of losing data by orders of magnitude. If the chance of a disk failure is 1 in a million hours (10^6), what's the chance of two disks failing together? 1 in a trillion hours (10^12).
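As a quick check of that arithmetic, here is a minimal sketch that multiplies the odds together. It assumes the two events are completely independent and happen in the same window, which is a simplification; the function name `chance_of_both` is made up for this illustration.

```python
from fractions import Fraction


def chance_of_both(p_single):
    """Probability that two independent events with the same odds both occur."""
    return p_single * p_single


# Two dice: 1 in 6 each.
two_sixes = chance_of_both(Fraction(1, 6))

# Two disks: 1 in a million (10^6) each, the figure used in the text.
two_disks = chance_of_both(Fraction(1, 10**6))

print(two_sixes)  # 1/36
print(two_disks)  # 1/1000000000000
```

Exact fractions are used rather than floats so the 1 in 36 and 1 in 10^12 results come out without rounding.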
Yes, yes, I know it's not that straightforward, but you get the picture. If one set of parity is good, then two parity disks' worth are much much better at protecting against data loss. But if the lifetime of disks is so great, why bother with more than RAID-5? Or any of the other single parity schemes?
The advantages of dual parity are in robust data protection; specifically, RAID-6 can sustain two simultaneous drive failures in any RAID group without loss of data.
RAID-5 Can Damage Your Wealth and Health
Yes, RAID-5 is dangerous; anyone running RAID-5 on large 1TB drives (and they're getting larger) is running a serious and measurable risk.
The likelihood of one drive failing for some reason and, say, a single-bit media error on another during RAID reconstruct has increased to levels that make single-parity systems much more likely to suffer catastrophic data loss in everyday operation.
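To put a rough number on that risk, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration and not a measurement from this post: a hypothetical single-parity group with 7 surviving 1TB drives, and an unrecoverable read error rate of 1 in 10^14 bits, a figure often quoted on drive data sheets. It also assumes errors are independent from bit to bit.

```python
import math


def p_read_error_during_rebuild(surviving_drives, drive_bytes, errors_per_bit):
    """Chance of at least one unrecoverable read error while reading every
    surviving drive in full, assuming independent errors per bit."""
    bits_to_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - rate) ** bits, written with log1p/expm1 to avoid float round-off
    return -math.expm1(bits_to_read * math.log1p(-errors_per_bit))


# Assumed example: 7 surviving 1TB drives, 1 error per 10^14 bits read.
risk = p_read_error_during_rebuild(7, 10**12, 1e-14)
print(f"{risk:.0%}")
```

Under those assumptions the chance of hitting a media error somewhere during the reconstruct comes out at roughly 4 in 10. With single parity there is nothing left to repair that block from; with a second parity there is.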
Especially worrying is that these kinds of drives are being used for archive, backup and recovery, and DR purposes. Even for production data; for example, home directories.
RAID-5 has had its day, and much better, more robust protection is essential, not simply a nice-to-have.
RAID-6 Dual Parity Schemes
SNIA, the storage industry's trade organisation, describes them like this;
Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures. Several methods, including dual check data computations (parity and Reed Solomon), orthogonal dual parity check data and diagonal parity have been used to implement RAID Level 6.
SNIA doesn't differentiate between RAID-6 types, although their characteristics can be quite different in operation.
Difference 1; Dedicated Parity Drives
In most implementations of RAID-6, the parity, like RAID-5, is spread across all the disks. In NetApp's version, the parity drives are distinct and separate.
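The difference in placement can be sketched in a few lines. This is an illustration only: the rotation rule below is one simple convention invented for the example (real layouts vary between vendors), and `parity_disks` is a hypothetical helper, not anything from ONTAP.

```python
def parity_disks(stripe, n_disks, dedicated):
    """Return the indices of the two disks holding parity for a stripe."""
    if dedicated:
        # Dedicated layout: parity always lives on the same two disks.
        return (n_disks - 2, n_disks - 1)
    # Distributed layout (assumed rotation rule): parity moves one disk
    # to the left on each successive stripe.
    first = (n_disks - 1 - stripe) % n_disks
    second = (first + 1) % n_disks
    return (first, second)


for stripe in range(4):
    print(stripe, parity_disks(stripe, 6, dedicated=False),
          parity_disks(stripe, 6, dedicated=True))
```

In the distributed column the pair of parity disks changes on every stripe; in the dedicated column it stays at (4, 5) throughout.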
Difference 2; Parity Calculations
Most implementations of RAID-6 use a technique called EVENODD. NetApp use a different, slightly more compute efficient technique, called RDP or Row-Diagonal Parity. Designed by NetApp, RDP has some other distinct advantages over EVENODD.
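For the curious, here is a toy sketch of the row-diagonal parity idea as described in the published RDP work. It is not NetApp's implementation: blocks are single integers, `p` is a small prime, the stripe has p-1 data disks plus one row-parity and one diagonal-parity disk, and the function names are made up for this example. Recovery is done by repeatedly fixing any row or diagonal that has exactly one missing block.

```python
import random


def rdp_encode(data, p):
    """Encode one stripe: `data` is p-1 rows of p-1 data blocks (ints).

    Returns (grid, diag). Grid columns 0..p-2 are data disks and column p-1
    is the row-parity disk; diag is the diagonal-parity disk.
    """
    grid = []
    for row in data:
        row_parity = 0
        for block in row:
            row_parity ^= block
        grid.append(list(row) + [row_parity])
    diag = [0] * (p - 1)
    for r, row in enumerate(grid):
        for d, block in enumerate(row):
            k = (r + d) % p      # diagonal this block belongs to
            if k != p - 1:       # diagonal p-1 is deliberately not stored
                diag[k] ^= block
    return grid, diag


def rdp_recover(grid, diag, failed, p):
    """Rebuild the failed columns (data or row-parity disks) in place."""
    rows = p - 1
    missing = {(r, d) for r in range(rows) for d in failed}
    for r, d in missing:
        grid[r][d] = None
    while missing:
        progress = False
        # A row with exactly one lost block is repaired from row parity.
        for r in range(rows):
            lost = [d for d in range(p) if (r, d) in missing]
            if len(lost) == 1:
                value = 0
                for d in range(p):
                    if d != lost[0]:
                        value ^= grid[r][d]
                grid[r][lost[0]] = value
                missing.discard((r, lost[0]))
                progress = True
        # A stored diagonal with exactly one lost block is repaired likewise.
        for k in range(p - 1):
            cells = [(r, (k - r) % p) for r in range(rows)]
            lost = [c for c in cells if c in missing]
            if len(lost) == 1:
                value = diag[k]
                for r, d in cells:
                    if (r, d) != lost[0]:
                        value ^= grid[r][d]
                grid[lost[0][0]][lost[0][1]] = value
                missing.discard(lost[0])
                progress = True
        if not progress:
            raise ValueError("cannot recover this failure pattern")
    return grid


p = 5
data = [[random.randrange(256) for _ in range(p - 1)] for _ in range(p - 1)]
grid, diag = rdp_encode(data, p)
original = [row[:] for row in grid]
rdp_recover(grid, diag, failed=(0, 2), p=p)
assert grid == original
```

Everything here is XOR, which is where the compute efficiency comes from. Losing the diagonal-parity disk along with one other disk is the easy case (rebuild from row parity, then recompute the diagonals) and is left out of the sketch.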
Difference 3; Data ONTAP and WAFL
Finally, the last difference is the way that a NetApp system, with ONTAP and WAFL, uses RAID-DP to advantage.
Ray Day Pay Positively Affects...
The details are all a bit deep (but I'll keep it reasonably understandable). The different methods for calculating and storing parity, and the way that ONTAP and WAFL manage RAID groups give a distinct advantage to our unique implementation of RAID-6.
RAID-DP is actually better than traditional RAID-5 or RAID-6 implementations. It's possible to get more out of RAID-DP -- more performance, more capacity and more data protection. Provably so, and all together that means reduced costs.
More Ray Day Pay next time!

This reminds me. A while back I wrote a piece for Robin Harris' StorageMojo blog here:
http://storagemojo.com/2007/02/26/netapp-weighs-in-on-disks/
Apparently one of my prognostications about RAID5 has taken on a life of its own! :-)
http://preview.tinyurl.com/5gkazh
http://tinyurl.com/5gkazh
Posted by: Val Bercovici | August 20, 2008 at 06:40 AM
Well, how's that for a meme?!?! http://en.wikipedia.org/wiki/Meme
Definitely gets a mention in Ray Day Pay Par Too.
Posted by: Alex McDonald | August 20, 2008 at 07:29 AM
Great article. Would be interesting to see some percentage data on the number of actual systems running this form of RAID-6 in production today. Must be unmatched in the industry!
Posted by: Geert | August 20, 2008 at 01:17 PM
This might be a good time to reflect on the old saying "All hardware eventually fails, all software eventually works".
Most errors that generate service calls these days are software or firmware issues, not flat out hardware failures. That said, there's no reason not to manage your hardware failure risk with a good raid setup.
Overall, good writeup. I used to have a bias against fixed parity drive raid 6 until a friend of mine at Netapp set me straight about the unique way you destage data to your disks.
Posted by: open systems storage guy | August 21, 2008 at 11:49 AM
Thanks, OSSG; part 2 may be a wee bit delayed as I'm off on holiday for a week. Lots of other advantages to RAID-DP I'll discuss then.
Posted by: Alex McDonald | August 22, 2008 at 09:01 AM
Sorry, but this looks, even to a NetApp fan like myself, mostly like a big pile of FUD.
- Raid4 has been standard in OnTap for ages till the nearstore IDE disks called for raid-dp and you call it "rarely used" - where have you been before Raid-DP was introduced and can you supply numbers on how many people use Raid-DP with FC disks? I suppose almost no one actually.
I don't see why the myriad of Raid levels in the storage industry has to be a disadvantage in itself. Of course it tends to be useless to optimize on that level if (and only if) you've got an nvram card in your box and can optimize far beyond a normal controller's means; but still - what's the problem with having options?
Reminds me of that note in /etc/rc of our filers long ago:
"any color you like as long as it's black"
Still, I didn't even wanna read your technical bits after such an ignorant introduction. This is what I hated about Sun's NetApp-hating blog posts and _so far_ NetApp had appeared to have much more sensible employees.
Posted by: darkfader | March 19, 2009 at 07:49 AM
I do have figures, and RAID-DP accounts for the bulk of our user base. Across a sample of 7000+ systems with 128,000 RAID groups surveyed from AutoSupport data, 75% are RAID-DP. From the same data 94% of the volumes created in the last 12 months were RAID-DP.
It's now the standard recommendation for NetApp systems, even for FC drives, because it's equally space efficient (14+2 as opposed to RAID-4 7+1) and performs to within about 1 to 2% of RAID-4.
RAID options that force you to choose between performance, capacity or reliability I don't really see as options.
As I don't reference or bash any competitor here, I'm disappointed you take me to task for it. I'm equally surprised you think me ignorant of how our customers use our systems. Hopefully the numbers I give here and the rest of the blog entry will help you understand why RAID-DP is to be preferred over any other RAID.
Posted by: Alex McDonald | March 20, 2009 at 05:18 AM