After the Virtual Fireworks
When I blogged recently on the NetApp 50% Virtualization Guarantee Announcement, I suggested:
It’s going to be interesting to see the responses to this announcement.
It was. There was considerable reaction. One blogger (not a vendor, but a user) likened it to lighting the blue touchpaper and watching the fireworks.
And one big question on many people’s minds is why we chose to compare against RAID-10 rather than RAID-5, and why we chose not to make a comparison with RAID-6.
Those are good questions, and they deserve an answer.
RAID-10 vs RAID-DP
NetApp understands that the bulk of disk deployments has been RAID-5 for a while. The driving force behind the adoption has been cost reductions – RAID-10 is often just too expensive and most applications simply do not need or use the performance increases it gives over RAID-5.
However, performance is not the reason we insist upon RAID-10 as the comparison disk configuration – it’s data protection.
RAID-5 simply does not measure up to today’s realities. We are looking to compare based on what is actually required today, not yesterday’s best practices. We believe that vendors promoting single disk parity schemes are not looking after their customers’ best interests, and this is at the heart of the matter.
Today, many people are driving to lower costs further – and the turmoil in the economy is going to greatly accelerate this activity. Companies are looking at SATA drives and large-format FCP and SAS drives as part of the answer to lowering the costs of their storage infrastructure.
But as drive capacities grow, the likelihood of suffering data loss and an outage due to a double disk failure, or to block read errors during a rebuild, becomes much more real.
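To make that concrete, here’s a rough Python sketch of the rebuild risk. It assumes an unrecoverable read error (URE) rate of 1 in 10^14 bits, a figure often quoted for desktop-class SATA drives, and treats bit errors as independent, which real drives don’t strictly obey. During a single-parity (RAID-5) rebuild every surviving drive has to be read end to end, and one URE on any of them leaves a stripe that can’t be reconstructed. The group size and drive capacities are illustrative assumptions, and this is not the model used in the IBM paper.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, drive_bytes: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error (URE) while every
    surviving drive in a single-parity group is read end to end.

    Assumes a constant, independent per-bit error rate (a simplification);
    the default 1e-14 is an assumed spec-sheet figure, not a measurement.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)**n, computed stably because p is tiny and n is huge
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 7+1 RAID-5 group: one drive has failed, seven must be read.
for tb in (0.25, 0.5, 1.0):
    risk = p_ure_during_rebuild(surviving_drives=7, drive_bytes=tb * 1e12)
    print(f"7 surviving {tb} TB drives: {risk:.1%} chance of a URE during rebuild")
```

Under these assumptions the exposure climbs steadily with drive size. On a single-parity group that URE means a stripe that can’t be rebuilt; a second parity drive gives the rebuild a way around it.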
If you have a look at the math in this IBM Best Practices for Exchange technical paper, it becomes pretty obvious that RAID-5 is a new recipe for disaster, and storage vendors need to provide much better guidance for their customers.
- Over a period of 5 years, RAID-5 is 4,000 times more likely to suffer from data loss than RAID-DP
- Over a period of 5 years, RAID-10 is 163 times more likely to suffer from data loss than RAID-DP.
To be honest, the question is this – why is your storage vendor not insisting on double disk protection?
RAID-6 vs RAID-DP
So why not compare to RAID-6? The truth is, we could and would, but we don’t find our competition selling it. It’s there on the spec sheet as a solution from some vendors – not all have the capability -- but it’s rarely put forward as a viable solution.
[Update 10 Oct: StorageZilla now reckons RAID-6 is ready for prime time, and StorageBod (a user who comments below) also reckons EMC promote RAID-6 actively. I asked several colleagues whether they ever see EMC propose RAID-6 when they’re competing, and the answer was a resounding no. Unscientific, I know, but I’ll stick by my contention that there’s not much EMC RAID-6 out there in comparison with RAID-5 or 10.]
Most, if not all, competitive RAID-6 implementations significantly reduce capacity and performance for increased data protection. That reduced capacity and performance is something that NetApp systems with RAID-DP don’t suffer from.
[Update 10 Oct: And I’m sticking by this one too. Again, see the comments.]
Our space guarantee for VMware makes that commitment for capacity. And our performance benchmarks, both SpecSFS for NAS and SPC for SAN, are all done with RAID-DP.
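As a rough illustration of the raw capacity side of this, here’s a Python sketch comparing the usable fraction of three hypothetical 16-drive groups. The layouts are assumptions chosen to keep drive counts equal (a 15+1 RAID-5 group is larger than many shops would run), and the sketch ignores hot spares, drive right-sizing and file system reserves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidGroup:
    name: str
    data_drives: int
    overhead_drives: int  # parity drives, or mirror copies for RAID-10

    @property
    def total(self) -> int:
        return self.data_drives + self.overhead_drives

    @property
    def usable_fraction(self) -> float:
        # Raw usable share only: no spares, reserves or right-sizing.
        return self.data_drives / self.total


# Hypothetical 16-drive groups; real group sizes vary by vendor and policy.
GROUPS = [
    RaidGroup("RAID-10 (8 mirrored pairs)", 8, 8),
    RaidGroup("RAID-5 (15+1)", 15, 1),
    RaidGroup("Dual parity, e.g. RAID-DP (14+2)", 14, 2),
]

for g in GROUPS:
    print(f"{g.name:35s} {g.usable_fraction:.1%} usable")
```

Raw RAID overhead is only part of the story; the guarantee also relies on dedupe, thin provisioning and the other practices in the program details.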
No Silver Bullet
But the RAID implementation is just one part of the solution. There is no silver bullet, and no matter how good any one piece of the solution is, our customers need to implement all of them, as explained in the program details, to get the best results.
This program is not simply about RAID, but storage efficiencies and the multiple technologies we bring to bear on the problem of providing superior capacity utilization while protecting data.
To quote my colleague Nick Triantos:
Once customers see the other advantages we have, in terms of performance efficient snapshots, rapid and efficient cloning and provisioning techniques, rapid data backup and recovery, and un-compromised data protection, they will realize a whole new way to manage their storage in a VMware environment.
Photo: Creative Commons Attribution 2.5 Steven H. Keys and KeysPhotography.com
My thanks to Mike Shea, NetApp, for the text and analysis.

Of course there would be fireworks! Here’s the only real objection I have after all of the fun:
I'd be willing to PAY for NetApp Professional Services for migration to the NetApp option and complete implementation.
However, if the CAPACITY "guarantee" (which seems very feasible based on the requirements) doesn't also meet my previous RAID10-based PERFORMANCE, I want NetApp Professional Services to put me back the way I was, and a full refund of Services, Support, and Product costs.
Posted by: Steven Schwartz - The SAN Technologist | October 07, 2008 at 03:07 PM
It's a miles-per-gallon guarantee, not a miles-per-hour guarantee.
Can you point me to any benchmarks comparing RAID-5 with RAID-10 on the same kit, or RAID-6 vs RAID-10? Do Dell publish any figures, either internal or external? I'd be interested to see what kind of difference you're claiming for RAID-10.
However, that's not the point. Don't want the capacity guarantee? Don't take it. What's so difficult to understand about that?
To repeat: Once customers see the other advantages we have, in terms of performance efficient snapshots, rapid and efficient cloning and provisioning techniques, rapid data backup and recovery, and un-compromised data protection, they will realize a whole new way to manage their storage in a VMware environment.
Posted by: Alex McDonald | October 07, 2008 at 03:49 PM
I disagree on several points. First, just because raid DP is closer to raid 10 than 5 in terms of reliability does not mean that you can use it as a comparison for vmware deduplication ratios. Raid 10 can only use half the drives allocated to it, so saying that raid DP will be twice as efficient is like saying nothing.
Now, I've seen a filer using dedupe get over 2x compression for vmware in the wild compared to raid 5 and raid 6- that was impressive. Unfortunately, comparing the efficiency of dedupe to a raid 10 does not really prove anything, and does not do justice to this feature.
Posted by: Open Systems Storage Guy | October 08, 2008 at 09:48 PM
EMC et al. have been more than happy to talk to us and push RAID-6 for larger drive types. We utilise it for 500GB+ drive types, although for some 'performance' test workloads we may drop to RAID-5 to give us a more consistent I/O profile against production workloads. But this is only after talking the users through the implications for their testing availability.
It may be that they know that pulling the wool over my eyes would be silly because of my superior intellect, or it may be that actually they are quite happy to talk about it? I like the idea of the former but I suspect the latter.
And Snapshots are not back-up BTW! They can form a useful part of a back-up strategy but when your array has just caught fire or you have lost your 'primary disks'; might not be useful.
Posted by: Martin G | October 09, 2008 at 02:37 AM
So where does the EMC white paper you reference state that Raid6 "significantly reduces capacity and performance" ?
Posted by: cleanur | October 09, 2008 at 07:40 AM
@OSSG; I take your point. But many vendors do not have RAID-6, but employ RAID-(mirrored) to achieve acceptable data protection. This is not just about making a comparison with the few vendors that support dual parity RAID.
@Martin; Your point is noted. I've also just seen StorageZilla make the same point, and I'll update the blog entry to reflect that.
@cleanur; the EMC RAID-6 doc states (page 4)
Posted by: Alex McDonald | October 09, 2008 at 08:44 AM
@Alex; no matter how you spin this, you're not going to get around the fact that nobody uses raid 10 for vmware. In fact, the only time I've seen an admin use raid 10 was for an Oracle database, and he needed the performance more than the reliability.
Outside of Netapp, vmware almost always gets put on a raid 5, even if there is a raid 6 available. Performance and drive utilization are more important than reliability of the raid. If you want to compare apples to apples, claim that on a Netapp raid DP, you'll get better utilization than a raid 5 with a hot spare.
Lastly, I'm not banging this drum to make your life harder- I am simply playing devil's advocate. If I were to talk to an admin about your guarantee and had to tell them that the comparison system is a raid 10, they would think that you are trying to create a scenario where your only differentiation is the raid level, when in reality, you're trying to highlight a dedupe feature.
Posted by: Open Systems Storage Guy | October 09, 2008 at 10:34 AM
"There is a hardware cost (extra drive's worth of parity) and a performance cost associated with implementing RAID-6."
That's the case for anyone running dual parity raid. You'll need a second parity drive and will also incur more CPU loading for parity calculations and I/O throughput at the back end. Depending on the design and flavour of implementation, performance may vary, but everyone incurs the second parity drive penalty regardless.
The statement you referenced above from the EMC doc is hardly equivalent to your interpretation that Raid6 "significantly reduces capacity and performance". If that's the case, then anyone could easily level the same accusation at any dual parity scheme.
I suppose having the ability to offer multiple Raid levels to their customer base, allows some of the vendors to be a little more flexible and candid around best usage cases.
Posted by: cleanur | October 09, 2008 at 02:58 PM
@OSSG: I do understand where you are coming from, but it's not solely about dedupe. Dedupe is just one of the methods we use to reduce space consumption. Dedupe, RAID-DP for protection, thin provisioning and best practice are all parts of the package; the documents make this quite clear.
So would it make any difference if I suggested we might get 20% against no RAID at all? Or 35% against RAID-5? Just because of dedupe? Would anyone have complained? Probably not (well, excluding some loud noises from the usual suspects). But it would have been a dedupe focus, which is not the sole criterion for buying a NetApp system.
But because it's 50% against RAID-10 to draw out the benefits, we're somehow spinning? Sigh.
Posted by: Alex McDonald | October 09, 2008 at 03:33 PM
@cleanur:
But not RAID-DP, because it doesn't work like RAID-6.
RAID-DP never updates already written blocks, but stripes writes on new blocks across all the drives. Parity (not the evenodd scheme, btw) is calculated at the time the stripe is written, once per stripe. There is no need to go back and recalculate the parity if a block is updated, because it never is. RAID-DP can use dedicated parity drives for that very reason.
Other distributed parity schemes that are basically RAID-5 with an extra diagonal parity need to calculate and write the parity blocks even when only one block is being updated. That's why they use distributed parity; dedicated parity drives would get incredibly hot and throttle bandwidth pretty quickly.
Random write and parity block writes put enormous pressure on cache for systems with RAID-6 groups up at the RAID-DP 14+2 level, and add a lot more IO to the mix.
RAID-DP may be like RAID-6 in that it has two parity drives and can survive double disk failures. The similarities end there. With RAID-6 you can't have both high performance and high usable capacity. With RAID-DP you can.
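To put rough numbers on that, here's a minimal Python sketch. The I/O counts are the textbook figures for an in-place small write with no help from cache, and the full-stripe figure is the ideal case where a whole new stripe is written at once. Only the XOR (P) parity is modeled; the second parity (Reed-Solomon style Q in many RAID-6 implementations, diagonal parity in RAID-DP) is left out to keep it short.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks: list[int]) -> int:
    """Single (P) parity: XOR of all data blocks in a stripe."""
    return reduce(xor, blocks, 0)


# Textbook back-end I/O counts for one random single-block overwrite done
# in place via read-modify-write, ignoring any cache absorption.
SMALL_WRITE_IOS = {
    "RAID-10": 2,  # write both mirror copies
    "RAID-5": 4,   # read old data + old P, write new data + new P
    "RAID-6": 6,   # read old data + P + Q, write new data + P + Q
}


def full_stripe_ios_per_block(data_drives: int, parity_drives: int) -> float:
    """Writes per data block when a whole new stripe is written at once:
    parity comes from data already in memory, so no reads are needed."""
    return (data_drives + parity_drives) / data_drives


# Why read-modify-write needs reads: updating one block in place means
# new P = old P ^ old data ^ new data.
stripe = [0b1010, 0b0110, 0b1111]
p = xor_parity(stripe)
new_block = 0b0001
p_rmw = p ^ stripe[1] ^ new_block  # needs the old data and the old parity
stripe[1] = new_block
assert p_rmw == xor_parity(stripe)

for scheme, ios in SMALL_WRITE_IOS.items():
    print(f"{scheme:8s} in-place small write: {ios} I/Os")
print(f"14+2 full-stripe write: {full_stripe_ios_per_block(14, 2):.2f} I/Os per block")
```

The gap between six I/Os per overwrite and roughly 1.14 per block for a full new stripe is the whole argument in miniature.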
I did promise to explain all this in much more detail, way back. I must get around to it!
Posted by: Alex McDonald | October 09, 2008 at 04:11 PM
Alex,
I've read Raid-DP whitepapers, so have a good idea of how Raid-DP functions. The point was that you completely misrepresented what the EMC document actually said about Raid-6. The fact is all dual parity schemes require a second parity drive, including yours. Also, all dual parity schemes have a performance overhead, including Raid-DP, although it's stated as negligible (depends what you're measuring, I suppose). I've seen it mentioned in your own whitepapers, which means the same accusation could be leveled at Netapp. Namely that Raid-DP "significantly reduces capacity and performance". Not really fair I know, but it all depends on the interpretation.
Posted by: Cleanur | October 10, 2008 at 05:35 AM
It's not my intention to misrepresent EMC's RAID-6, but the document is hedged about with so many ifs and buts, it's clear that EMC have had trouble in both pigeon-holing this one against the myriad other RAID types they support, and demonstrating that it's an acceptable -- or better -- alternative to RAID-5.
There is no clear recommendation anywhere in it on what RAID-6 group sizes to use, what stripe element sizes to use, or what the performance characteristics are in relation to RAID-5 with a similar number of drives. StorageZilla's blessing of RAID-6 is no more than that. Where are the recommendations and best practices?
He's already taken me to task about the fact that the only (numberless!) tests of RAID-6 in the document are with cache turned off. Of what use is that? To properly assess the impact on capacity and performance, tests need to be run at much larger group sizes and with cache on, and some numbers for consumption too.
We have our recommendations; RAID-DP at 14+2. And we have our measurements; SPC-1 benchmarks. This document's lack of hard data points forces me to the conclusion that EMC's RAID-6 significantly reduces capacity and performance.
Posted by: Alex McDonald | October 10, 2008 at 06:26 AM
Alex... Any chance you can link to the IBM red book again? The link in the article appears to be broken (I get a no such directory response) and some quick googling failed to locate it.
Posted by: Scott Waterhouse | October 22, 2008 at 11:50 AM
Scott; ftp://service.boulder.ibm.com/storage/isv/NS3574-0.pdf. There was an extra space in the link, which I'll correct. Thanks for the note.
Posted by: Alex McDonald | October 22, 2008 at 03:00 PM
Thanks Alex, that works. Now I just wish I was a better statistician!
Posted by: Scott Waterhouse | October 23, 2008 at 01:58 PM