A couple of weeks ago I focused on answering the following question:
And although the answer was interesting, the question was uninteresting. All I explained was that with more hardware we can perform better.
A much more interesting question is why NetApp RAID-DP performs so well compared to other RAID-6 implementations.
Why is RAID-DP performance more interesting?
Regardless of whether you believe NetApp or EMC, RAID-6 is more resilient than RAID-5, and RAID-5 or RAID-4 is fundamentally unsound for very large disks.
So if RAID-5 or RAID-4 are unsound, why the angst about RAID-6?
Performance.
It turns out that RAID-6 performs worse than RAID-5 because every update to disk consumes more IOPS. If you recall from my earlier blogs, RAID-5 on a Real Fibre Channel array is expensive, because a single update to a single block on a single RAID stripe will turn into several read operations and two writes, one for the new data and one for the updated parity.
For RAID-6, you now have to do three writes, one for the data and two for the parity.
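To put rough numbers on that penalty, here is a minimal sketch. It assumes the usual textbook read-modify-write costs for a single-block update in place (two reads and two writes for RAID-5, three reads and three writes for RAID-6, two mirrored writes for RAID-10). The function name, the spindle count and the per-disk IOPS figure are made up for illustration and are not measurements from any real array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallWriteCost:
    """Back-end disk I/Os needed to update one block in place."""
    reads: int
    writes: int

    @property
    def total(self) -> int:
        return self.reads + self.writes


# Assumed textbook costs, not vendor-published figures.
SMALL_WRITE_COST = {
    "RAID-10": SmallWriteCost(reads=0, writes=2),  # write both mirror copies
    "RAID-5": SmallWriteCost(reads=2, writes=2),   # old data + old parity, then rewrite both
    "RAID-6": SmallWriteCost(reads=3, writes=3),   # old data + two parities, then rewrite all three
}


def host_write_iops(raid_level: str, spindles: int, iops_per_spindle: float) -> float:
    """Random small-write IOPS the host sees, ignoring caches and controller limits."""
    cost = SMALL_WRITE_COST[raid_level]
    return spindles * iops_per_spindle / cost.total


if __name__ == "__main__":
    # Hypothetical shelf: 8 disks at 150 IOPS each.
    for level in ("RAID-10", "RAID-5", "RAID-6"):
        print(level, host_write_iops(level, spindles=8, iops_per_spindle=150))
```

With those assumed costs, the same eight spindles deliver a third fewer small writes under RAID-6 than under RAID-5, which is the gap the rest of this post is about.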
So, RAID-6 performance is worse, but why do I care?
The reason you care is that if you need both performance and resiliency, you need to use RAID-10.
Huh?
RAID-10 performs very well, offers much better resiliency than RAID-5, and on larger SATA disks is more or less the only alternative if you care about your data…
Wow…
If you need to get performance and resiliency, you need to buy 2x the disk capacity you require.
Kinda kills the entire value proposition of shared storage.
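For a feel of what that 2x costs, here is a small sketch comparing raw disk counts for mirroring against a dual-parity layout. The 14-data-disk group size and the function name are assumptions picked for illustration, not a recommendation or a statement of any product's defaults.

```python
import math


def raw_disks_needed(usable_disks: int, scheme: str, data_disks_per_group: int = 14) -> int:
    """Raw disks required to provide `usable_disks` worth of capacity.

    "RAID-10" mirrors every disk. "dual-parity" adds two parity disks per
    RAID group (the RAID-6 / RAID-DP style). Spares are ignored.
    """
    if scheme == "RAID-10":
        return 2 * usable_disks
    if scheme == "dual-parity":
        groups = math.ceil(usable_disks / data_disks_per_group)
        return usable_disks + 2 * groups
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Hypothetical requirement: 28 disks of usable capacity.
    print(raw_disks_needed(28, "RAID-10"))      # 56
    print(raw_disks_needed(28, "dual-parity"))  # 32
```

Under those assumptions the mirrored configuration needs 56 disks where the dual-parity one needs 32, before counting spares.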
In fact, if you have to deploy RAID-10 on the storage back-end, I really think the right architecture to be looking at is host-level replication with brick architectures.
Don’t believe me? Then here’s the proof from the folks at Hopkinton:
Since EMC does not publish benchmarks, and I don’t have any RAID-6 arrays sitting around, all I can do is quote the EMC whitepaper (page 13):
RAID-6 write performance is less than that of RAID-5
and the reason
The extent of the performance varies depending on write sizes and RAID group sizes, but the extra parity information that has to travel on the back-end buses to disk results in lower system bandwidth for a system with all RAID-6.
This is why RAID-10 is recommended for any environment where performance is important.
So why is RAID-DP faster?
When it was first explained to me many moons ago, I was a little bit surprised. Yes, RAID-DP is optimal, but a key part of the performance win is how WAFL does write allocation.
If you read my earlier blog, the reason WAFL performance at low utilization is so good is that we end up doing sequential full-stripe writes, and finding new places to do sequential writes requires very little effort.
RAID-DP is just a natural extension of that approach. Instead of doing the parity calculation one stripe at a time we do it multiple stripes at a time. The efficiency obtained by not having to write in place means we get really good performance and avoid the RAID-6 performance penalties in Real Fibre Channel arrays.
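The difference between updating in place and writing whole stripes can be sketched with a simple I/O count. This is a deliberately simplified model, not WAFL's actual allocator: it assumes every new block lands in a free stripe so no old data or parity has to be read, and that a trailing partial stripe costs the same parity writes as a full one. The function names and the 14-data-disk stripe width are hypothetical.

```python
import math


def in_place_ios(blocks: int, parity_disks: int = 2) -> int:
    """Disk I/Os to update `blocks` scattered blocks in place.

    Each update reads the old data block and each old parity block,
    then writes the new data block and each new parity block.
    """
    return blocks * 2 * (1 + parity_disks)


def full_stripe_ios(blocks: int, data_disks: int, parity_disks: int = 2) -> int:
    """Disk I/Os to write `blocks` new blocks as whole stripes into free space.

    Parity is computed from the new data held in memory, so there are
    no reads: one write per data block plus the parity writes per stripe.
    """
    stripes = math.ceil(blocks / data_disks)
    return blocks + stripes * parity_disks


if __name__ == "__main__":
    # Hypothetical burst of 1400 block updates on 14 data disks + 2 parity.
    print(in_place_ios(1400))         # 8400
    print(full_stripe_ios(1400, 14))  # 1600
```

In this toy model the second parity disk adds only two writes per stripe to the full-stripe path, while the in-place path pays for it on every single block.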
Now it turns out that although we have to do more work at high utilization, and that brings our performance in line with Real Fibre Channel, what the picture doesn’t show is that we achieve that performance while doing RAID-6 or RAID-DP.
In fact, our SPC-1 benchmark shows that WAFL using RAID-DP has the same performance as a RAID-10 CLARiiON configuration.
So another way of looking at the picture from above is that our performance at high utilization still kicks ass because we require fewer spindles.
or
WAFL and RAID-DP provide better performance, better availability, and better utilization than the alternatives.


