Performance as a measure of usable capacity
EMC’s StorageAnarchist took the bait and asked the logical question I was hoping he would when I blogged on the relative complexity of calculating usable capacity on feature-rich, resilient enterprise storage arrays. He was at a disadvantage, though, because he is operating under the spell of EMC’s competitive FUD, which is (sadly for him) far removed from customer reality.
A simple question deserves a simple answer (even if the questioner won't like it)
I do owe the Anarchist thanks, though, for picking such nice round numbers to work with, since that will help. In short, he wanted to know what 100TB raw (200 x 500GB disks) would look like as usable capacity under a true enterprise (OLTP) workload. As it happens, his question was asked and definitively answered for both of us earlier this year.
The answer is at least 62TB usable (conservatively, see better number below) for NetApp and 38TB usable for EMC (as well as most of the rest of the mainstream primary storage vendors).
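For readers who want to check the arithmetic, here is a minimal sketch that turns the raw configuration from the question into usable terabytes. The drive count, drive size and the two utilization percentages come from this post; the function name and the use of decimal terabytes (1 TB = 1000 GB) are assumptions made for illustration.

```python
# Figures from the post: 200 drives of 500 GB each, and the capacity
# utilization quoted for each vendor's audited result.
DRIVE_COUNT = 200
DRIVE_SIZE_GB = 500
NETAPP_UTILIZATION = 0.62
EMC_UTILIZATION = 0.38


def usable_tb(raw_tb: float, utilization: float) -> float:
    """Usable capacity in TB, given raw TB and a utilization fraction (0..1)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return raw_tb * utilization


# Assumption: decimal terabytes (1 TB = 1000 GB), which gives the round 100TB.
raw_tb = DRIVE_COUNT * DRIVE_SIZE_GB / 1000
netapp_tb = usable_tb(raw_tb, NETAPP_UTILIZATION)
emc_tb = usable_tb(raw_tb, EMC_UTILIZATION)

print(f"raw: {raw_tb:.0f} TB")
print(f"NetApp usable: {netapp_tb:.0f} TB")
print(f"EMC usable: {emc_tb:.0f} TB")
```

The same function can be reused with any other utilization figure to see what it means on a 100TB raw system.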
In January 2008, NetApp published a benchmarked result for a proven OLTP workload agreed upon by most of the enterprise storage industry. This public benchmark result was rigorously and independently audited. As with all highly respected and credible benchmarks, it’s open to peer review by experts from competing vendors in our industry.
So, what's the catch?
All advanced data protection and provisioning features (RAID-DP/6, Advanced Checksums including “disk remanufacturing” and “lost writes” detection/correction, Snapshots, Thin Provisioning, etc…) were enabled as part of this benchmark test. NetApp also ran this workload at the same consistently high level of performance for an extended (4+ week) period in the benchmark lab, then took the unprecedented step of publishing that result as well, in a separate report which covers the final 48 hours (as per the existing limits of the benchmark logging tools).
The end result was 62% capacity utilized (75% w/snapshots) at full performance over 140 spindles on our most popular mid-range modular controller. EMC required more spindles (155 to our 140) yet only max'ed out at 38% capacity efficiency under that demanding workload.
Key facts
From a capacity efficiency perspective, what’s notable about the NetApp number is the following:
- It’s over double the industry average of 30%
- It was accomplished with capacity-efficient (double) parity RAID instead of capacity-hostile conventional RAID 1/0 mirroring & striping
- It’s about 63% better (62 / 38 ≈ 1.63) than any independent result for a comparable EMC system (as an aside, this number can be directly extrapolated to most of the modular storage arrays from other vendors, e.g. HP EVA, HDS AMS, Dell PS3x00, LSI/Engenio 699x, etc…, due to the RAID 1/0 capacity tax)
- It was the highest usable capacity published in over 5 years (when a RAID 5 result was still possible due to smaller drives)
- It was the only independently audited result to measure both (capacity-friendly) snapshot and thin provisioning performance independently as well as together
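The ratios in the list above can be checked with a short calculation. This is only a sketch using the three percentages quoted in this post (62% for NetApp, 38% for EMC, 30% industry average); the helper name `percent_better` is made up for illustration.

```python
# Capacity utilization figures quoted in the post, in percent.
NETAPP_UTIL = 62.0
EMC_UTIL = 38.0
INDUSTRY_AVG = 30.0


def percent_better(ours: float, theirs: float) -> float:
    """Relative improvement of `ours` over `theirs`, expressed in percent."""
    if theirs <= 0:
        raise ValueError("the baseline must be positive")
    return (ours / theirs - 1.0) * 100.0


vs_emc = percent_better(NETAPP_UTIL, EMC_UTIL)
multiple_of_average = NETAPP_UTIL / INDUSTRY_AVG

print(f"NetApp vs EMC: {vs_emc:.1f}% better")
print(f"NetApp vs industry average: {multiple_of_average:.2f}x")
```

Note that "percent better" is measured relative to the lower figure, so it is not the same as the 24-point gap between 62% and 38%.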
Finally, this benchmark was conducted with one hand tied behind NetApp's back compared to EMC. Because CLARiiON snapshots impose such a severe performance tax on response time and throughput, the snapshot portion of the benchmark was "detuned" in order to keep the CLARiiON in the fight. The NetApp FAS array was actually able to demonstrate 75% usable capacity with snapshots included, and likely more if not "detuned" against a legacy array.
In that case, given that enterprise storage customers demand the ability to use premium features such as online instant recovery points & clones (i.e. snapshots & FlexClones) at high performance, perhaps the better answer to the Anarchist's question is at least 75TB usable!
Exposing the capacity challenges of legacy RAID arrays
Anarchist - what fascinates me is the glass ceiling of 50% for any EMC primary storage system you can provide to answer the same question. And that’s a simplistic & generous estimate. Apart from the crushing RAID 1/0 capacity tax, EMC systems must also low-level format drives, store checksums, allocate hot spares and leave some capacity spare, because no storage system today can deliver max IOPS from every sector of every disk drive. In fact, the best EMC could do for an independently audited number is 38% to our 62%. Don’t feel too bad, Anarchist - at least you beat the industry average :)
Time for a better EMC response
I’m awaiting your predictable response around SPC-1 not being a legitimate benchmark because EMC (counter to customer opinion as well as the rest of the storage industry) “just says so”. No doubt you’ll also heap (unsubstantiated by 3rd parties) praise around EMC’s new esoteric flash technology available to the well-heeled upper class of Symmetrix DMX customers who don’t yet realize CLARiiON CX systems will outperform a DMX on small-block random reads… but I digress.

Great overview. And it looks like the bar is set even higher now for the CLARiiON with the publication of SPC-1 results for the new FAS3170:
http://www.storageperformance.org/results/benchmark_results_spc1/#netapp_spc1
60,515.34 SPC-1 IOPS !!!
$10.01/SPC-1 IOPS
60% Total ASU
Posted by: Lee Razo | June 13, 2008 at 12:33 AM