“People understand contests. You take a bunch of kids throwing rocks at random and people look askance, but if you go and hold a rock-throwing contest -- people understand that.” (Don Murray)
And that, in a nutshell, is the origin of most competitive sports. What starts as "Hey, I can chuck this boulder through that window over there -- betcha you can't!" acquires over time a set of rules, referees, measuring devices and a governing body (sometimes two or three); the sport of rock-chucking may even become widely recognised, widely played and make it into the Olympics. Like spear throwing (the javelin) or blasting birds out of the sky with a shotgun (clay pigeon shooting), rock-chucking has matured into shot-putting.
Just like benchmarks. Originally used to demonstrate some feature of your system in all its one-trick glory (HP IOPS from cache is a classic example of how to do this), modern industry benchmarks attempt to allow some real-world comparison between competitors. The relationship to real life is debatable, but the rules by which they operate are carefully designed to inject some realism into comparative claims.
Benchmarks are intended to be a repeatable test of a set of skills. But, as with all competitive sports, sometimes there's the kid who starts his benchmarking career hanging round on street corners stoning passers-by for entertainment, or hoisting up the nearest large rock at hand and heaving it through the closest plate-glass window.
And the new kid is over at HP, using a non-benchmark as a benchmark, and generally lobbing rocks around in all directions. I'm not going to dissect the post in detail, because others have and will continue to do that. There's one little paragraph I want to focus on, because it demonstrates one of my pet peeves: benchmark intuition.
"Now things were starting to make sense. We were seeing the same sort of decay curve as shown in the IOMeter results posted in Making Sense of WAFL - Part 4. Every time the test is run, the random component of the Jetstress database accesses fragment the LUN further and the throughput numbers get worse. An array like EMC CX or HP EVA wont undergo this sort of decay curve since these arrays do not have internal WAFL-fragmentation problems like the FAS does."
The non-benchmark is Microsoft's ESRP. The tester's first intuitive assumption is that WAFL fragments; his second is that this fragmentation is the source of the diminishing throughput numbers.
Bzzzt. Big fail.
Let's allow the first intuition; let's allow, for the sake of this demonstration, that "WAFL fragments your data". Here's a simple example to demonstrate why his second intuition, that fragmentation is the source of the problem, is wrong. Exchange 2007 generates small random IOs (and that's the JetStress that HP are using in their test). The table below has 5 columns to demonstrate why small random IO works just as well (or badly, depending on your take) on randomly laid out data as on sequentially laid out data.
- Random Placement: I've placed 100 blocks randomly. The numbers have been generated from www.random.org. Slot 1 is block 67, slot 2 is block 19 and so on.
- Random Requested Block: this is meant to simulate IO requests from Exchange; again, drawn from a different run from www.random.org.
- Matching Block (Random Placement): this is the slot where the requested block actually lives. So asking for block 80 requires a visit to slot 22, and so on.
- Seek Distance (Random Placement): this is the effective seek distance between requested random blocks. After we visit slot 22 (for block 80), we need to visit slot 58 (for block 37), requiring a seek of 36 slots.
- Seek Distance (Sequential Placement): this is the effective seek distance between the same requested blocks when the data is laid out sequentially, so that block n lives in slot n. After we visit slot 80 (for block 80), we need to visit slot 37 (for block 37), requiring a seek of 43 slots.
(I'm having difficulty uploading the spreadsheet to TypePad, so when I get it fixed you'll be able to "Click to download the whole spreadsheet". Not yet though.)
| Random Placement | Random Requested Block | Matching Block (Random Placement) | Seek Distance (Random Placement) | Seek Distance (Sequential Placement) |
| --- | --- | --- | --- | --- |
| 67 | 80 | 22 | 0 | 0 |
| 19 | 37 | 58 | 36 | 43 |
| 75 | 18 | 61 | 3 | 19 |
| 23 | 26 | 53 | 8 | 8 |
| 85 | 57 | 63 | 10 | 31 |
| 59 | 100 | 14 | 49 | 43 |
| 14 | 59 | 6 | 8 | 41 |
| ... | ... | ... | ... | ... |
| SUM | | | 3269 | 3322 |
Hey, look at that! The sequentially laid out data takes more slot seeks than the randomly laid out data! Try it yourself: replace the 100 numbers in the first two columns with random numbers, and check the seek distance sums.
On average, they will be equal. In fact, if the requested blocks are random, it doesn't matter how the data is laid out. Intuition fail.
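For anyone who would rather not fiddle with a spreadsheet, here is a minimal Python sketch of the same experiment. It is not the original spreadsheet: the function names `total_seek` and `simulate`, the trial count and the seed are all made up for illustration, and it assumes requests are drawn independently and uniformly (with replacement) using Python's own pseudo-random generator rather than www.random.org. Under that assumption the expected distance between two consecutive requests over n slots is (n² − 1)/(3n), about 33.3 for n = 100, so 99 seeks should total roughly 3,300 whichever way the data is laid out, which is in line with the two sums in the table.

```python
import random


def total_seek(slots):
    """Sum of absolute distances between consecutively visited slots."""
    return sum(abs(b - a) for a, b in zip(slots, slots[1:]))


def simulate(n_blocks=100, n_requests=100, trials=2000, seed=0):
    """Average total seek distance for random vs sequential placement.

    All parameters are illustrative assumptions, not values from the post
    (apart from the 100 blocks and 100 requests used in the table).
    """
    rng = random.Random(seed)
    random_total = 0
    sequential_total = 0
    for _ in range(trials):
        # Random placement: placement[i] is the block stored in slot i + 1.
        placement = list(range(1, n_blocks + 1))
        rng.shuffle(placement)
        slot_of = {block: slot for slot, block in enumerate(placement, start=1)}

        # Small random IO: each request is an independent uniform block number.
        requests = [rng.randint(1, n_blocks) for _ in range(n_requests)]

        # Random placement: visit the slot where each requested block lives.
        random_total += total_seek([slot_of[b] for b in requests])
        # Sequential placement: block b lives in slot b.
        sequential_total += total_seek(requests)
    return random_total / trials, sequential_total / trials


if __name__ == "__main__":
    rand_avg, seq_avg = simulate()
    print(f"random placement:     {rand_avg:.1f}")
    print(f"sequential placement: {seq_avg:.1f}")
```

Any single run will favour one layout or the other by chance, as the table does; it is only the averages over many runs that converge.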
Here's the professional sport of benchmarking, which HP don't take part in (still being, as it were, at the rock-throwing stage):
- NetApp SPC Benchmarks (no HP here)
- SFS2008 NFS benchmarks (HP missing again)
And, of course, the official ESRP results (and, just as a reminder, these aren't benchmarks)
Having failed at shot-putting, perhaps HP might want to pick another sport for their talented testers. Like nude football.
[updated to correct some borked links and a typo].

Excellent response to the latest round of endless FUD from our jealous competitors, Alex!
Let's reveal a troubling dirty secret here…
I find it endlessly amusing that EMC, HP, Dell and others claim NetApp "marketing stunts" whenever we objectively prove performance (or efficiency) advantages - yet they retort only with subjective claims instead of stepping up to the professional challenge of true objective analysis.
While ESRP is very useful for its transparency (ever see EMC, HP, Dell, HDS or others use RAID6? Or anything but RAID10?) of configurations supporting max performance Exchange solutions, Microsoft is rather explicit that ESRP should NOT be used to compare vendor results.
OTOH - SPEC & SPC are industry associations of professional performance engineers from all vendors interested in publishing objective comparative results for customers to review. While EMC is perhaps the most opportunistic of the bunch, choosing to publish only NAS results now and then, almost all vendors participate directly or indirectly (i.e. Dell as proxy for EMC @ SPC) in these industry associations. HP certainly does, and so does HDS directly and indirectly for HP.
Two of the tenets of these industry associations are that:
1. The workloads being benchmarked are agreed upon by all competitors, and
2. Any of the member performance engineers can "call bullsh*t" on any result. Specifically, any member can formally request the withdrawal of a competitor's report if they can prove a flaw. It's happened in the past on more than one occasion, so this is not an empty claim or impotent clause.
NetApp has consistently published independently audited high-performance results to both SPC & SPEC with the highest raw storage efficiency, best response times, no degradation with multiple snapshots, and highest ops/disk ratios, among many leading indicators. We even published an SPC result using FC LUNs which demonstrated no WAFL performance degradation over a month of intense & continuously high SPC-1 I/O!
With that as a backdrop, it's fascinating to me that the competitive whiners from EMC, HP, Dell, HDS and others did not use their respective performance industry member associations to simply issue a formal challenge and attempt to disprove any of our many independently-audited published results.
For example, if HP thinks our FAS systems perform as poorly as they claim - why not use their SPC affiliation to challenge any of our published reports? After all, they agreed to the workload, and they've even published EVA results years ago. Why haven't they challenged NetApp's FAS results? And why haven't they published EVA results in ages? Hmmm…
Dell is EMC's largest reseller of CLARiiON. Surely they could have challenged the NetApp SPC results proving better FAS performance than CX, especially against EMC's embarrassing CX results with snapshots enabled. Remember, these are objective independently audited results. Yet the formal challenge was never filed. Because ALL NetApp results on FAS & EMC CX have been scientifically proven true, not merely claimed via subjective disclosure!
So the dirty secret is that (anti)social media outlets such as blogs and Twitter remain the last platforms jealous competitors have to (re)launch their FUD and propaganda campaigns of technical misinformation against NetApp.
All the rational platforms for attacking NetApp performance and efficiency based on logic and scientific disclosure have been removed by NetApp's independently audited objectively-proven results.
Posted by: Val Bercovici | September 27, 2009 at 02:45 PM
We all know "our" history talking about ESRP, and agreed it is NOT a benchmark. I'd like to point out that HP does have many submissions for SpecSFS on the SFS97_R1 benchmark, which was retired just over a year ago, so they do play; they are just out of date. (Likely due to Polyserve not having performance that should be published, and iBrix being too new to HP to get through a benchmarking activity yet... we'll see what happens, but I expect nothing on that front as well.)
Kudos to NetApp for always taking the time to get benchmarking done, but as I've said in the past, benchmarking isn't about the results, it is about beating the test.
We've had our differences, and my poking fun at the marketing campaigns is always at some level out of jealousy; you guys come up with some amazing ideas that always get both good and bad publicity, but regardless a TON of attention.
My feeling about ESRP is that it should go back to what it was intended for: allowing a customer to get a warmer feeling about a proof of concept for a specific application. As always, any customer's mileage will vary, but it gives them an idea of how a product will work with a specific profile.
"The Exchange Solution Reviewed Program (ESRP) – Storage is a Microsoft Exchange Server program designed to facilitate third-party storage testing and solution publishing for Exchange Server."
It is always up to the vendor to publish a solution that is supportable and maintainable.
Posted by: | September 30, 2009 at 08:56 AM
@anon
I'm not sure who you are, so "our" is a little mysterious to me!
Agreed about ESRP as a POC rather than a benchmark. Well said.
As to beating the benchmark; yes, there's an element of that. But there's also something other than the raw numbers; we get so much FUD about "WAFL kills performance and degrades over time" that the benchmark's primary use for us is: "No, it doesn't, and here's the evidence."
But I think that's in the past, HP's past & current efforts to stir the pot notwithstanding. Virtualization of storage brings so many advantages that it's impossible to avoid. Example: I hear thin provisioning on the EVA is just round the corner (as opposed to the feeble adaptive or "chubby" Windows-only provisioning that HP currently offer).
Now that will be interesting, if for no other reason than hearing HP explain how they don't have fragmentation problems.
Times and technologies change. That's all this local spat about "non" benchmarks is about -- some parts of HP haven't got that message yet. HP's journey is likely to be long and slow given their current attitude.
Posted by: Alex McDonald | September 30, 2009 at 09:26 AM
Sorry Alex,
Not sure how it posted as anon. Trying again with IE instead of Firefox.
Regards,
Steven Schwartz
The SAN Technologist
Posted by: Steven Schwartz | September 30, 2009 at 09:41 AM
Ah, welcome back Steven. Now I see to what you were referring :-)
Posted by: Alex McDonald | September 30, 2009 at 10:32 AM
Good information. I am currently at the end of the line in deciding whether to go with EqualLogic or NetApp.
We are a small law firm with 80 staff. I will be consolidating eight physical servers to two ESX hosts using vSphere. The units I am looking at are:
EqualLogic (16 x 250 GB) RAID 6 – 2.8 TB of storage
NetApp (12 x 500 GB) RAID DP – 2.1 TB of storage
Both of the above SANs include exact duplicates at the DR site with only one controller. NetApp is around 6k more than EQL.
Any comments or suggestions?
Posted by: Dennis | October 20, 2009 at 04:10 PM
@Dennis
Did you have a look at Sun & LeftHand? Both should also be competitive in this space. Failing that, you need to look at the merits of both systems on offer and check out the ongoing incremental costs associated with upgrades, expansion and support. A 6K saving now could turn into a much larger loss further down the line.
Posted by: john | October 21, 2009 at 01:31 AM