Part 1 Three Men Make a Tiger
Part 2 Space is Mind Bogglingly Big
Part 3 Think Tesseract; More Space Than A Cube
Part 4 Micrometer, Crayon, Ax
Most Sunday mornings, Carol and I take a walk down to a local coffee shop, buy a newspaper and spend a little time catching up on the week’s news. This Sunday’s big headline (apart from the collapse of Western civilization as we know it) was Zhai Zhigang’s space walk and the successful return to Earth of the three Chinese taikonauts this weekend.
In the sidebar of the article I spot some other space walk facts. The longest space walk was by US astronaut Susan Helms, at 8 hours and 56 minutes, which beats the Chinese effort (impressive though it is) into a cocked hat. Zhai was out of the capsule for only 15 minutes.
“Amazing” says I to Carol, having caught her attention. “How long do you think the longest space walk was?”
“Let me guess”, she says. “About a mile?”
A Summary Picture
From our total of 217PB across 7597 NetApp systems, here’s how the data divides out. Using that ax I spoke of earlier this week, the decimal point is pretty spurious. But you get the idea. This, by the way, needs to be read along with the other posts. No contextomy here, please.
Some last notes on this data.
- Thin Provisioning: I don’t know whether or not the volumes in this sample were thin provisioned. The effect of thin provisioning is to increase the appearance of available space from the users’ perspective. It doesn’t change the data shown above; it’s a physical view, not a virtual view of the storage.
- Deduplication: Some of this data is likely to be deduplicated. We have a lot of systems (over 10,000) with deduplication, and I’m sure a goodly percentage of the volumes on these systems are deduplicated. It doesn’t change the data shown above; it’s a physical view, not a virtual view of the storage.
The Wrong Measure
The exercise over the last few posts has been to disprove the claim that NetApp systems have far less usable space than an EMC CLARiiON.
Have I succeeded? Actually, I don’t think it matters whether I have or not, because (like EMC) I’ve largely focused on the wrong measure. I’ve measured usable (physical) space, not usable (virtual) space.
Virtual Magnification
When we deduplicate data, the amount of deduplication depends on the data source. Among the most redundant data sources are VMware images; every copy contains a complete operating system and other static data that changes little, if at all.
NetApp figures show deduplication ratios of 80% to 95% for VMware images. So how much VMware data can we store in 1TB of usable disk space?
1TB usable becomes 5TB to 20TB usable
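The arithmetic behind that range can be sketched in a few lines. This is a minimal illustration, assuming the quoted percentages are the fraction of physical space saved by deduplication; the function name is mine and is not part of any NetApp tool.

```python
def effective_capacity_tb(physical_tb: float, savings: float) -> float:
    """Logical data (TB) that fits on physical_tb of disk when
    deduplication removes the fraction `savings` of the stored blocks.

    Assumption: `savings` is a space-saved fraction (0.80 means 80% saved).
    """
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    # Only (1 - savings) of the logical data lands on disk,
    # so logical capacity = physical capacity / (1 - savings).
    return physical_tb / (1.0 - savings)


if __name__ == "__main__":
    for savings in (0.80, 0.95):
        logical = effective_capacity_tb(1.0, savings)
        print(f"{savings:.0%} saved: 1TB physical holds about {logical:.0f}TB")
```

At 80% savings only a fifth of the data reaches disk, giving 5TB; at 95% only a twentieth does, giving 20TB.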
Yes, it comes with caveats. YMMV, and all that. But the effect is real and measurable, and no other major storage vendor can do this on primary storage.
Not EMC, not Dell/EqualLogic, not HP, not 3PAR.
Only NetApp can make better use of physical disk by virtualising storage, and as data growth accelerates, only NetApp can provide the kind of value for money that is needed.
And there are other benefits beyond capacity savings.
- Better performance; cache hits replace disk reads and writes, decreasing latency
- Greener storage; less spinning disk equates to savings in power, cooling and footprint
- Easier management; less is easier to manage than more
Why wouldn’t you want a solution that was more efficient and more effective than a dumb SAN? I can’t think of a single reason not to want to save money and do more with less in the present economic climate.
What Does “Long” Mean? Or “Usable”?
Carol’s confusion on my question about the measure of the “longest” space walk was understandable; what’s a walk, if it’s not measured in distance?
There’s the same confusion about disk space. When it’s this heavily virtualized, how are we to measure it? The old way of counting disks and multiplying by their size, or measuring “overheads” and reducing down to the amount of data these disks can hold, is fundamentally pointless.
A space walk isn’t measured in miles, and physical measures of disk space have become largely irrelevant.
The storage industry needs new measures for virtual storage – and it needs to be able to stand behind what it claims with facts and guarantees, not speculation and vague promises of value for money.

Thanks so much for following up to the previous "good first step" with more real numbers. On first look, I'm not at all surprised by most of your pie, and would probably have guesstimated about right. But the 5% snap reserve and 1% snap surprises me. I had more like 10-15% snap reserve on my NetApp filers and used all of it keeping old snaps around. Admittedly this was a lot longer ago than I care to suggest, but this percentage surprises me.
Perhaps many NetApp customers are not using snaps at all? This could account for the low numbers there, but would surprise the heck out of me since NetApp snaps would be my number one feature if I was (still) a customer!
Posted by: Stephen Foskett | September 29, 2008 at 07:32 AM
I've not analysed it in detail, but there are a number of volumes in the sample that don't have snapshots set, plus a number of non-FlexVol (tradvol) systems. Some of these systems are of a vintage. I'll possibly take a look this week and get back to you.
Also bear in mind that the 13.5PB is being represented as a %age of the total raw, not the usable. It will be 1/3rd bigger -- approx 10% -- when compared against usable. The pie is meant to demonstrate where everything goes as a %age of the raw, hence the smaller %age figure.
Thanks for commenting!
Posted by: Alex McDonald | September 29, 2008 at 07:51 AM