If you've come here from Isilon's "guarantee" website, please be aware that they've got the figures wrong for NetApp's space efficiency.
Very wrong. The whole series on NetApp's space efficiencies is here, and it clearly shows that NetApp's space efficiency is much higher than Isilon's miscalculated claim.
If you really want to save money and be storage efficient, you need a storage system that gives you more space than you bought, by using advanced data deduplication, double parity RAID, and a number of other space-saving technologies. Here, try a real guarantee. Accept no substitutes!
Thanks for visiting, and enjoy the blog.
Part 1 Three Men Make a Tiger
Part 2 Space is Mind Bogglingly Big
Scientists have puzzled for years over the question; why does space appear to only have 3 dimensions? Some speculate that space has more, with tiny curled up dimensions that can't be seen; up to eleven at last count.
Back here in IT land, we don't even have three. Poor old disks and tapes have to do with one dimension; lines of bits laid out on a rusty glass platter or a strip of mylar.
But users of this one dimensional space have come up with a solution; map another dimension over the top of the single stream of bits. Structures like filenames in directories and volumes and LUNs give us a second dimension to our data.
The third dimension? (OK, this is all for effect, but bear with me.) Differentiate the second dimension by adding a third dimension; qualify and assign a type to the second dimension, and identify that file as a spreadsheet or a document or a database.
Here at NetApp, we've discovered the fourth dimension.
Non Duplication
There's a lot of buzz in the industry about deduplication, but what many fail to understand is that it's possible to not duplicate data in the first place. It's what a colleague of mine, Mike Riley, called our non-duplication technologies. They add a fourth dimension to the data.
An example. NetApp snapshots are famous for two reasons.
- They're fast. Not only are they fast to take, they don't get in the way of I/O to the base data they snapshot. They're snapshots which don't sap performance.
- They're capacity-free when taken. Only changed blocks to the underlying data require additional space.
So How Much Snapshot Space?
Here's the latest set of figures from the NetApp AutoSupport data discussed in my previous blog entries. (Sorry, it's a picture. Inserting tables is next to impossible.)
UPDATE: corrected, 83TB is now 83PB
The used snapshot space is less than 3% of the total usable space. Which demonstrates how conservative NetApp are with recommendations of how much space should be reserved for snapshots; these systems reserved 13.4PB, much more than the committed space of 2.4PB. For a fuller description, Val Bercovici details the flexibility of NetApp's snapshot reservations.
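For readers who want to check the percentages, here is a minimal sketch of the arithmetic. The 13.4PB and 2.4PB figures are the ones quoted above; the 144PB usable figure is the one discussed in the comments below, and the variable names are purely illustrative.

```python
# Rounded figures in petabytes (PB); names are illustrative only.
usable_pb = 144.0            # total usable space (figure quoted in the comments)
snapshot_reserved_pb = 13.4  # space reserved for snapshots
snapshot_used_pb = 2.4       # space the snapshots actually consume


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Snapshot space actually used, as a share of usable space.
used_pct = percent(snapshot_used_pb, usable_pb)
# Snapshot space reserved, as a share of usable space.
reserved_pct = percent(snapshot_reserved_pb, usable_pb)
# How much of the reserve is really being consumed.
reserve_consumed_pct = percent(snapshot_used_pb, snapshot_reserved_pb)

print(f"snapshot space used:     {used_pct:.1f}% of usable")
print(f"snapshot space reserved: {reserved_pct:.1f}% of usable")
print(f"reserve consumed:        {reserve_consumed_pct:.1f}% of the reserve")
```

With these rounded inputs, used snapshot space comes out well under the 3% mentioned above, and less than a fifth of the reserve is consumed.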
How much data does that represent? Multiplying the snapshots by the size of the volumes that have snapshots, it's a whopping 1,511PB of non duplicated data.
Effective Load Factor
How should we describe this non-duplicated data that has the appearance of being a much bigger set of data? I like to think of this in terms of load factor. Worst case is to compare with the raw disk space. The data we are storing is well in excess of 1,500PB on systems with a total capacity of 217PB.
That gives an effective load factor of over 7 to 1 for these systems. Considering just the snapshot data, it's a load factor of 700 to 1.
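The load factor arithmetic can be sketched as follows. This assumes the rounded figures quoted in the post and its comments, and it assumes that "the data we are storing" means the non-duplicated snapshot data plus the 83PB of used space; that reading is mine, not something the table states. The variable names are illustrative.

```python
# Rounded figures in petabytes (PB); names are illustrative only.
raw_pb = 217.0              # total raw capacity of the systems
used_pb = 83.0              # used space on the volumes
non_duplicated_pb = 1511.0  # data represented by the snapshots
snapshot_used_pb = 2.4      # physical space the snapshots consume

# Assumption: effective data stored = snapshot-represented data + used space.
effective_data_pb = non_duplicated_pb + used_pb

# Effective load factor against raw disk (the worst-case comparison).
effective_load_factor = effective_data_pb / raw_pb

# Load factor for snapshots alone: data represented per PB consumed.
snapshot_load_factor = non_duplicated_pb / snapshot_used_pb

print(f"effective load factor: {effective_load_factor:.1f} to 1")
print(f"snapshot load factor:  {snapshot_load_factor:.0f} to 1")
```

With these rounded inputs the effective load factor lands a little over 7 to 1. The snapshot-only ratio comes out in the 600s rather than exactly 700; the difference is presumably down to rounding in the quoted figures.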
Bottom line
It really doesn't make sense measuring usable storage with a micrometer. With 4th dimension non-duplicating technologies like our SnapVault, SnapMirror, Thin Provisioning, and Snapshots / FlexClones, the effective size of your data cube has just -- dramatically -- increased. And all without hitting performance.
It takes a few percentage points of the total disk space to enable this truly smart technology.
Worth it? You bet. NetApp customers would find it really hard to go back to the dark ages of storage. Snapshots are so natural for NetApp users that many don't think about this extra dimension, and those unfamiliar with virtualized storage often fail to grasp the difference between NetApp systems and a traditional SAN.
What's Not Shown; Deduplication
All the savings above are effected by non duplication technologies. Not shown are the further savings to be made with deduplication. That increases the effective load factor further; by how much I can't tell, as deduplication depends on the data, and I don't have deduplication statistics in the set I'm working with.
But it's substantial; with VMware for example, deduplication can show 80 to 95% space savings.
Dr Dedupe
If deduplication on primary storage is of interest, go visit my colleague Dr Dedupe's blog for more insight into this technology. Interestingly, he was rated in the top 10 of most valuable vendor storage blogs over at Storage Monkeys.
I (of course!) was at #11, and not in the list. Next year's Oscars, perhaps.
Sunny at ShadeOfBlue Towers
Apologies for the lateness of this part of the series. The weather was so good this weekend that I decided a few days in the sun took top priority. We see the sun so infrequently here.

This is a great write up...
But, did you really mean "just" 83TB instead of 83PB used space? (I already hear the wolves cry...)
And, I think "non duplication" is a great term to express what we can do in this area, so why doesn't it show up on the NetApp website, just like deduplication (http://www.netapp.com/us/products/platform-os/dedupe.html)? Has it been trademarked yet......?
PS - the link http://www.netapp.com/us/library/15618472.html comes back empty handed...
Posted by: Geert | September 24, 2008 at 11:47 AM
Aaaarrrrgghh!
Yes, 83PB not 83TB! Good spot; I'll correct and repost. Link too; worked for me when I checked it out, but I'll find another that works if you're having problems.
Non-duplication is the first technology to use, then deduplication. Why? It's free to do if you do it right. Dedupe always takes horsepower.
Posted by: Alex McDonald | September 25, 2008 at 03:13 AM
I was hoping that you could help me understand something. I got to this post from a link at the Register on an article on the Isilon 80% guarantee. I looked at the table above and did a bit of math. It says that systems have 144 TB of usable space and 83TB is in use. I assume the 83TB represents the amount of actual data. If I divide the 83TB of used space by the 144TB of usable space, I get 57%. Does that mean the actual Net App utilization ratio for these systems is only 57%?
Posted by: Cole Sandau | May 21, 2009 at 10:36 AM
Cole; thanks for asking.
57% is the ratio of used to usable disk space. Usable to total disk space (which is what Isilon are claiming) is covered in Part 2 Space is Mind Bogglingly Big, and is 66%.
I also document the source of the data there. Utilisation rates vary by application. Database SANs tend to have low utilisation because they need a large number of spindles for performance, and disk sizes are getting larger. They can be very low; 10% or so.
File based NAS systems are far higher. Some systems in the data were near 100% of the usable.
That's the problem with Isilon using this blog as "evidence" of utilisation; it's apples to oranges, since they don't do SAN. As I noted about the data I used;
Isilon is NAS only, doesn't deduplicate data, and I'd love to see what rates they get from real systems in the field.
I'll bet Isilon's actual used space to total is very low indeed.
Posted by: Alex McDonald | May 22, 2009 at 01:35 AM
Alex - thanks for the explanation. I think that I am understanding what the charts are representing. Could you tell me if this is a fair way to interpret the information you provided?
Across the 7,597 systems, your customers purchased 217 PB of raw storage. Of that 217 PB of raw storage 73 PB went to system related overhead. The overhead is for things like spares, RAID, file system OS, aggregate reserves and other items. This results in approximately 66% of the raw storage being usable – leaving 144 PB of usable disk space to actually place data on.
Is this 66% what most people would call “utilization rate”? I believe this is consistent with utilization rates that I have heard from others regarding Net App overhead and that other vendors’ systems tend to have lower utilization rates – their systems are not as efficient as Net App. Is that true as well?
Then if I look at your other chart, I see that the 144PB of usable storage has 127,584 volumes on it and those systems hold about 83 PB of data. That represents the 57% ratio of used to usable disk and about 43% or 61PB of the usable space is un-used. I think this is consistent with how you explained it earlier. I believe the point you are also making is that other vendor systems would have lower ratios of used to usable b/c their systems allow for less efficient storage management when it comes to things like snapshots. Is this correct?
So, if I take the last step in this and divide the total used storage (83 PB) by the total raw storage (217 PB), I get 38%. Would this be a fair representation of the “gross utilization” of all these systems?
I have tried to study these ratios over the years and have always found it difficult to come to a common set of terms and a set of data that illustrates these concepts. I think that it is great that you are so willing to share this data. I just want to make sure that this is a fair way to define and interpret your data.
Posted by: Cole Sandau | June 04, 2009 at 06:18 AM
That's a pretty good summary.
The Isilon guarantee, by the way, is not the same as the NetApp one. Isilon is guaranteeing 80% utilization of whatever they sell you. NetApp is guaranteeing that you will use 50% less storage than you use today.
That's a different animal altogether.
Posted by: Alex McDonald | June 12, 2009 at 06:36 AM