According to a figure posted in September by Robin Harris over at StorageMojo, NetApp has 13,000 systems running deduplication. That's an impressive number, especially for a feature that was officially announced in May 2007. In fact, the NetApp deduplication license is the most commonly generated/downloaded license in the company's history. The fact that it is free has probably played a role, but more importantly, it shows that customers are adopting storage efficiency technologies such as deduplication to address rising storage costs.
The latest numbers, which I just received, are depicted below.
Many of you who pay attention to this blog know that in September 2008 we announced a 50% Guarantee program leveraging some of these storage efficiency technologies. I'm not going to go over the details of the program, nor am I going to debate the points the competition has raised. This has already been done here, here and here.
Instead, I'd like to talk about how the Guarantee came about, and I'd like to provide some data.
Like other storage vendors, we collect array information through an array "call home" capability. We call this AutoSupport (ASUP). Using AutoSupport we can examine various things about the array, such as its status, configuration, licenses, and general performance information.
A while back, engineering noticed that our install base was achieving very high space savings because of deduplication. In fact, the numbers were as high as 80-90%. Needless to say, this knowledge circulated around the company, and as a result Marketing, with the support of our Legal department, thought it would be beneficial to our customers to get the message out and put together a Guarantee program. So the 50% Guarantee Program was born. By all accounts this is a conservative number, but it is a number that makes Legal happy, although we know that the space savings achieved are much more than that.
In fact, we have the customers to prove it. Customers such as Burt's Bees. You probably have no idea who Burt's Bees is, but I bet your wife probably does. Mine did. There's a great article by Ted Hein on the Burt's Bees initiative for Green Storage titled "Burt's Bees: What IT Means When You Go Green". I highly recommend reading the entire article. There are some very interesting comments in the last section, titled "Doing the Math". Like this one:
"Combined with NetApp flexible volumes, data deduplication has allowed us to roughly triple the effective capacity of the SAN. Beyond VMware, by moving our physical servers’ storage onto the NetApp platform, we’ve been able to reclaim roughly 300GB to 400GB for every terabyte of storage found on the original server.
The hardware and virtualization savings alone paid for our green IT initiative. The rest came with the solution: energy savings, improved fault tolerance, recoverability and performance."
Get outta here!!!! "Roughly triple"? I don't believe it!!! Chuck said it's all BS, it's trickery, and so do the HP guys who gave us the "Real 50% Guarantee Story".
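For anyone who wants to put "roughly triple" on the same scale as the 50% figure, here is a minimal Python sketch that converts between percent space savings and an effective-capacity multiplier. The function names are my own (hypothetical, not part of any NetApp tool), and the sketch assumes savings are measured as the fraction of the original footprint that is no longer consumed.

```python
def savings_percent(before_gb, after_gb):
    """Percent of space saved when data shrinks from before_gb to after_gb."""
    return 100.0 * (before_gb - after_gb) / before_gb


def capacity_multiplier(savings_pct):
    """Effective-capacity multiplier implied by a given percent savings."""
    return 1.0 / (1.0 - savings_pct / 100.0)


def savings_from_multiplier(multiplier):
    """Percent savings implied by an effective-capacity multiplier."""
    return 100.0 * (1.0 - 1.0 / multiplier)


if __name__ == "__main__":
    # "Roughly triple the effective capacity" expressed as percent savings.
    print(round(savings_from_multiplier(3), 1))
    # A 50% saving expressed as an effective-capacity multiplier.
    print(capacity_multiplier(50))
    # Reclaiming 300GB to 400GB per terabyte (assuming 1TB = 1000GB).
    print(savings_percent(1000, 700), savings_percent(1000, 600))
```

Under those assumptions, tripling effective capacity works out to roughly 67% savings, a 50% saving works out to doubling, and reclaiming 300GB to 400GB per terabyte is a 30% to 40% saving.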
How about this article on the Duke Institute for Genome Sciences:
"NetApp deduplication also played a key role in harnessing data growth at IGSP. NetApp has saved us a lot on storage for VMware. When we originally set up VMware, I allocated about 2.4TB for it. With NetApp deduplication, I’ve been able to shorten that down to less than 700GB. We now see an average of 83% reduction in redundant data on our VMware systems."
After I read the above articles, I started searching around the net for users posting their experiences with NetApp deduplication, and I ran across StorageMojo's entry on deduplicating primary storage. There are some interesting comments there. In fact, one of the users has posted output from their array:
Then as I moved on, another user posted the following:
People get hung up on the 50% number and forget that this number is, by most accounts, a conservative estimate. Potential users need to keep in mind the larger picture: thousands of NetApp customers are running deduplication today and achieving substantial space savings.
Some of our competitors have criticized us because of this program. In fact, two months later some of them (HP) keep posting blogs about it. That's a good indication of “deduplication heartburn”... Like they say, where there's smoke there's fire. Others, we're told, have issued memorandums to their sales forces, using clever U-turn techniques to shift the conversation into other areas where they can compete.
Rather than waste our time responding, we've opted to use customer references instead, simply because people appreciate other users' experiences more than they appreciate a vendor's hearsay. We let our customers do the talking for us. It's that simple...like shooting fish in a barrel.

Hi Nick
Given your large installed base, I have no doubt whatsoever that you've found cases where your dedupe scheme works as advertised.
Genomic data, for example, strikes me as highly redundant. Lots of VMs with redundant binaries, sure -- I'll grant you that. Can't speak to the other use cases, though.
I think you're missing the key point here, so I'll try one more time.
Everyone has an issue with the cheap marketing stunt of a "guarantee" that was obviously constructed by a lawyer.
One where it's assumed you'll be switching from RAID 10 to RAID 6, that you won't be using email, you won't be using databases, you won't be using any data that's already compressed like PowerPoint and PDFs, you will need to use expensive professional services, etc. etc.
Not much of a guarantee, right?
I think all of us are happy to acknowledge and debate the various pros and cons of NetApp's dedupe. No problem there.
What we're not pleased with is the sleazy marketing campaign. If EMC were to do something similarly tacky, I'd probably quit in protest.
-- Chuck
Posted by: Chuck Hollis | December 12, 2008 at 11:57 AM
@Chuck
Genome data dedupes very badly. Unless identical DNA sequences are being stored, which is something most try to avoid for obvious reasons -- these files can be monstrous in size. The normal approach is to compress with specialized routines that give 20:1 or higher, as the usual compression techniques only give about 3:1 ratios or thereabouts.
Best use cases are backup data, VMware, geoseismic, etc; page 16 of this http://media.netapp.com/documents/tr-3505.pdf gives the kind of %ages that might be expected. Deduplication has pretty broad application, and not specific to just one or two specialized data types. Duke Institute for Genome Sciences are using dedupe for VMware, as Nick noted.
I note your displeasure with some amusement. It is, after all, an optional guarantee. If buyers of storage don't wish to avail themselves of it, they don't need to purchase it. I certainly don't see why NetApp giving customers guarantees should cause you so much publicly expressed heartburn. Certainly not on the grounds you quote; as Gartner note here (http://mediaproducts.gartner.com/gc/webletter/netapp/issue19/gartner2.html) all we're ensuring is best practice (something I know you are keen on).
And to your objection that the agreement appears to have been constructed by a lawyer, I can only say: clara pacta, boni amici. Clear agreements, good friends.
Posted by: Alex McDonald | December 12, 2008 at 03:05 PM
Chuck probably has a proof showing that it can't possibly be true:
http://blogs.netapp.com/simple_steve/2008/12/reasoning-with.html
Posted by: Steve Klinkner | December 12, 2008 at 05:18 PM
If sleazy and tacky behavior are the thresholds at which anyone resigns from employment at EMC then the parking lots at 176 South St. should have been empty years ago.
-Tim-
Posted by: Tim H | December 31, 2008 at 12:18 PM
Hi Nick,
I have 2 3040c clusters...running VMware over NFS and we are seeing a savings of 77% from ASIS or DeDup. I also have another 2TB volume that we use for student home directories (over 13,000 home folders for students) where we are seeing a 17% savings with dedup.
Posted by: Matt Brown | January 14, 2009 at 10:02 PM