In a comment on my last blog post, “NetApp Dedupe Revisited,” Glenn asked “does NetApp really need Data Domain more than EMC for purely technical reasons and to fix the shortcomings of A-SiS?”
This is a question I’ve heard a few times since NetApp announced its intention to acquire Data Domain a few weeks ago. It’s an important question, Glenn, one that warrants a blog post of its own. Before I answer your question, though, I want to stress that the transaction between DDUP and NTAP is still in process and subject to scrutiny by regulators and shareholders.
Now, back to the question – is Data Domain’s technology superior to NetApp’s? No, it’s just different. Think of it this way – is Fibre Channel technology superior to iSCSI? Well, depending on your needs, you may choose to use either FC or iSCSI, and either technology might be best suited to you. Your decision would be based on several factors that are important to your particular needs – things like performance, experience, flexibility, and cost. There is no “right” answer here – in fact you might decide the best solution is to use FC for some apps and iSCSI for others. It’s the same with Data Domain and NetApp. Let me explain.
NetApp deduplication is based on a post-processing architecture. This means that data is not deduplicated immediately upon arrival at the NetApp system; instead, deduplication is scheduled to run at some point in the future, presumably when all your users have gone home to have dinner with their families and the storage array is just sitting around looking for some extra work to do. That’s the nice thing about machines: they don’t need to eat or sleep or do those other things we humans waste time doing. It turns out that this post-processing method is perfect for production storage applications: let the applications do their thing, don’t slow them down, and then reduce physical storage when they aren’t watching.
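For readers who think better in code, here is a minimal sketch of the post-processing idea. It is a toy model only, not NetApp's actual implementation: the `PostProcessStore` class, the 4 KB block size, and the use of SHA-256 fingerprints are all assumptions made for illustration. The point to notice is that `write` does no comparison work at all, and the space savings appear only when the scheduled `run_dedupe` pass runs.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class PostProcessStore:
    """Toy model of post-process dedupe (hypothetical, not a vendor API)."""

    def __init__(self):
        self.blocks = []  # physical blocks; an entry becomes None once freed
        self.volume = []  # logical block number -> index into self.blocks

    def write(self, data: bytes) -> None:
        # Write path: store every block as-is. No hashing, no lookups,
        # so the application is never slowed down by dedupe work.
        for offset in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[offset:offset + BLOCK_SIZE])
            self.volume.append(len(self.blocks) - 1)

    def run_dedupe(self) -> int:
        # Scheduled pass (e.g. overnight): fingerprint what is already on
        # disk, repoint duplicates at one shared block, free the extras.
        seen = {}  # fingerprint -> index of the block we keep
        freed = 0
        for logical, physical in enumerate(self.volume):
            block = self.blocks[physical]
            digest = hashlib.sha256(block).digest()
            keeper = seen.setdefault(digest, physical)
            # Byte-compare before sharing, so a hash collision cannot
            # silently corrupt data.
            if keeper != physical and self.blocks[keeper] == block:
                self.volume[logical] = keeper
                self.blocks[physical] = None
                freed += 1
        return freed

    def physical_blocks(self) -> int:
        return sum(block is not None for block in self.blocks)

    def read(self) -> bytes:
        return b"".join(self.blocks[physical] for physical in self.volume)
```

In this sketch, physical usage stays at its full size until `run_dedupe` is called, which mirrors the trade-off described above: full-speed writes now, space reclaimed later.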
Data Domain deduplication, on the other hand, is based on an inline architecture. This means that each time data arrives at the Data Domain system, a real-time decision is made: should I store this data, or just reference a copy I already have? The only way to make this decision is to compare the new data to all the other data previously stored on the system. It turns out this is a good way to handle disk-to-disk (D2D) backup data. The data doesn’t need to be stored first before deduplication, and if the comparison process takes a little time to complete – well, it’s still faster than waiting for a tape to (hopefully) load and get itself into position to accept data.
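The inline decision can be sketched in a few lines as well. Again, this is a toy model and not Data Domain's actual implementation: the `InlineStore` class, the fixed 4 KB blocks, and the SHA-256 fingerprint index are assumptions for illustration, and real products may segment and index data quite differently. What matters here is that the "store or reference?" question is answered inside `write`, before anything lands on disk.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class InlineStore:
    """Toy model of inline dedupe (hypothetical, not a vendor API)."""

    def __init__(self):
        self.index = {}   # fingerprint -> position of the stored block
        self.blocks = []  # unique blocks actually written to disk
        self.recipe = []  # ordered references needed to rebuild the stream

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # The real-time decision: have we seen this data before?
            position = self.index.get(digest)
            if position is None:
                # New data: store it and remember its fingerprint.
                position = len(self.blocks)
                self.blocks.append(block)
                self.index[digest] = position
            # Either way, only a reference goes into the stream recipe.
            self.recipe.append(position)

    def read(self) -> bytes:
        return b"".join(self.blocks[position] for position in self.recipe)
```

If you write the same backup to this store two nights in a row, the second night adds references but no new blocks. The cost is the index lookup on every incoming block, which is the "little time to complete" that a backup target can tolerate far more easily than a latency-sensitive production application.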
So you see – these two architectures really solve two different needs. Sure, technically you can use a NetApp FAS system to dedupe your D2D backups, or you could use a Data Domain system to dedupe your production data, but if you did this you wouldn’t be playing to the strengths of either product. Together, though, using each system for what it was designed to do, you could achieve optimal efficiency across your entire storage enterprise – all while sitting down for your own family dinner…

I think Kostadis made a good effort at explaining why inline and post-process dedupe have very different use cases:
http://blogs.netapp.com/extensible_netapp/2008/09/a-little-digres.html
Short summary (like you said as well):
- inline: backup and archiving storage
- post-process: primary (latency sensitive) storage
Posted by: Geert | June 17, 2009 at 03:34 PM
It is a perfect match: NetApp + Data Domain = perfect data deduplication
Posted by: sudhindra | June 24, 2009 at 09:23 AM