I had to chuckle when I saw the concurrent announcements from Permabit and Storage Switzerland that dedupe 2.0 had arrived. Apparently the acquisition of Data Domain by EMC had signaled the demise of dedupe 1.0, even though I failed to see any technology transformation as a result of that merger. Data Domain is still doing the same thing they always did and EMC is still doing what they do. There was no transformative event here, guys.
Anyway, the people at Permabit seem to believe that hashing and deduplicating large archival stores is ushering in a new era of deduplication and leaving point products behind. But, er, isn't Permabit another point product? Last time I looked in a data center there was more than archival data behind the blinking lights on those storage arrays. In my estimation, deep archival consumes somewhere around 1% of the world's data storage.
Then there's Storage Switzerland. I guess the Switzerland part is to have one believe that they are a neutral party amidst this vendor-against-vendor world. But does anyone out there really think they are neutral? I'll let my readers form their own conclusions. Anyway, SS says that the core of dedupe 2.0 will be a foundational repository where all previously optimized data will come to rest from primary and secondary storage. Hmmm, sounds strangely like an archival system to me. Wait, doesn't Permabit sell archival systems, and aren't they also talking about dedupe 2.0? Could there be a connection here? Say it ain't so, Storage Switzerland!
OK, let's move back towards reality. You know when dedupe 2.0 will really arrive? When people stop talking about dedupe altogether. Like many technologies that came before it, such as RAID, SCSI, snapshots, and dozens of others, dedupe will become ubiquitous and acknowledged as a must-have feature from every serious storage vendor. Inline or post-process, source or target, local or global: it doesn't matter as long as it dedupes. Dedupe will run silently in the background, eradicating those pesky duplicate data objects. NetApp has proven that dedupe can run quite well on a unified platform that services all tiers of storage (including archival), uses all standard protocols, and serves a wide breadth of applications. Once a concept is proven, the next steps are constant refinement and optimization, and that's exactly what we are doing. I can't tell you the exact day dedupe 2.0 will arrive, but I am pretty sure that NetApp will be among the frontrunners.
DrDedupe
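
For readers who want to see what "hashing and deduplicating" boils down to, here is a minimal sketch. It is a toy, not any vendor's implementation: the `DedupeStore` class, its method names, the fixed block size, and the choice of SHA-256 are all assumptions made for illustration. Each block is hashed, and a block whose hash has been seen before is stored only once.

```python
import hashlib


class DedupeStore:
    """Toy block store (hypothetical): each unique block is kept once,
    keyed by its SHA-256 digest."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # fixed-size blocks, an assumption
        self.blocks = {}              # digest -> block bytes (stored once)
        self.files = {}               # file name -> ordered list of digests

    def write(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault keeps the existing copy if the digest is already known
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self):
        """Bytes the clients believe they have stored."""
        return sum(len(self.blocks[d]) for r in self.files.values() for d in r)

    def physical_bytes(self):
        """Bytes actually held after duplicate blocks are eliminated."""
        return sum(len(b) for b in self.blocks.values())


store = DedupeStore(block_size=4)
store.write("a", b"AAAABBBBAAAA")
store.write("b", b"BBBBCCCC")
print(store.logical_bytes(), store.physical_bytes())
```

In this tiny example the two files share the `AAAA` and `BBBB` blocks, so fewer bytes are held than were written. Real products differ in where the block boundaries fall (fixed versus variable), when the hashing happens (inline versus post-process), and how wide the index reaches (local versus global), but the core bookkeeping looks much like this.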

Doc,
Great blog post! I have to say I agree with most of your comments. Let me add a little color that will help you understand what we call Dedupe 2.0.
First off, we agree that dedupe will soon be applied to all tiers of storage. That’s the premise of Dedupe 2.0. Storage efficiency and ubiquity. No question that is where the technology is headed and Permabit is on that train. In order to fulfill the promise of dedupe, solutions must scale (to PBs), produce exceptional dedupe rates with file awareness and have minimal impact on storage (read/write) performance.
Dedupe 1.0 was about backup. Let’s face it, Data Domain won Dedupe 1.0 and did the right thing for their shareholders by selling to EMC.
Next…
Since you used the swimming metaphor, I will too. As I learned when growing up in the Midwest, before jumping in the lake, make sure you can swim. Incidentally, I’m a triathlete. So, I swim pretty fast and because I train hard, I can swim a long way.
We think dedupe is much like a swimming race. You need to be able to go the distance. At Permabit, we began developing dedupe technology in 2000. It has always been a fundamental “feature” of our storage solution and is a vital attribute of our Value Tier storage. Our dedupe scales to PBs, produces outstanding dedupe rates and delivers fast performance. And because our value tier storage addresses 80% or more of the information of an enterprise, we are delivering on the promise of Dedupe 2.0 and producing huge savings for enterprises.
Customers are seeing the savings by dedicating primary storage to transactional information and moving the rest to our Value Tier with deduplication. That frees up valuable storage on primary disk and also has a dramatic impact on backup costs. They no longer send 80% of their data over the network every week for backup. Once in the Value Tier, savings are realized in three ways:
· Dedupe means less disk – lower investment, less power/real estate and management.
· Reduced backup hardware/software costs.
· Reduced primary storage investment, management, support and maintenance.
Bottom line, that means an ROI within months, not years. This is where Dedupe 2.0 begins to shine. And yes, this is just the beginning. The more frequently and the earlier dedupe is used in the lifecycle, the more efficient the overall storage environment becomes.
So Doc, we like what you are saying. If you would like to match up dedupe results or customer cost savings, or race a mile in open water, I’m game.
Tom Cook
President & CEO
Permabit Technology Corporation
Posted by: Tom Cook | September 11, 2009 at 01:47 PM
Tom, thanks for your passionate response. While I appreciate Permabit's attempt to raise the dedupe bar, let's admit your basic technology has not changed since it was introduced in 2000. The only thing that has changed is your new definition of the "Value Tier," otherwise known as archival data. NetApp could have legitimately staked a claim that the Dedupe 2.0 era was ushered in when our users began deduplicating primary stores en masse, but we chose not to do so because the market was and still is very nascent.
So let's stop the silliness of saying a new era of dedupe is here when in fact nothing has changed, and agree that the true transformation will occur when users agree that dedupe can be applied anytime, anywhere (yes, even transactional data will be dedupe'd).
Dedupe 1.0 is not about D2D backups and Data Domain; it's about proving the concept of risk-free elimination of redundant data in enterprise-class storage. We vendors are getting there, but we still have a lot of work to do. So let's agree to advance deduplication with true technology breakthroughs, not marketing campaigns.
Thanks again and best of luck in your next triathlon!
DrDedupe
Posted by: DrDedupe | September 11, 2009 at 06:43 PM
Larry,
That's not very fair -- I could say "let's admit that NetApp's basic technology hasn't changed since 1994" given that you're still using WAFL. Both of us have been actively innovating our products since they were introduced, and the facts are that we're still miles ahead of you in terms of deduplication technology. We provide global, sub-file, variable-block dedupe across hundreds of terabytes of storage. I know you're working hard to catch up -- let me know if we can help.
I'm surprised you're not thrilled with the Dedupe 2.0 message -- after all, you've already got a notable position in this space! Dedupe 1.0 was about D2D backup and VTL; deduplication where the customer is needlessly storing lots of redundant data, but not addressing their significant cost challenges in primary and archive storage. Dedupe 2.0 is bringing deduplication to the data where it's still in use, on primary and value storage tiers. With A-SIS, despite its limitations, NetApp has one of the largest Dedupe 2.0 deployments out there.
This isn't just a marketing distinction. Dedupe for backup, dedupe 1.0, is over… there's no market for a new deduplicating VTL/D2D vendor. Enterprise customers have not widely deployed deduplication outside of backup, largely because few to no enterprise vendors have delivered that. Meanwhile, primary and value storage is growing explosively due in no small part to virtual system images, where much redundancy lies. Our customers need solutions that provide significant space and cost savings on this live data. Products that meet that need are dedupe 2.0.
Regards,
Jered Floyd
CTO & Founder
Permabit Technology Corp.
Posted by: Jered Floyd | September 18, 2009 at 01:28 PM