Today I came across this quote from Ed Walsh, former CEO of Avamar Technologies:
“Data deduplication was not a term until Avamar used it,” Walsh said. “Data Domain called it capacity optimized storage. At Avamar, we had to teach the market that data deduplication was something that you wanted. Now the market is rife. The technology has proven itself.”
As your purveyor of truth in dedupe, I feel it is my responsibility to get to the bottom of this issue. I've always given Data Domain credit for coining the term that eventually became my alter-ego.
So, no, I am not going to try to build a case for NetApp. As stated in our award-winning video, "we didn't invent it, we perfected it." But who really did invent it? Let's find out together.
First stop, the U.S. Patent Office. Avamar is listed as the owner of 6 patents. The oldest, from February 2001, is entitled "Hash File System and Method For Use In Commonality Factoring System." Whew. Glad that name didn't last; Dr Commonality Factoring doesn't have the same ring to it. Well, anyway, this patent mentions duplicate data but makes no mention of data deduplication. In fact, after searching all 6 Avamar patents, we find that the word deduplication is never used.
OK, now let's examine Data Domain. They are listed as the holder of 11 patents. Their oldest patent is from January 2002 and is entitled "Archival Data Storage System and Method." Does it mention data deduplication? Alas, no - it describes an archival storage system with disk drives that spin up and down (an idea, by the way, that seems to have spun down on its own).
But wait! A Data Domain patent from July 2006 is entitled "Locality-based Stream Segmentation for Data Deduplication." The first use of the term belongs to Data Domain! Do we have a winner? No, let's just say the score stands at Data Domain 1, Avamar 0, at halftime. BTW, here's some nice halftime entertainment courtesy of the Dr.
OK, back to our task. Next, let's try some Google searches to see what comes up. Well, well. Here's an earnings release from Avamar dated October 2006, four months after Data Domain's deduplication patent, and an excerpt from that press release:
"Avamar Technologies, Inc. is a leading provider of enterprise data protection software. Avamar's patented data reduction, single instance store and point-and-click restore technologies have revolutionized backup, recovery and DR strategies for global corporations."
Nary a mention of data deduplication anywhere in this press release. Now, Ed, if Avamar was using the term data deduplication before Data Domain, wouldn't it have been used in something important like this press release? Sorry, but I have to declare a shutout here and name Data Domain the winner over Avamar, 2-0. Ed, I believe you've been caught in a little white lie. Remember, the Dr is watching...

Larry,
I looked a bit, too, and can't say definitively. Avamar was using the term "de-duplication" in mid-2007, and Data Domain started using it around June 2006. But folks were talking about de-duplication as early as June of 2005 (see this link http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1098962,00.html).
So perhaps Avamar invented it, but they don't seem to have been using the word until at least two years later!
Posted by: Stephen Foskett | September 16, 2009 at 07:50 PM
Larry "Joe Wilson",
Surely you can look places other than patents! You know how those things are written -- as obscurely as possible.
I don't know who invented the term, but the first instance I find in my email is from May 24, 2004, in our response to an RFP from a customer. So it definitely dates back to mid-2004 for us. I'm not claiming credit for the term, though; our preferred language was "data coalescence" until dedupe emerged as the standard.
--Jered
Posted by: Jered Floyd | September 18, 2009 at 01:37 PM
Guys, the reality: vendors never create the terms that end up in public use. From them you get things like 'commonality factoring' or 'sticky byte factoring'. The industry terms end up coming from technology analysts who are trying to describe the technology to press and end users, so I am sure the term deduplication really came from someone in Duplessie's group at ESG or at Taneja Group.
Posted by: Steve | September 30, 2009 at 07:50 AM