November 20, 2009

Questions and Answers about Deduplication

I recently sat in on a webcast Q&A panel with a few other vendors that provide deduplication technology. Representatives from EMC, Data Domain, NEC, and Exar were on the panel, along with me. The discussion was lively and the questions came in from the listeners rapid-fire. Based on the large number of attendees and questions, it's clear that people still have a lot of concerns about deduplication. The webcast was sponsored by SNIA, which continues to do good work in educating people about new technologies like dedupe. Click here for a replay of this panel discussion.

DrDedupe

November 17, 2009

Fun with Data Storage

In an ongoing attempt to keep everyone from taking themselves too seriously, I'd like to point out a couple of videos that were recently posted on NetApp TV. 

When the boss insists that you stay late and finish up that project the world can't seem to wait another minute for, when you've been tied up all day in mindless meetings that keep you from doing your real job, when you tell your loved ones about the breakthrough you made today on that pesky algorithm and they give you that all-too-familiar blank stare - give yourself a treat and enjoy a few minutes of data storage humor, compliments of NetApp and DrDedupe.

Extreme Makeover - Data Center Edition

DrDedupe on the Streets of New York

NetApp - taking the rage out of storage

DrDedupe

November 12, 2009

Deduplication and Hyper-V are bringing NetApp and Microsoft closer together

I don't want to say that dedupe was the only reason that NetApp was named Microsoft's Storage Solutions Partner of the Year, but I can dream, can't I?...

Steve Ballmer:  DrDedupe, on behalf of our bazillion Microsoft employees, I'd like to personally thank you for your contribution to our success.

DrDedupe:  Aw shucks Steve, it was nothing really.  Just a couple hundred lines of code the team wrote over a long weekend.  We were kind of bored you see and it seemed like a fun project...

OK seriously, as Hyper-V begins to gain in popularity, there is some real synergy going on between NetApp and Microsoft. Check out this video, where you can see a demonstration of a Hyper-V LUN being created and then easily re-sized after being dedupe'd.

Using NetApp storage with Microsoft apps seems to make a lot of sense, not only for Hyper-V but also for some other apps you might have heard of, such as Exchange, SQL, and SharePoint.

SQL and its nephew SharePoint offer some interesting (and largely unexplored) use cases for dedupe that I will discuss in upcoming blogs. Exchange, as I noted in a previous blog, has jettisoned single instance storage (SIS) in 2010, presenting yet another compelling opportunity for NetApp dedupe. Yes - I know that Microsoft often espouses server-attached storage for Exchange, but sooner or later they'll realize that people stopped doing that around a decade ago - for many good reasons. (Microsoft, are you listening?)

Speaking of blogs, here's a good one that Nick Triantos posted about one of our newer features, SnapManager for Hyper-V. This feature brings automation to Hyper-V for DR and D2D backups and is integrated with Microsoft's Volume Shadow Copy Service (VSS). Very cool indeed, and OK, maybe the real reason we won the partner award...

Anyway, if you are a Microsoft app user (sorry Sun) you owe it to yourself to check out what NetApp can do for you.  Our latest issue of Tech OnTap also features an article on the 5 best practices for Hyper-V.

DrDedupe

October 16, 2009

4 Things Frank Slootman Probably Wishes He Didn't Say

Dave Raffo interviewed Frank Slootman this week at SNW. For those who don't know him, Frank is the former CEO at Data Domain and is now President of EMC's Data Backup Division. During the interview Frank said things that were, well, un-presidential. Unfortunately for Frank, we live in a world that records comments like his and preserves them longer than those unidentifiable objects tucked away in the back of your refrigerator.

Anyway, Frank made four comments during this interview that stood out as “huh? did he really say that?” types of comments. I only wish I had been in the room so I could have asked some follow-up questions. If I had been invited into the room (like that would ever happen), here is the way the interview might have gone:

Raffo: But if Disk Library customers want dedupe, they have to buy a Data Domain box?

Slootman: Yes. If you have a car and now you want an airplane, are you going to put wings on your car? You have to buy an airplane.

DrDedupe:  Frank – I am not sure that is the correct analogy.  A car is designed to travel on the ground, and an airplane is designed to fly through the air.  I think what you meant to say is that if you have a car now and you want a more efficient car, EMC says you should just buy another car, right?

Raffo: EMC has sold Quantum's data deduplication software with Disk Library. Will you sell Data Domain software with Disk Library instead?

Slootman: No. Disk Library is a straight VTL [virtual tape library], like it always should have been. It's a brute-force system, no finesse. That's the way it was when it first came out, then they tried to turn it into something it is not by adding deduplication and replication. They bastardized the product, so much so people don't even know what a VTL is anymore. People think VTL is a generic term for backup to disk. People think Data Domain is a VTL, but 90% of the systems we sell are IP-connected, not with a Fibre Channel protocol.

DrDedupe:  Thanks for the clarification Frank - EMC was selling a bastardized product and confusing the market.  Good luck in the next all-hands meeting at that division you run now.

Raffo: What about the Quantum software EMC has sold with Disk Library?

Slootman: We're swapping a lot of those boxes out at zero revenue. We've taken out about a dozen and we'll continue to take out a similar number this quarter. Customers don't want it.

DrDedupe:  Wow Frank this is a very interesting disclosure.  Would you care to point out any other products that EMC has sold that your customers don't want?

Raffo: Will you continue to work closely with Symantec Corp.'s OpenStorage (OST) API now that you're EMC?

Slootman: Yes. I'm not throwing my partners under the bus. We'll compete, but we're all competitors and partners these days. We won't screw them. We'll screw other companies, like CommVault. We [Data Domain] treated them as a good partner and they came after us.

DrDedupe:  This is another interesting disclosure Frank.  I am sure our viewers will want to know who the other companies are that you intend to screw.

So that's the way the interview might have gone with DrDedupe in the room. I'm pretty sure that Frank will have some 'splaining to do next time he travels to Hopkinton. But then again, isn't he starting to sound more like other EMC execs?

DrDedupe

October 15, 2009

The Cost of Efficient Storage can be ZIP

Storage efficiency is a funny thing. Vendors these days are asking customers to spend money so that they can save money on their data storage. Huh? This is a bit of a paradox. Here's the example I use: let's say my wife comes home with a brand new pair of shoes. "Hey honey, look at these great shoes I just bought! Regular price was $200 but I got them for 50% off!" My wife sees this transaction as saving us $100, but I see the transaction as $100 gone from the family checking account, never to be seen again. Who has the correct perspective? Well, that all depends on just how necessary those shoes were. If the closet is overflowing with shoes just like the new ones, this was probably a wasted expense. However, if the shoes were bought to perform a specific function that no other shoe could accommodate (like this), then it's probably a worthwhile investment.

Storage efficiency is kind of like that. You shouldn't invest in it unless it provides some type of new functionality that you don't already own. Plenty of vendors will offer to sell you cheaper storage, with heavy discounts to entice you, but if it's just like all the other storage already in your closet, what are you buying? More shoes just like the ones you already have? When you invest in storage efficiency, you should look for something different, something that lets you do things you haven't been able to do before.

Why am I bringing this up? This week NetApp rolled out a new promo called ZIP, or Zero Investment Promise. Here's the concept: instead of buying more shoes, er I mean storage, why not make your existing storage more efficient? With ZIP, NetApp will loan you a shiny new V-Series Open Storage Controller, but before it hits your dock a promise is made that the V-Series will immediately save you x (x is defined once we do an onsite evaluation, for free). We'll put this promise in writing and give you 90 days to run your tests. If your tests deliver what we promised, you buy the V-Series (heck, it's probably already paid for itself by now anyway) - but if you believe we broke our promise, and we agree - the V-Series is yours to keep - gratis.

OK get ready, here comes the fine print. You and your data center must be in North America (some countries strangely have a problem with promos like this), you cannot already be using V-Series (duh), you have to be a current user of EMC Clariion or HP EVA systems, and you have to agree to use 5 key features on your EMC or HP system, provided via the V-Series (Snapshot, FlexClone, Thin Provisioning, Thin Replication, and of course Deduplication). That's about it.

So check it out. And don't assume you have to pay a lot to bring efficient storage into your world. In this case, you'll pay ZIP.

DrDedupe

September 28, 2009

NetApp Replaces OR with AND

I spent last week in New York City, where I was asked to brief many of our largest customers on the merits of deduplication and how it fits into our storage efficiency story.  I also walked the streets of Manhattan in my Dr's garb with a film crew and conducted spontaneous "Man on the Street" interviews, but more on that in another blog...

Anyway, back to my briefings. Most, but not all, of the dozen or so customers I talked to were already implementing deduplication and praised its effectiveness. A few of the customers I briefed were not actually customers, but rather prospective customers who wanted to learn more about the technology behind NetApp storage efficiency and how it could help them improve their current storage environment. One thing I've mentioned before that bears repeating is that NetApp deduplication is not limited to NetApp storage systems. Our V-Series open storage controller brings deduplication and storage efficiency to HP, Sun, EMC and many other arrays. We didn't make those storage systems, we just made them better.

This brings me to the title of today's blog and the message I always convey during my briefings. At NetApp, we never ask you to sacrifice performance or resiliency for the sake of reduced storage capacity. In fact, we'll improve your ability to quickly and safely serve data to your users and applications while at the same time reducing your physical storage footprint. That's the message behind our new ad campaign - Replace "Or" with "And" - a clever play on words but one we take to heart. If you are interested in seeing more, here's a link to our new ad page, including a video I recorded on the topic. Yes, I do serious videos too. Also, here's a link to our revamped storage efficiency page featuring yours truly again. I wonder if the Dr runs the risk of being over-exposed? Nah, you can never see too much of DrDedupe. Enjoy-

September 16, 2009

You Lie! Who Coined the Term "Data Deduplication"?

Today I came across this quote from Ed Walsh, former CEO of Avamar Technologies:

“Data deduplication was not a term until Avamar used it,” Walsh said. “Data Domain called it capacity optimized storage. At Avamar, we had to teach the market that data deduplication was something that you wanted. Now the market is rife. The technology has proven itself.”

As your purveyor of truth in dedupe, I feel it is my responsibility to get to the bottom of this issue.  I've always given Data Domain credit for coining the term that eventually became my alter-ego. 

So, no, I am not going to try and build a case for NetApp. As stated in our award-winning video - "we didn't invent it, we perfected it." But who really did invent it? Let's find out together.

First stop, the U.S. Patent Office. Avamar is listed as the owner of 6 patents. The oldest, from February 2001, is entitled "Hash File System and Method For Use In Commonality Factoring System." Whew. Glad that name didn't last; Dr Commonality Factoring doesn't have the same ring to it. Well, anyway, this patent mentions duplicate data but makes no mention of data deduplication. In fact, after searching all 6 Avamar patents, we find that the word deduplication is never used.

OK, now let's examine Data Domain. They are listed as the holder of 11 patents. Their oldest patent is from January 2002 and is entitled "Archival Data Storage System and Method." Does it mention data deduplication? Alas, no - it describes an archival storage system with disk drives that spin up and down (an idea, by the way, that seems to have spun down on its own).

But wait! A Data Domain patent from July 2006 is entitled "Locality-based Stream Segmentation for Data Deduplication." The first use of the term belongs to Data Domain! Do we have a winner? No, let's just say the score stands at Data Domain 1, Avamar 0, at halftime. BTW here's some nice halftime entertainment courtesy of the Dr.

OK back to our task. Next - let's try some Google searches to see what comes up. Well, well. Here's an Earnings Release from Avamar dated October 2006, three months after Data Domain's July 2006 deduplication patent - and an excerpt from that press release:

"Avamar Technologies, Inc. is a leading provider of enterprise data protection software. Avamar's patented data reduction, single instance store and point-and-click restore technologies have revolutionized backup, recovery and DR strategies for global corporations."

Nary a mention of data deduplication anywhere in this press release. Now, Ed, if Avamar was using the term data deduplication before Data Domain, wouldn't it have been used in something important like this press release? Sorry, but I have to declare a shutout here and name Data Domain the winner over Avamar, 2-0. Ed, I believe you've been caught in a little white lie. Remember, the Dr is watching...

September 10, 2009

The Dedupe 2.0 Pundits Are Still Swimming in Lake 1.0

I had to chuckle when I saw the concurrent announcements from Permabit and Storage Switzerland that dedupe 2.0 had arrived.  Apparently the acquisition of Data Domain by EMC had signaled the demise of dedupe 1.0, even though I failed to see any technology transformation as a result of that merger.  Data Domain is still doing the same thing they always did and EMC is still doing what they do.  There was no transformative event here guys.

Anyway, the people at Permabit seem to believe that hashing and deduplicating large archival stores is ushering in a new era of deduplication and leaving point products behind. But, er, isn't Permabit another point product? Last time I looked in a data center there was more than archival data behind the blinking lights on those storage arrays. In my estimation, deep archival consumes somewhere around 1% of the world's data storage.

Then there's Storage Switzerland.  I guess the Switzerland part is to have one believe that they are a neutral party amidst this vendor-against-vendor world.  But does anyone out there really think they are neutral?  I'll let my readers form their own conclusions.  Anyway, SS says that the core of dedupe 2.0 will be a foundational repository where all previously optimized data will come to rest from primary and secondary storage.  Hmmm, sounds strangely like an archival system to me.  Wait, doesn't Permabit sell archival systems, and they are also talking about dedupe 2.0?  Could there be a connection here?  Say it ain't so Storage Switzerland!

OK, let's move back towards reality. You know when dedupe 2.0 will really arrive? When people stop talking about dedupe altogether. Like many technologies that came before it, such as RAID, SCSI, Snapshots, and dozens of others, dedupe will become ubiquitous and acknowledged as a must-have feature from every serious storage vendor. Inline or post-process, source or target, local or global - it doesn't matter as long as it dedupes. Dedupe will run silently in the background, eradicating those pesky duplicate data objects. NetApp has proven that dedupe can run quite well on a unified platform that services all tiers of storage (including archival), uses all standard protocols, and serves a wide breadth of applications. Once a concept is proven, the next steps are constant refinement and optimization, and that's exactly what we are doing. I can't tell you exactly the day dedupe 2.0 will arrive, but I am pretty sure that NetApp will be among the frontrunners.

DrDedupe

September 08, 2009

Exchange 2010 Dismisses SIS?

Microsoft Exchange revolutionized the way we communicate. It's hard to imagine life without email. Sure, there are other email apps out there, but aren't they all ladies-in-waiting to the true Queen of email - Microsoft Exchange?

Since version 4.0 was released (about the time the B-52's were singing "Love Shack") Exchange had a cool feature called Single Instance Storage, SIS for short.  SIS was the granddaddy of deduplication, and went something like this:

Email from Joe in Accounting:  "Hey check out this cool bootleg copy of 'Love Shack'!"

Email from Sally in Marketing, Fred in Shipping, Adam in Sales and about 10 other people:  "Hey check out what Joe sent me!"

Well what Joe sent you and you sent to 10 friends and they sent to 10 friends was basically the same email with the same attachment sent over and over again.  The folks in Redmond figured this was going to be a regular thing, so they invented SIS.  With SIS, all those emails were reduced to a single instance, as long as they were held in the same Exchange Store.
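
For the curious, here is a minimal sketch of the single-instancing idea in Python. It is a toy, not how Exchange actually implemented SIS: the class name, the method names, and the use of a SHA-256 content hash as the lookup key are all assumptions made for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy message store: identical content is kept once and shared by reference."""

    def __init__(self):
        self.instances = {}   # content hash -> message bytes (stored once)
        self.mailboxes = {}   # mailbox name -> list of content hashes

    def deliver(self, mailbox, message):
        key = hashlib.sha256(message).hexdigest()
        # Store the bytes only the first time this exact content is seen.
        self.instances.setdefault(key, message)
        self.mailboxes.setdefault(mailbox, []).append(key)

    def logical_bytes(self):
        """Bytes that would be used if every mailbox held its own copy."""
        return sum(
            len(self.instances[key])
            for keys in self.mailboxes.values()
            for key in keys
        )

    def physical_bytes(self):
        """Bytes actually stored."""
        return sum(len(message) for message in self.instances.values())


store = SingleInstanceStore()
attachment = b"Love Shack bootleg" * 1000  # stand-in for Joe's attachment
for person in ["sally", "fred", "adam"]:
    store.deliver(person, attachment)

print(store.logical_bytes(), store.physical_bytes())
```

Three mailboxes reference the message, but the store holds only one copy, so the logical size is three times the physical size.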

Single Instancing saved storage capacity and was free - what's not to like? Well, fast-forward about 15 years and Microsoft had a problem. More mailboxes, more attachments, and not only bootlegged audio files but attachments of all shapes and sizes - Word docs, PowerPoints, PDFs, JPEGs, MPEGs, where will it stop? So Microsoft did what any self-respecting software company would do. They said, "You know, storage hardware is cheap these days, and all this SIS stuff is really slowing us down - so please don't use SIS anymore and just buy more storage, OK?"

That's right.  Microsoft needed to destroy the monster it had created.  As of Exchange 2007 the message from Redmond was "Given current trends, we expect the value of single instance storage to continue to decline over time. It's too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today's deployment or migration plans."  Worse yet, the word on the streets is that in Exchange 2010, SIS will disappear altogether.

Now before you come down with an acute case of ESISPTD (Exchange SIS Post-Traumatic Disorder), the Dr has some good news for you. You can use NetApp deduplication to get back what Microsoft taketh away. On tests run with Exchange 2010, without doing much optimization, we've seen space reductions of around 30% via dedupe. Better yet, since NetApp dedupe is run in the background as a post-process procedure, performance impact on those users sharing their favorite files should be nominal. I guess you could say that NetApp dedupe lets you go Back to the Future - it's free and it saves storage capacity - what's not to like?
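
To make "post-process" a little more concrete, here is a rough sketch of the general technique: the data is written first, and a later background pass fingerprints the blocks and shares the identical ones. This is not NetApp's implementation. The 4096-byte block size, the SHA-256 fingerprint, and the function names are assumptions chosen for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_blocks(data, size=BLOCK_SIZE):
    """Cut already-written data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def post_process_dedupe(blocks):
    """Scan already-written blocks and share identical ones.

    Returns (pointers, unique_blocks): pointers[i] is the index into
    unique_blocks that logical block i now refers to.
    """
    seen = {}            # fingerprint -> index into unique_blocks
    unique_blocks = []
    pointers = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        idx = seen.get(fingerprint)
        # Byte-compare on a fingerprint match so a collision never merges different data.
        if idx is None or unique_blocks[idx] != block:
            idx = len(unique_blocks)
            unique_blocks.append(block)
            seen[fingerprint] = idx
        pointers.append(idx)
    return pointers, unique_blocks


# Ten logical blocks, but only two distinct block contents.
data = b"A" * BLOCK_SIZE * 7 + b"B" * BLOCK_SIZE * 3
blocks = split_blocks(data)
pointers, unique = post_process_dedupe(blocks)
savings = 1 - len(unique) / len(blocks)
print(f"{len(blocks)} logical blocks, {len(unique)} physical, {savings:.0%} saved")
```

The made-up data here dedupes far better than a real mailbox store would. The point is only the shape of the process: writes land at full speed, and the space comes back later when the background scan runs.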

DrDedupe

September 03, 2009

The Evolution of the Storage Brain - Applications Run Faster With Deduplication

It seems a contradiction - improving system performance with dedupe?  Usually, when someone thinks about implementing dedupe on a storage system, the first thought is "hmmmm...how much is this going to slow down my applications?"  In this blog I am going to tell you how dedupe can actually speed up your applications.

In Data ONTAP 7.3.1, NetApp introduced "Intelligent Caching."  Intelligent Caching refers to FAS and V-Series system memory, as well as secondary memory.  By secondary memory I am referring to the Performance Acceleration Module, or "PAM" for short.  NetApp introduced PAM II in this press release, although it was buried in with a bunch of other important announcements so it was easily missed.  I'll  come back to PAM II later.

So how do system memory, secondary memory, and Intelligent Caching relate to dedupe? Simple. When a NetApp storage system holds a dedupe'd data block in its memory, it also holds all the information about the data pointers that reference that particular block. If this block has been deduped, say, 100 times, and is currently held in memory, there is no need to access the block from disk again, regardless of who is requesting the data block. The result is faster access to data via reduced disk I/O latency.
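
Here is a small sketch of that idea, assuming a cache that is keyed by physical block rather than by logical address. The class and field names are hypothetical and this is not Data ONTAP code; it only shows why 100 pointers to one block cost a single disk read.

```python
class DedupeAwareCache:
    """Toy read cache keyed by physical block, not by logical address."""

    def __init__(self, block_map, disk):
        self.block_map = block_map  # logical address -> physical block id
        self.disk = disk            # physical block id -> bytes
        self.cache = {}             # physical block id -> bytes
        self.disk_reads = 0

    def read(self, logical_addr):
        physical = self.block_map[logical_addr]
        if physical not in self.cache:
            self.disk_reads += 1    # cache miss: go to disk once
            self.cache[physical] = self.disk[physical]
        return self.cache[physical]


# 100 logical blocks (say, one per virtual desktop) that dedupe to one physical block.
disk = {0: b"shared OS block"}
block_map = {("desktop", n): 0 for n in range(100)}
cache = DedupeAwareCache(block_map, disk)
for addr in block_map:
    cache.read(addr)

print(cache.disk_reads)
```

All 100 requesters are served after one trip to disk, because every logical address resolves to the same cached physical block.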

The ramifications of this become very interesting. The higher the dedupe ratio, the more dedupe'd blocks are placed into memory and the faster the system becomes. We've already run benchmarks of this during VDI boot storms (when everyone turns on their virtual desktops at the same time and the storage system is inundated with read requests) and the results with dedupe and Intelligent Caching show orders of magnitude improvement in read response times. With PAM II, the total system cache can exceed 2TB, and as Foxxy Cleopatra might say, "Now that's a WHOLE lot of memory." With this larger cache, the laws of probability begin to take effect - a larger cache means more blocks can be cached, and frequently requested blocks will remain in cache longer. Meanwhile, since dedupe'd blocks have multiple pointers, they have a higher probability of being cached... you get the idea.
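
A toy boot storm shows how cache size and dedupe interact. Everything in this sketch is an assumption made for illustration: the LRU eviction policy, the 50 desktops, the 200 OS blocks per desktop, and the two cache sizes. It is not a model of PAM II or of our benchmark setup.

```python
from collections import OrderedDict


def disk_reads(requests, cache_blocks):
    """Count cache misses for a sequence of physical block ids under LRU."""
    cache = OrderedDict()
    misses = 0
    for block in requests:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used
    return misses


DESKTOPS, OS_BLOCKS = 50, 200

# Without dedupe every desktop has its own physical copy of each OS block.
no_dedupe = [(d, b) for d in range(DESKTOPS) for b in range(OS_BLOCKS)]
# With dedupe all desktops point at the same physical blocks.
deduped = [b for d in range(DESKTOPS) for b in range(OS_BLOCKS)]

for cache_blocks in (100, 1000):
    print(cache_blocks, disk_reads(no_dedupe, cache_blocks), disk_reads(deduped, cache_blocks))
```

In this toy, the un-deduped storm goes to disk for all 10,000 reads at either cache size, since no physical block is ever requested twice. The deduped storm also thrashes when the cache holds only 100 blocks, but once the cache is big enough to hold the 200 shared blocks it needs just 200 disk reads.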

To me, the combination of intelligent caching and deduplication is a signal that we are entering a new era in the maturity of deduplication, where the intelligence of deduplication is combined with other intelligences (intelligenci?) of the storage system to bring breakthrough results - the evolution of the storage brain. I'll be exploring other thoughts on this topic in upcoming blogs...


© NetApp, Inc.  |  "Safe Harbor" Statement  |  Privacy Policy