Microsoft Exchange revolutionized the way we communicate. It's hard to imagine life without email. Sure, there are other email apps out there, but aren't they all ladies-in-waiting to the true Queen of email: Microsoft Exchange?
Starting with version 4.0 (released around the time the B-52's were singing "Love Shack"), Exchange had a cool feature called Single Instance Storage, or SIS for short. SIS was the granddaddy of deduplication, and it worked something like this:
Email from Joe in Accounting: "Hey, check out this cool bootleg copy of 'Love Shack'!"
Email from Sally in Marketing, Fred in Shipping, Adam in Sales, and about 10 other people: "Hey, check out what Joe sent me!"
Well, what Joe sent you, and you sent to 10 friends, and they sent to 10 friends, was basically the same email with the same attachment sent over and over again. The folks in Redmond figured this was going to be a regular thing, so they invented SIS. With SIS, all those emails were reduced to a single instance, as long as they were held in the same Exchange Store.
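To put rough numbers on the "Love Shack" chain, here is a minimal Python sketch. It assumes SIS keeps one copy of each delivered message per store, no matter how many recipients it has; the class, mailbox names, and the 5 MB size are hypothetical. It also models each forward as a new message, which is the distinction one of the commenters raises below.

```python
class SingleInstanceStore:
    """Hypothetical model of SIS within one store (not Exchange code):
    each delivered message is stored once, and every recipient mailbox
    holds a reference to it."""

    def __init__(self):
        self.copies_mb = []  # size of each stored message copy, in MB
        self.refs = {}       # mailbox -> list of indexes into copies_mb

    def deliver(self, size_mb, recipients):
        """Store one copy and reference it from each recipient mailbox."""
        self.copies_mb.append(size_mb)
        msg_id = len(self.copies_mb) - 1
        for mailbox in recipients:
            self.refs.setdefault(mailbox, []).append(msg_id)
        return msg_id

    def stored_mb(self):
        """Capacity actually consumed: one copy per delivered message."""
        return sum(self.copies_mb)

    def logical_mb(self):
        """Capacity that one-copy-per-recipient storage would consume."""
        return sum(self.copies_mb[i] for ids in self.refs.values() for i in ids)


# Joe sends a 5 MB attachment to 10 coworkers on the same store...
store = SingleInstanceStore()
first_wave = [f"user{i}" for i in range(10)]
store.deliver(5.0, first_wave)

# ...and each of them forwards it to 10 more people. Each forward is a
# new message, so it gets its own single-instanced copy.
for i in range(len(first_wave)):
    store.deliver(5.0, [f"user{i}_{j}" for j in range(10)])

print(store.stored_mb(), store.logical_mb())  # 55.0 550.0
```

Under these assumptions, 110 mailbox entries cost 11 stored copies instead of 110.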
Single instancing saved storage capacity and was free - what's not to like? Well, fast-forward about 15 years and Microsoft had a problem: more mailboxes, more attachments, and not only bootlegged audio files but attachments of all shapes and sizes - Word docs, PowerPoints, PDFs, JPEGs, MPEGs, where will it stop? So Microsoft did what any self-respecting software company would do. They said, "You know, storage hardware is cheap these days, and all this SIS stuff is really slowing us down, so please don't use SIS anymore and just buy more storage, OK?"
That's right: Microsoft needed to destroy the monster it had created. As of Exchange 2007, the message from Redmond was: "Given current trends, we expect the value of single instance storage to continue to decline over time. It's too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today's deployment or migration plans." Worse yet, the word on the street is that in Exchange 2010, SIS will disappear altogether.
Now, before you come down with an acute case of ESISPTD (Exchange SIS Post-Traumatic Disorder), the Dr. has some good news for you. You can use NetApp deduplication to get back what Microsoft taketh away. On tests run with Exchange 2010, without doing much optimization, we've seen space reductions of around 30% via dedupe. Better yet, since NetApp dedupe runs in the background as a post-process procedure, the performance impact on those users sharing their favorite files should be minimal. I guess you could say that NetApp dedupe lets you go Back to the Future: it's free and it saves storage capacity - what's not to like?
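For a rough idea of what a post-process, block-level pass does, here is a minimal Python sketch. It assumes fixed 4 KB blocks, SHA-256 fingerprints, and a byte-for-byte comparison before two blocks are shared; these choices and all function names are illustrative assumptions, not NetApp's implementation. Because it scans data that is already written, it stays out of the write path. Fixed-size blocks only match duplicates that fall on the same block boundaries, so the demo deliberately lays the attachment out block-aligned.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KB blocks


def post_process_dedupe(volume, block_size=BLOCK_SIZE):
    """Scan data that is already written and share identical fixed-size blocks.

    Returns (block_map, unique): unique holds one copy of each distinct block,
    and block_map[i] is the index in unique backing logical block i.
    """
    fingerprints = {}  # digest -> indexes into unique with that digest
    unique = []
    block_map = []
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        match = None
        # A fingerprint hit is only a candidate; compare bytes before sharing.
        for idx in fingerprints.get(digest, []):
            if unique[idx] == block:
                match = idx
                break
        if match is None:
            match = len(unique)
            unique.append(block)
            fingerprints.setdefault(digest, []).append(match)
        block_map.append(match)
    return block_map, unique


def space_savings(volume, unique):
    """Fraction of logical capacity no longer consumed after dedupe."""
    used = sum(len(block) for block in unique)
    return 1 - used / len(volume)


def filler(seed):
    """Deterministic, distinct 4 KB block for the demo (128 x 32-byte digests)."""
    return b"".join(hashlib.sha256(f"{seed}-{i}".encode()).digest() for i in range(128))


# Three hypothetical "messages": a distinct one-block header followed by the
# same four-block attachment, laid out back to back on the volume.
attachment = b"".join(filler(f"att{n}") for n in range(4))
volume = b"".join(filler(f"hdr{k}") + attachment for k in range(3))

block_map, unique = post_process_dedupe(volume)
print(len(block_map), len(unique), round(space_savings(volume, unique), 3))  # 15 7 0.533
```

Rebuilding the volume from `unique` and `block_map` gives back the original bytes, which is the property any dedupe pass has to preserve.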
DrDedupe

Hmmm
The B-52's sang "Love Shack" at MEC (what used to be the Microsoft Exchange Conference) 1999 in Atlanta... They were the entertainment for event night. Is there a deeper trend here?
John
Posted by: John F. | September 08, 2009 at 02:46 PM
I think the Exchange version of Single Instance Storage means something very different than what you think it means...
It was a cool feature to reduce I/O peaks (write once, instead of 10x), but the "less space" thing was often temporary and usually went away over time anyway as people marked things as read, replied, etc.
"Real-world" Exchange SIS space savings are somewhere in the sub-10% neighborhood...
Posted by: Chad | September 08, 2009 at 04:42 PM
SIS, or single instance storage, means just that: an object was stored once and referenced many times within an Exchange store. See: http://support.microsoft.com/kb/175481/
What it means is that, in Exchange 2007 and prior, tables such as the message table and the attachments table were global, that is, shared across the whole store. You used secondary indexes to create things like folder views. Multiple secondary indexes could point to the same entries in a table. If I sent a message to you and you were on the same store, then there was only one copy in the message table, and we both had secondary indexes or mailbox views that pointed to it: mine in my Sent Items folder and yours in your Inbox.
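The store-level layout described above can be sketched in a few lines of Python. This is a hypothetical model, not Exchange's actual schema or code: one dictionary stands in for the global message table, and per-folder lists of message IDs stand in for the secondary indexes.

```python
from collections import defaultdict


class GlobalSchemaStore:
    """Hypothetical sketch of a store-level schema (not Exchange code):
    one global message table shared by every mailbox in the store, plus
    per-folder views that act as secondary indexes pointing into it."""

    def __init__(self):
        self.message_table = {}         # message_id -> message body
        self.views = defaultdict(list)  # (mailbox, folder) -> [message_id]
        self.next_id = 0

    def send(self, sender, recipients, body):
        msg_id = self.next_id
        self.next_id += 1
        self.message_table[msg_id] = body  # the message row is written once
        # Every mailbox that shows the message needs its own index entry.
        self.views[(sender, "Sent Items")].append(msg_id)
        for mailbox in recipients:
            self.views[(mailbox, "Inbox")].append(msg_id)
        return msg_id

    def folder(self, mailbox, folder):
        """Resolve a folder view through its index entries."""
        return [self.message_table[i] for i in self.views[(mailbox, folder)]]


store = GlobalSchemaStore()
store.send("me", ["you"], "Love Shack.mp3")
print(len(store.message_table))                       # 1
print(sum(len(ids) for ids in store.views.values()))  # 2
print(store.folder("you", "Inbox"), store.folder("me", "Sent Items"))
```

One row, two index entries: the single instance is a side effect of both views pointing at the same table entry, and each extra recipient costs an index update rather than another copy.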
Creating those secondary indexes was I/O intensive. Flattening the schema, by creating mailbox-level tables instead of global, store-level tables, reduced the I/O dramatically. A consequence of flattening the schema is that SIS is gone.
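To see why SIS goes away, here is a minimal Python sketch of the flattened alternative, assuming each mailbox simply keeps its own list of rows. The class and names are hypothetical, not Exchange 2010's real schema, and the sketch models only the storage consequence, not the I/O savings described above.

```python
from collections import defaultdict


class FlattenedSchemaStore:
    """Hypothetical sketch of a flattened schema (not Exchange code):
    each mailbox owns its own message table, so there is no shared row
    for several mailboxes to point at."""

    def __init__(self):
        self.tables = defaultdict(list)  # mailbox -> [(folder, body)]

    def send(self, sender, recipients, body):
        # The sender and every recipient each get a private copy of the row.
        self.tables[sender].append(("Sent Items", body))
        for mailbox in recipients:
            self.tables[mailbox].append(("Inbox", body))

    def folder(self, mailbox, folder):
        """A folder view reads only that mailbox's own table."""
        return [body for f, body in self.tables[mailbox] if f == folder]

    def stored_copies(self):
        return sum(len(rows) for rows in self.tables.values())


store = FlattenedSchemaStore()
store.send("joe", [f"user{i}" for i in range(10)], "Love Shack.mp3")
print(store.stored_copies())  # 11: one per mailbox instead of one per store
```

Reading a folder no longer goes through a store-wide index, but a message to 10 recipients now costs 11 rows, which is the space that block-level dedupe underneath the database would have to win back.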
John Fullbright
Posted by: John F. | September 08, 2009 at 05:23 PM
Your example isn't the best either:
If I sent an MP3 to 10 people in the same store, it would be stored once and you'd get a saving through SIS.
However, if, as in your example, those 10 people each forward the same email to another person, I'm afraid SIS wouldn't help you; that mail is an entirely new message (though largely the same as the previous one) and would consume more storage.
A-SIS, however, could potentially spot the duplicated blocks that make up the MP3 and reclaim that space.
Posted by: Darren | September 09, 2009 at 08:31 AM
My point is that, basically, Microsoft Exchange SIS was never really a feature worth counting on to begin with, so its disappearance in the name of I/O is, and should be viewed as, a very good thing...
Besides, it's nowhere near as exciting as true block-level dedupe (A-SIS), or even just the concept of thin provisioning: with Exchange on DAS (or whatever you're comparing it to), you'd still have all that blank space for growth pre-allocated...
Posted by: Chad | September 09, 2009 at 09:49 AM
Microsoft wants people to go to direct-attached storage anyway. DAS is much cheaper than centralized storage. I'm not sure what economic value, if any, centralized storage offers over DAS. The justification is much tougher.
Posted by: Shan | September 28, 2009 at 11:35 AM
@Shan,
The FAS 2000 series starts at under $8000. How much is an HP MSA or Dell PowerVault with similar space and performance?
John
Posted by: John F. | September 29, 2009 at 06:05 AM