I've recently been engaged in a series of internal discussions about what storage efficiency actually is.
As a company, NetApp has continued to deliver an amazing amount of value by allowing our customers to store huge amounts of data on a relatively small storage footprint, and yet we get routinely criticized for our inefficiency.
In point of fact, we created a marketing program to put our money where our mouth is (hell, we even run a local marketing campaign), and still we get asked: well, how efficient are you really?
Which makes me wonder if we're just barking up the wrong tree.
Storage Efficiency is so last century
SAN storage architectures built around Real Fibre Channel principles are what created the current insanity that frames the storage efficiency discussion.
They declared that the storage problem could best be solved by creating a storage architecture that first required you to combine lots of storage into a single disk pool, and then statically parceled out the resources into smaller chunks that were handed out to servers. Storage utilization was a measure of how effectively you used your storage pool. And then the question was how efficiently you used the storage you had allocated.
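To put rough numbers on that model, here is a minimal sketch in Python. The pool size, LUN sizes, and function names are all made up for illustration; the only thing taken from the description above is the two-step measurement: how much of the pool you carved up, and how much of what you carved up actually holds data.

```python
# Hypothetical numbers and names, for illustration only.

def utilization(pool_gb, allocated_gb):
    """Fraction of the raw pool that has been statically carved into chunks."""
    return sum(allocated_gb) / pool_gb


def efficiency(allocated_gb, used_gb):
    """Fraction of the allocated space that actually holds data."""
    return sum(used_gb) / sum(allocated_gb)


pool_gb = 10_000                      # one big disk pool
allocated_gb = [2_000, 2_000, 4_000]  # chunks handed out to three servers
used_gb = [500, 1_000, 1_000]         # what each server really wrote

pool_utilization = utilization(pool_gb, allocated_gb)
allocation_efficiency = efficiency(allocated_gb, used_gb)
overall = sum(used_gb) / pool_gb      # data stored per GB of pool

print(f"utilization: {pool_utilization:.0%}")
print(f"efficiency:  {allocation_efficiency:.2%}")
print(f"overall:     {overall:.0%}")
```

With these invented figures the pool looks 80% utilized, yet only a quarter of it holds data, because every statically allocated chunk carries its own stranded free space.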
Great.
So the entire storage industry is married to a storage management model that says that storage efficiency is a measure of how efficiently you used antediluvian architectures based around static partitioning.
Wonderful.
But the problem, and here's the rub, is that Better Than Real Fibre Channel can do things that Real Fibre Channel cannot. Because Better Than Real Fibre Channel can use block-level reference counts, it can store 255 full copies of data at the cost of the actual unique data within those 255 copies.
So if I have a snapshot, a single NetApp snapshot, I have a full copy of the file system. But instead of being applauded for storing a huge amount of data on a small amount of storage, I have to explain the WAFL overhead.
I'll be blunt: the WAFL overhead allows you to store what appears to a client to be a full copy of the file system on the minimal number of disk blocks necessary. You wanna do that using Real Fibre Channel? I really hope you like buying disk drives.
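For anyone who wants to see why reference counts make the copies nearly free, here is a toy sketch in Python. It is emphatically not how WAFL is implemented; the `BlockStore` and `FileSystem` classes and all their methods are hypothetical names I made up. It only shows the idea: a snapshot adds references to existing blocks instead of copying them, so you pay for unique data only.

```python
from collections import Counter


class BlockStore:
    """Toy block store with per-block reference counts (an assumption, not WAFL)."""

    def __init__(self):
        self.blocks = {}           # block id -> data
        self.refcount = Counter()  # block id -> number of references
        self.next_id = 0

    def write_block(self, data):
        """Allocate a new physical block holding `data`."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def share(self, block_id):
        """Add a reference to an existing block; no data is copied."""
        self.refcount[block_id] += 1
        return block_id

    def release(self, block_id):
        """Drop a reference; free the block once nothing points at it."""
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            del self.blocks[block_id]
            del self.refcount[block_id]


class FileSystem:
    """Toy file system: the active view and each snapshot are lists of block ids."""

    def __init__(self, store):
        self.store = store
        self.active = []
        self.snapshots = []

    def write(self, index, data):
        """Write to a new block, then repoint the active view at it."""
        new_id = self.store.write_block(data)
        if index == len(self.active):
            self.active.append(new_id)
        else:
            self.store.release(self.active[index])
            self.active[index] = new_id

    def snapshot(self):
        """A full logical copy that costs only reference-count updates."""
        snap = [self.store.share(b) for b in self.active]
        self.snapshots.append(snap)
        return snap


store = BlockStore()
fs = FileSystem(store)
for i in range(100):
    fs.write(i, b"data")
for _ in range(254):                   # active view + 254 snapshots = 255 copies
    fs.snapshot()

logical_blocks = len(fs.active) * (1 + len(fs.snapshots))
physical_blocks = len(store.blocks)

fs.write(0, b"new data")               # change one block after the snapshots
physical_after_overwrite = len(store.blocks)
```

In this toy, 255 logical copies of a 100-block file system occupy 100 physical blocks, and overwriting one block afterward adds exactly one more. The old block survives because the snapshots still reference it.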
And if it was a couple of EMC bloggers I wouldn't mind. But this discussion has permeated the entire computer industry.
And I wouldn't care too much, except that the question of how we should measure storage system efficiency is not even being discussed because of the fusillade of FUD. And because, at the end of the day, no one really wants to talk about how inefficient their architectures are.
If they did, storage customers would learn exactly why it's better to use RAID10 to protect your storage instead of replicating your data at the application level.
It's almost 2010 ...
NetApp and Data Domain, and soon others, have demonstrated that a storage system that uses indirection between the physical disk block and the logical disk block can store more data on less storage than we ever imagined.
And instead of talking about how efficiently we do that, in terms of CPU and memory, or how effectively, in terms of the impact on applications, we continue to have the most pointless of conversations:
If I treat my very sophisticated storage array like a set of big dumb disk drives I bought at the local electronics discount store, what's the usable capacity?
Blech.
