
January 2006

January 25, 2006

Thin Provisioning: Helping Storage Administrators Write Bad Checks


If you are a storage administrator in a large company, here is what your life is probably like:
    All day long people come to you with urgent requests for storage. "I need a one TB LUN for a critical new application, and I need it now. In fact, this app is going to be such a success that you'd better give me two TB, because it'll be a pain to expand later on."

    But a year later, if you go back and investigate that 2 TB LUN, nine times out of ten you discover that the application only used half a terabyte.

This scenario, repeated over and over, leads to low storage utilization. From the storage administrator's perspective, he may have handed out all of the storage on a given system, which is 100% utilization, but at the application level, there may only be 25% utilization.
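To see how both numbers can be true at once, here is a small Python sketch of the arithmetic. The array size, LUN names, and usage figures are made up for illustration; they are chosen only so that the result matches the 100% and 25% figures above.

```python
def utilization(used_tb, capacity_tb):
    """Return used capacity as a fraction of total capacity."""
    return used_tb / capacity_tb

# Hypothetical system: 8 TB of disk, fully handed out as four 2 TB LUNs,
# each of which ends up holding only half a terabyte of application data.
array_capacity_tb = 8.0
luns = [
    {"name": "app_a", "provisioned_tb": 2.0, "used_tb": 0.5},
    {"name": "app_b", "provisioned_tb": 2.0, "used_tb": 0.5},
    {"name": "app_c", "provisioned_tb": 2.0, "used_tb": 0.5},
    {"name": "app_d", "provisioned_tb": 2.0, "used_tb": 0.5},
]

allocated_tb = sum(lun["provisioned_tb"] for lun in luns)
used_tb = sum(lun["used_tb"] for lun in luns)

# The administrator's view: how much of the array has been handed out.
admin_view = utilization(allocated_tb, array_capacity_tb)
# The application view: how much of what was handed out holds real data.
app_view = utilization(used_tb, allocated_tb)

print(f"Handed out: {admin_view:.0%}, actually used: {app_view:.0%}")
```

The gap between the two numbers is the capacity that thin provisioning tries to reclaim.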

I like analogies:

    A Sun employee, an EMC employee and a NetApp employee are all at the funeral of a mutual friend. The Sun employee says, "We have a tradition at Sun. If someone dies, we put ten dollars into their coffin," and he drops in a ten dollar bill.

    The EMC employee says, "At EMC our tradition is to put $100 in their coffin," and he pulls a hundred dollar bill out of his wallet.

    The NetApp employee says, "A friend like this deserves $1000." He takes out his checkbook, writes a check for $1,110, puts it into the coffin, and grabs the ten and the hundred.

The point is, if you are going to write a check that probably won't be cashed, no need to add money to your bank account. If you are a storage administrator provisioning LUNs that probably won't be fully used, no need to add disks to your storage system.

If you think of the total capacity in your storage system as your bank account, and each thinly provisioned LUN as a check written against that account, then you should have some immediate intuition about how to manage thin provisioning. Thin provisioning works best if you give lots of small LUNs to lots of different people. That way you've got statistics on your side. You need enough spare capacity to handle normal utilization, and there's always a chance that a handful of users could suddenly use their whole LUN, so you'd better keep enough excess capacity to handle that. Keep an eye on your spare capacity to make sure it doesn't get too low. Having a good sense of usage patterns is important. For a given class of user, how much additional storage are they likely to use over a week or a month? Using thin provisioning for brand new applications could be dangerous, but thin provisioning is perfect if experience shows that a particular group of users or a particular application always requests more storage than it really uses.
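The bank-account picture can be sketched in a few lines of Python. This is a toy model, not how any real storage system implements thin provisioning: the `ThinPool` class, its method names, and the 20% reserve threshold are all assumptions invented for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ThinPool:
    """Toy model: physical disk is the bank balance, each LUN is a check."""
    physical_tb: float
    reserve_fraction: float = 0.2  # assumed headroom target, not a vendor default
    luns: dict = field(default_factory=dict)  # name -> [promised_tb, used_tb]

    def provision(self, name, promised_tb):
        # Writing the check costs nothing up front: no physical space is consumed.
        self.luns[name] = [promised_tb, 0.0]

    def write(self, name, tb):
        promised, used = self.luns[name]
        if used + tb > promised:
            raise ValueError(f"{name} is full")
        if self.used_tb() + tb > self.physical_tb:
            raise RuntimeError("pool is out of physical space: the check bounced")
        self.luns[name][1] = used + tb

    def promised_tb(self):
        return sum(promised for promised, _ in self.luns.values())

    def used_tb(self):
        return sum(used for _, used in self.luns.values())

    def spare_tb(self):
        return self.physical_tb - self.used_tb()

    def overcommit_ratio(self):
        return self.promised_tb() / self.physical_tb

    def needs_more_disk(self):
        # Time to buy disks when spare capacity drops below the reserve.
        return self.spare_tb() < self.reserve_fraction * self.physical_tb

# Ten users each get a 2 TB LUN from a pool with only 10 TB of real disk.
pool = ThinPool(physical_tb=10.0)
for i in range(10):
    pool.provision(f"user{i}", 2.0)
    pool.write(f"user{i}", 0.5)  # typical usage: a quarter of what was requested

ratio = pool.overcommit_ratio()
ok_at_normal_usage = not pool.needs_more_disk()

# A handful of users suddenly fill their whole LUN.
for name in ("user0", "user1", "user2"):
    pool.write(name, 1.5)

alarm_after_burst = pool.needs_more_disk()
print(ratio, ok_at_normal_usage, pool.spare_tb(), alarm_after_burst)
```

In this sketch the pool is promised out at twice its physical size and is comfortable at normal usage, but three users cashing their whole check is enough to push spare capacity below the reserve. That is the signal to add disks before anyone's write fails.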

Thin provisioning isn't the right solution for every situation, but where appropriate it can really improve utilization. NetApp isn't the only storage vendor to support thin provisioning. This article talks about thin provisioning from NetApp, 3PAR and DataCore. Here are links to more details for each vendor: DataCore, 3PAR, NetApp.

January 19, 2006

Thoughts on Disk Drives

I think a lot about disk technology, but disk drives are so commoditized that I don't worry much about specific disk drive companies. My only thought on Seagate acquiring Maxtor is that it could reduce competition in the industry and make it harder for us to negotiate prices.

Strategically, what I do care about is that most disk companies have two different business models that they use for different disks:

  1. Commodity: Low-margin, inexpensive drives for PCs and laptops.
  2. Value-Add: Higher-margin, more expensive drives for servers and storage systems.

This two-tier structure has lasted for years. Fifteen or twenty years ago, SCSI was the commodity drive, and DASD and IPI were the value-add drives for mainframes and UNIX servers, respectively. Today ATA and SATA are commodity, and Fibre Channel is value-add.

ATA drives have about twice the capacity at half the price of Fibre Channel drives, but if you look at the raw components, the cost isn't that different. What is different is that disk companies have very different business models for value-add drives versus commodity drives. The Fibre Channel drives have higher margins, and they are more profitable. There's nothing wrong with this. These drives have better performance and they tend to be more reliable. Another way to put it is that server drives have more value, and PC drives are more commoditized. Prices and margins are always lower for commoditized products.

Since value-add drives are so profitable, drive vendors hate it when storage system vendors figure out how to use commodity drives in high-end storage systems. EMC was successful in the early 1990s because they used RAID and mirroring to make commodity SCSI drives good enough for enterprise storage. IBM stuck with expensive DASD drives, and EMC wiped them out in the market.

Several years ago, when NetApp first started designing high-capacity, low-cost storage based on ATA, one of our disk vendors fooled us into converting the project to a custom SCSI drive they would make for us. SCSI is much closer to Fibre Channel, and they told us they could build a commodity-priced SCSI drive with the same price and capacity as an ATA drive. But when it came time to purchase the drives, they told us that they had changed their mind, and instead the drives would be value-add priced.

That detour probably delayed our first ATA-based system by a year. The lesson? A "customized commodity" is a contradiction in terms. If a disk drive company builds a special drive for you, or builds a special drive for a small niche market, then they will sell it at "value-add" prices. In the long run, the only way to get commodity prices is to use the same drives that volume PC vendors use in large quantities.

As a result, I've never been interested in exciting new disk technology for our ATA-based systems. Some people got excited about Serial ATA (or SATA) and thought it would be an advantage to be early to market. SATA does have advantages over ATA, but if we were early in putting SATA in storage systems, I feared the disk vendors would switch to value-add pricing. Instead we designed shelves flexible enough to let us switch from ATA to SATA based on volumes in the PC market.

2006 is going to be the year of "ATA in the Data Center".

The Coolest Storage Vendor

This will come as a surprise to people who know me now, but I was not the coolest kid in my high school. (Not even second coolest.)

So imagine my delight to learn from Tony Asaro's blog that "NetApp is probably the coolest of the leading storage vendors."

January 10, 2006

Fess Up and Clean Up (was "Data Retention Policy")

I have a confession. I once violated SEC insider trading rules. But before I get into that, I'd like to share Michael Berman's response to my posting on Data Retention Policy:
    Hello Dave,

    I'm a security consultant with 16 years of computer forensic experience.

    Email is a no-brainer. Any computer conversation with two or more parties can be reconstructed. I've worked several cases where one or more parties went out of their way to delete and erase data - we still recovered it.

    I've worked a few cases where the subject involved worked in IT or InfoSec and took steps to erase their data. In each case we still recovered enough information to make a conclusive decision (one guy is still in prison). I worked another case where a subject used milspec technology to wipe their drive. In this case my client was motivated to use short wavelength laser interferometry to attempt to recover data. We did and it worked.

    Based on the hundreds of cases I have been involved with I would state the following:

    It is very difficult for a single individual working alone to hide all evidence of their activities. If the activity involves two or more people it is almost impossible to eliminate all the evidence. In all of my work having ready access to all the data has always worked to the benefit of my client. 90% of the time it has exonerated them and 10% of the time it has allowed them to quickly and quietly settle the issue.

    Michael Berman

What struck me most was Michael's comment about "quick and quiet settlement", because it matched a lesson from my personal experience. Several years ago I violated insider trading rules. I had a money manager who somehow forgot that I was a NetApp insider. (Doh!) I found out during an annual review that he had purchased NetApp stock on my behalf during a quiet period and it had made a profit. I hadn't personally done anything wrong, but I still had a legal problem.

Let's contrast my experience with Martha Stewart's. As soon as I discovered the problem, I went to the SEC and said, "Here's what happened - how do we clean it up?" I had to sell the stock and pay back the profit, but notice I'm not in jail. This kind of mistake is apparently not all that unusual, and the SEC had a mechanism to handle it. In other words, we quickly settled the issue. Notice what a different path Martha Stewart went down! I don't know whether her mistake was as innocent as mine, but everything I've heard leads me to believe that she would have been much better off settling as soon as she realized there was a problem. As they say, "It's not the crime that gets you. It's the cover-up."

Here's the point from a corporate perspective. I believe that in most cases when something illegal has happened at a company, the situation is very much like my personal one. Somebody made a mistake - perhaps innocent, perhaps not - but either way, a law has been broken. The individual who made the mistake may end up fired or worse, but from the company's perspective, it's always going to be better in the long run to fess up and clean up.

In short, Michael argues that you are better off with a strong data retention policy even in situations where it does prove that you did something wrong. And of course, if the data proves that nothing illegal happened, then so much the better.

January 04, 2006

Redundant Array of Pyramid Hieroglyphics (RAPH)

How do you store data so that it can be accessed a long, long time in the future? Like hundreds or thousands of years in the future?

Right now I am in Egypt, studying how the ancient Egyptians accomplished this.

Some temples in ancient Egypt focused on the dead, but others focused on the living. Those temples were partly religious, but they also functioned as centers of learning and healing, a sort of combination church/university/hospital.

The temple of Kom Ombo focused on the living, and it used an interesting data protection technique. The temple came with an "Operator's Manual". When operating a temple, every day of the year requires a different procedure — different prayers, different offerings, different sacrifices to be performed by the priests. To ensure data protection and procedural compliance, the builders carved the operator's manual — a large table with 365 different procedures — into stone walls.

Then, as now, preventing identity theft required extra data protection for sensitive personal information. Pharaohs made colossal statues of themselves, but if it was a good statue, a later Pharaoh would recut the hieroglyph to replace the old name with his own. In response, Ramses II developed write-only hieroglyphs. He cut them inches deep into granite. Expensive, true, but thirty-five hundred years later, Ramses II is one of the best known Pharaohs.

For most of the past two thousand years, hieroglyphs were unreadable. But then the Rosetta Stone was discovered, which carried the same decree in Greek as well as in Egyptian hieroglyphs and Demotic script. To make sure your data can be read in a thousand years, write it in multiple formats.

The data that I accessed at Kom Ombo — with the help of a tour guide — was perhaps twenty-three hundred years old, but the pyramid of the Pharaoh Teti at Saqqara contains hieroglyphic data over four thousand years old.

Teti protected his data under tons and tons of stone (hardened storage?), but he also used redundancy. In his burial chamber, the same message was repeated down long columns. Many copies of the columns were repeated across the wall.

What data would you protect so that it could be read four thousand years in the future?

The RAPH-protected message in Teti's tomb was this. Over and over and over, it said "Teti".
