
March 2006

March 21, 2006

Expect Double Disk Failures With ATA Drives


After my last blog entry, comparing our describe-the-whole-company marketing message with EMC's, I feel like writing about something more technical. Why not double disk failures in RAID arrays?

Normal RAID arrays (RAID 4 or RAID 5) protect against any single disk failure. The natural question is: How likely is a double disk failure?

There are two answers. The first is that it's pretty unlikely to get two complete disk failures at the same time. On the other hand, it is frighteningly likely to have at least one read fail during the reconstruct. If this happens, you lose two blocks of data—the block for which the read failed, and also the corresponding block on the failed drive that you were hoping to reconstruct. If you are lucky, the two blocks could be unimportant data, maybe even unallocated data, but you can't be sure.
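To see why one failed read costs two blocks, here is a minimal sketch of single-parity reconstruction. It assumes plain XOR parity over equal-sized blocks; the function names are made up for illustration and are not taken from any real RAID implementation.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> Optional[bytes]:
    """Rebuild the one missing block (marked None) in a stripe.

    The stripe holds the data blocks plus the parity block. With exactly
    one block missing, XOR of the survivors gives it back. With two or
    more missing, single parity cannot recover anything, so return None.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        return None
    return xor_blocks([block for block in stripe if block is not None])


# Hypothetical four-disk stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe: List[Optional[bytes]] = data + [xor_blocks(data)]

stripe[1] = None                    # one whole drive has failed
rebuilt = reconstruct(stripe)       # survivors XOR back to the lost block

stripe[2] = None                    # a read error on a surviving drive
lost = reconstruct(stripe)          # now two blocks in this stripe are gone
```

After the second failure, both the block on the dead drive and the block that could not be read are unrecoverable for that stripe, which is the two-block loss described above.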

My math says that with a four disk RAID array, based on 400GB ATA drives, you will lose data in about 10% of RAID reconstructs. Yow! That should make it clear why NetApp has double parity RAID (RAID-DP) enabled by default on all of our systems. (Some people use the term RAID-6 for any double parity RAID.)

Let's walk through the math for this four-disk example. Seagate's technical specs for a 400GB SATA drive say that the bit error rate is "1 per 10^14". If one drive fails, you have to read all the data on the other three drives to do the reconstruct. The expected failure rate is the total bits read divided by the number of bits read per error:

400,000,000,000 * 3 * 8 / 10^14 = 9.6%


Build a 16 disk RAID group, and the expected failure rate goes to 48%. (Again, this is not total data loss, but loss of two blocks of data during the RAID reconstruct.)

I believe customers should be asking hard questions of any storage vendor selling ATA drives without some form of double-error protecting RAID. To be fair, some people claim that the bit error rate of ATA is down to 1 per 10^15, in which case you'd expect to lose data in about 1% of RAID reconstructs with our hypothetical four-disk array. On the other hand, many people configure arrays with more than 4 drives per RAID group, which drives the failure rate back up. For a 16-disk array, the expected failure rate would be 4.8%. Even 1% seems bad to me.
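The figures above can be reproduced with a short script. This is a sketch under stated assumptions: the function names are hypothetical, drive capacity is taken as exactly 400 × 10^9 bytes, and bit errors are treated as independent. The simple division gives the expected number of read errors per reconstruct. For small values that is close to the probability of hitting at least one error, but for larger values the two diverge: under the independence assumption, 0.48 expected errors corresponds to roughly a 38% chance of at least one.

```python
import math


def expected_read_errors(disks: int, capacity_bytes: float, bits_per_error: float) -> float:
    """Expected read errors when every surviving disk is read in full."""
    bits_read = (disks - 1) * capacity_bytes * 8
    return bits_read / bits_per_error


def loss_probability(disks: int, capacity_bytes: float, bits_per_error: float) -> float:
    """Chance of at least one read error, assuming independent bit errors
    (Poisson approximation: 1 - exp(-expected errors))."""
    return 1.0 - math.exp(-expected_read_errors(disks, capacity_bytes, bits_per_error))


CAPACITY = 400e9  # 400GB drive, decimal gigabytes (an assumption)

for disks in (4, 16):
    for bits_per_error in (1e14, 1e15):
        expected = expected_read_errors(disks, CAPACITY, bits_per_error)
        chance = loss_probability(disks, CAPACITY, bits_per_error)
        print(f"{disks:2d} disks, 1 error per {bits_per_error:.0e} bits: "
              f"expected errors {expected:.4f}, chance of at least one {chance:.1%}")
```

Either way the conclusion holds: at these drive sizes a reconstruct with single parity has an uncomfortably high chance of running into an unreadable block.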

Bottom line, using ATA drives without double protecting RAID is questionable. As drives grow, I suspect that it will become a requirement even for Fibre Channel drives.

March 16, 2006

Simplifying Data Management

A couple of blog entries back, I talked about how EMC uses "Information Lifecycle Management" as the top-level statement of everything that they do. Although their particular choice of phrases has caused quite a bit of confusion in the industry, I do believe that it's a good thing for companies to have a top-level statement that helps people understand what the company is about.

At NetApp, our big picture statement of what we do is "Simplifying Data Management".

A recent article about complexity in electronic gizmos caught my attention:
"Half of all malfunctioning products returned to stores are in full working order, but customers can't figure out how to operate the devices, a scientist said on Monday."
The article is about consumer products, but I believe it also applies to computing equipment in the business world. The article goes on to say that the average consumer will "struggle for 20 minutes to get a device working before giving up." In the business world, I know that IT people often struggle for much longer than 20 minutes, but in the end sometimes they do give up. I think this is one reason for "shelfware"—software features that people purchase but never install.

Simplifying data management helps IT avoid this struggle. I know there isn't much in large corporate data centers that would qualify as truly simple. These are giant, complex environments. Complexity is especially serious in the storage world, because the amount of data being kept is growing so fast. The more we can "simplify" the better, even if we never actually get all the way to "simple".

One of the things I'm most proud of about NetApp is that customers often tell me how much easier it is to work with NetApp equipment than our competitors'—easier to install, easier to provision a new LUN, easier to replicate data. Even though we are the 4th largest vendor of Networked Storage (SAN, NAS and iSCSI), we are the 2nd largest vendor of replication software that protects data by making remote copies. Given our size in storage systems, we have much more than our fair share of replication. I believe this is because configuring replication for most storage systems is so complex that people use it less than they'd like, and we've done better.

Okay—I'd better stop now, before I turn this into an advertisement. My real point is that I like "Simplifying Data Management" as a quick summary of NetApp. It captures something that we care deeply about and focus much of our effort on.

March 08, 2006

Storage Virtualization - "The Great and Powerful Oz has Spoken"

Virtualization is right up there with ILM as a source of confusion in the storage industry. In my last entry, I tried to explain why ILM is confusing. Here I'll try the same for storage virtualization.

"Virtualization" is an old concept in computer science. You can have virtual memory, virtual LANs (VLAN), virtual private networks (VPN), virtual PCs (VMware and Xen), virtual tape libraries (VTL). When the first graphical user interfaces were invented, each window on the screen was called a virtual terminal.

These are all examples of virtualization, yet they are unrelated in terms of what problem they solve, who they are for, or how they work. And so it is with "storage virtualization." By itself, the term gives you no hint about what it is or what it does, except that it will apply this concept of virtualization—whatever that is—to storage.

So what exactly is virtualization? It's when you convert the physical reality that you are stuck with into some virtual reality that you wish you had. Consider VMware or Xen. The reality is that you only have one PC, but you wish that you had 10, and VMware makes it magically appear that you do. Consider a VPN. You wish that you had a private, secure wire from your PC at home to the network at work. The reality is that you just have the public internet, but VPN software uses encryption and tunneling to give you the illusion of a private, secure wire. Virtualization is always about creating an illusion, faking you out, showing you something that isn't really there. Put bluntly, virtualization is about lying to the user.

My favorite metaphor for virtualization is the scene in The Wizard of Oz where the giant flaming face of the Wizard says, "Pay no attention to that man behind the curtain! The Great and Powerful Oz has spoken." The virtual illusion is that Oz is a giant flaming face, but the physical reality (revealed by Toto when he pulled back the curtain) is that Oz is a frail old man. This metaphor shows that you must have a sufficiently solid "curtain of virtualization", or else physical reality may intrude awkwardly into the illusion you are trying to present.

In storage, RAID is a form of virtualization. It converts unreliable physical disks into virtual disks that never fail. When a disk fails, RAID creates the illusion that it is still there.

LUNs are a form of virtualization. They look exactly like a disk drive, and can be used by any application that talks to disk drives, but they are constructed by chopping real disks into pieces, and then gluing those pieces together. LUNs can be smaller than real disks, bigger than real disks, and—with appropriate striping—faster than real disks.
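The "chopping and gluing" can be pictured as an address translation table. The sketch below is illustrative only: it assumes a LUN built by concatenating extents (no striping), and the `Extent` type, disk names, and block numbers are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    disk: str     # which physical disk this piece lives on
    start: int    # first physical block of the piece on that disk
    length: int   # number of blocks in the piece


def locate(extents: List[Extent], lba: int) -> Tuple[str, int]:
    """Translate a logical block address on the LUN to (disk, physical block)."""
    if lba < 0:
        raise IndexError("negative logical block address")
    remaining = lba
    for extent in extents:
        if remaining < extent.length:
            return extent.disk, extent.start + remaining
        remaining -= extent.length
    raise IndexError("logical block address is beyond the end of the LUN")


# A 200-block LUN glued together from pieces of three disks.
lun = [
    Extent("disk0", start=100, length=50),
    Extent("disk1", start=0, length=50),
    Extent("disk2", start=200, length=100),
]

first = locate(lun, 0)     # lands in the first piece
middle = locate(lun, 75)   # lands in the second piece
last = locate(lun, 199)    # lands in the third piece
```

The application only ever sees block numbers 0 through 199. Which disk a block actually lives on stays behind the curtain.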

Thin provisioning is a form of virtualization. The user thinks he has a 100 gigabyte LUN, but the storage system only consumes 40 gigabytes to create that illusion.
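A toy model makes the illusion concrete. This is a minimal sketch, assuming a LUN that only allocates a block the first time it is written and returns zeros for anything never written. The `ThinLun` class is hypothetical and does not describe how any particular storage system does it.

```python
from typing import Dict


class ThinLun:
    """A LUN that advertises a fixed size but stores only written blocks."""

    def __init__(self, advertised_blocks: int, block_size: int = 4096) -> None:
        self.advertised_blocks = advertised_blocks
        self.block_size = block_size
        self._blocks: Dict[int, bytes] = {}  # only blocks that were written

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.advertised_blocks:
            raise IndexError("logical block address out of range")

    def write(self, lba: int, data: bytes) -> None:
        self._check(lba)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self._blocks[lba] = data  # space is consumed here, not at creation

    def read(self, lba: int) -> bytes:
        self._check(lba)
        # Never-written blocks read back as zeros and cost no storage.
        return self._blocks.get(lba, bytes(self.block_size))

    def consumed_blocks(self) -> int:
        return len(self._blocks)


lun = ThinLun(advertised_blocks=1000, block_size=4)
lun.write(7, b"abcd")
```

The user sees 1000 blocks, but after that single write the LUN has consumed only one.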

These examples virtualize disk drives, but you can also virtualize at the layer above the storage system. Products that virtualize above the storage system include EMC Invista, NetApp V-Series, Hitachi TagmaStore, Rainfinity, Acopia and NeoPath. All are called "storage virtualization", but they operate differently and solve different problems. Some virtualize NAS, some virtualize SAN, some completely hide the storage system behind them, others add a layer of additional capability.

To me, categorizing products according to whether or not they use virtualization is about as useful as categorizing them according to what programming language they are written in. That may be an interesting detail to some technically minded customers, but for most customers the focus ought to be on what problem they are solving.

The great and powerful Oz has spoken.

© NetApp, Inc.  |  "Safe Harbor" Statement