Newspapers in the UK hate August. Everyone goes on holiday, and there is so little real news that journalists are forced to write acres of copy, made up entirely of stories about how incredibly hot it is. Remember that famous Sun newspaper headline?
"PHEW, WOT A SCORCHER!"
A classic. But not true this year. It's depressingly wet all over the UK. (I admit it. We Brits are obsessed by the weather. It's the first -- and sometimes the only -- topic of conversation.)
So you can imagine my surprise when EMC's Scott Waterhouse at The Backup Blogger took me to task on the reliability of NetApp's VTL because I described RAID-5 as dangerous.
Wot, no weather where you are?
In Denial
Like talking about the weather, it's really not newsworthy stuff. There's no news here, and everyone knows it.
It's just that some sections of the industry appear to be in denial about RAID-5 as it's used in "modern" SAN and NAS systems. Val Bercovici covered it in depth, and backed it up with some good solid research, in a reply to Robin Harris after Robin issued an Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun.
And Val's response was the only one from any vendor.
VTL RAID
Our VTL uses a form of RAID-4, with some aspects of RAID-1 (mirrored data), and the RAID size is 6+1: six data drives and one parity drive.
That's one significant difference. Another major difference between a NetApp VTL and a SAN or NAS system is how degraded mode (a drive failure in a group) is handled. A NetApp VTL isn't a SAN or NAS system, and it doesn't need to act like one.
1. VTL RAID Stops Writing to the Degraded Group
As soon as a NetApp VTL sees a disk failure, writes are redirected away from the affected RAID group to another group. In a SAN or NAS environment, the degraded RAID group has to continue serving data to users, both reads and writes.
These are IOPS and head movements that interfere with the RAID rebuild process and increase its duration significantly. Because we stop all writes to the affected RAID group, the IO load drops, and a much greater percentage of the IOPS (up to 100% of them if there's no restore active on the group) can be devoted to the more important task of the rebuild.
A SAN or NAS system needs to continue to service read and write IO during a rebuild, increasing the time, and increasing the exposure.
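To make the idea concrete, here is a minimal sketch of write redirection. It is an illustration only, not NetApp's code: the `RaidGroup` class, the `degraded` flag and the `pick_write_target` function are all hypothetical names, and "first healthy group" is an assumed stand-in for whatever selection rule the VTL really uses.

```python
from dataclasses import dataclass


@dataclass
class RaidGroup:
    name: str
    degraded: bool = False  # assumed flag: True while a failed drive is rebuilding


def pick_write_target(groups):
    """Return the first healthy group; degraded groups take no new writes."""
    for group in groups:
        if not group.degraded:
            return group
    raise RuntimeError("no healthy RAID group available for writes")


groups = [RaidGroup("rg0"), RaidGroup("rg1"), RaidGroup("rg2")]
groups[0].degraded = True           # a drive in rg0 has failed
target = pick_write_target(groups)
print(target.name)                  # rg1
```

With no new writes landing on `rg0`, the only IO it still has to serve is the rebuild itself and any restore that happens to read from it.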
2. VTL RAID Rebuild Proceeds Much Faster
NetApp VTL rebuilds are much faster, 4 to 24 times as fast, because the requirement to continue serving data is avoided. The data is laid out in stripes and only the occupied stripes are rebuilt, further reducing the time by cutting unnecessary head movement and unneeded reads. The window of exposure to a second disk failure is dramatically reduced.
A SAN or NAS system takes far longer to rebuild, increasing the time, and increasing the exposure.
This VeriTest report, from which the graph is taken, makes it clear how much shorter these NetApp VTL rebuild times are.
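The "only the occupied stripes are rebuilt" point can be sketched in a few lines. This is a simplified model under stated assumptions, not the VTL's implementation: it uses plain single-parity XOR (the textbook RAID-4 scheme) on a 6+1 group, and the `rebuild_drive`, `xor_blocks` and `make_stripe` helpers and the `occupied` set are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_drive(stripes, failed, occupied):
    """Rebuild the failed drive's block, for occupied stripes only.

    stripes:  list of stripes, each a list of per-drive blocks (data + parity)
    failed:   index of the failed drive within each stripe
    occupied: set of stripe indexes that actually hold data
    """
    rebuilt = {}
    for index, stripe in enumerate(stripes):
        if index not in occupied:
            continue  # empty stripe: no reads, no head movement, no rebuild
        survivors = [block for drive, block in enumerate(stripe) if drive != failed]
        rebuilt[index] = xor_blocks(survivors)
    return rebuilt


def make_stripe(seed):
    """Build one 6+1 stripe: six 4-byte data blocks plus their XOR parity."""
    data = [bytes([seed + d] * 4) for d in range(6)]
    return data + [xor_blocks(data)]


stripes = [make_stripe(seed) for seed in (10, 20, 30, 40)]
failed = 2
lost = [stripe[failed] for stripe in stripes]   # keep copies for checking
for stripe in stripes:
    stripe[failed] = None                       # the drive is gone

rebuilt = rebuild_drive(stripes, failed, occupied={0, 3})
print(sorted(rebuilt))                          # [0, 3]
print(rebuilt[0] == lost[0])                    # True
```

Only two of the four stripes are read here, so the rebuild touches half as much disk as a full-group rebuild would. On a lightly filled group the saving is proportionally larger.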
VTL Self Tuning and Striping
There's a degree of confusion in the post about the differences between self-tuning and striping. Scott says:
“Self tuning (in short) ensures that data from a given virtual cartridge is likely to end up on a large number of RAID groups... This means that the data from a backup stream--the writes to a virtual cartridge--are "sprayed" across all available LUNs (RAID groups). This is simply another way of describing striping.”
Self tuning isn't striping; it's deciding which RAID group should get the next chunk of data. It is very unlikely that data from any given cartridge will end up on a large number of RAID groups; the VTL tracks the RAID groups that have already been used by a given cartridge and gives those groups preference.
Here are some real-world figures from current customer data. No names (obviously!), and it's a small but varied sample, but it's typical of how a NetApp VTL lays out data.
The first three columns describe the configuration; the remaining columns are grouped under "Maximum RAID Groups Used per Virtual Tape Volume".

| Virtual Tape Volumes | Total RAID Groups | Virtual Tape Drives | 2 | 3 | 4 | >4 |
| --- | --- | --- | --- | --- | --- | --- |
| 304 | 33 | 31 | 10 | 0 | 0 | 0 |
| 382 | 14 | 32 | 58 | 10 | 4 | 0 |
| 1900 | 64 | 27 | 39 | 0 | 0 | 0 |
The data is not "sprayed" across any and all RAID groups. No virtual tape volume in this sample uses more than 4 RAID groups.
"... I would also note that the strategy [self tuning] is really only effective at addressing single stream performance."
Self tuning doesn't affect single stream performance at all. It's effective when there are multiple streams, and improves the total performance because it avoids having all of the active cartridges write to the same RAID group at the same time.
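As a rough illustration of the difference from striping, here is a minimal sketch of a placement rule with the two properties described above: a cartridge prefers RAID groups it has already used, and active cartridges are kept off the same group where possible. It is a toy model, not the VTL's actual algorithm; `choose_group`, its parameters and the tie-breaking rule are all assumptions made for the example.

```python
def choose_group(cartridge_groups, active_writers, all_groups):
    """Pick the RAID group for a cartridge's next chunk of data.

    cartridge_groups: groups this cartridge has already used, oldest first
    active_writers:   {group: number of OTHER cartridges writing to it now}
    all_groups:       every RAID group that can accept writes
    """
    def load(group):
        return active_writers.get(group, 0)

    # Prefer a group the cartridge already uses, if no other stream is on it.
    for group in cartridge_groups:
        if load(group) == 0:
            return group

    # Otherwise take the least-loaded group. Ties go to a group the
    # cartridge already uses, then to the earliest group in the list.
    return min(all_groups, key=lambda g: (load(g), g not in cartridge_groups))


all_groups = ["rg0", "rg1", "rg2"]

# Single stream: the cartridge simply stays on the group it already uses.
print(choose_group(["rg0"], {}, all_groups))                      # rg0

# Another cartridge is writing to rg0, so this one moves to an idle group.
print(choose_group(["rg0"], {"rg0": 1}, all_groups))              # rg1

# A new cartridge avoids the two busy groups.
print(choose_group([], {"rg0": 1, "rg1": 1}, all_groups))         # rg2
```

With one stream the rule never leaves the first group, which is why it has no effect on single-stream performance. It only starts moving data when several cartridges would otherwise collide on one group.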
And Now, the Weather
In the best tradition of August reporting, back to the weather. I'm escaping the damp and cold and going on holiday. The weather we've had here in Scotland over the last month is so wet, I'm developing webbing between my fingers.
I'm off for a week to relax on the Spanish island of Mallorca in the little village of Deia. Yes, that's the very blue pool I'll be dipping my tootsies in while sipping on something cool and refreshing.
I'll get to see the sun! Lovely. Can't wait. And the meterologicos is scorchio.