Part 1 of my investigation of LeftHand's claim to save money, given its less than stellar 35% usable from raw disk space, generated a number of interesting replies from HP. One is worthy of more analysis, but as it's a bit big for a comment, I've taken the liberty of extracting John Spier's reply to me. John was the CTO of LeftHand Networks prior to its acquisition by HP.
There's little publicly available information on how LeftHand does its stuff, so I've had to do a little "reading between the lines" and work from first principles. If I've got any of this wrong, please let me know, and I'll correct it. I've added a running commentary to John's comment; my apologies for breaking it up, but it's all here (in blue for clarity).
When using Network RAID 2 it protects you from multiple disk faults, complete array faults and site faults with auto failover and failback. NetApp can’t deliver this level of HA with auto failover and failback. Features like MetroCluster give you data protection, but not HA, and at a lower capacity utilization than LeftHand. Can SnapMirror or MetroCluster automatically fail back, incrementally rebuild the primary site, while maintaining application state and data integrity – i.e. RPO=0 and 100% uptime? I didn’t think so.
Then you'd be wrong; that's exactly what MetroCluster is about. Except the auto failback; that's just adding a disaster on top of a disaster. But I digress, and that's the subject for another post.
It's worth pointing out before I analyse this claim that I originally thought that LeftHand had come up with a new paradigm with its network RAID; that it provided both data protection and high availability built on commodity tin. I was wrong; if it was that easy, we'd have done it. But NetApp's 15 years' experience in doing this stuff has taught us otherwise.
First, I need to explain what a NetApp cluster is all about; then we can compare and contrast, and ask some questions.
NetApp Data Protection and HA
NetApp uses NVRAM (non-volatile RAM) and transaction logging to capture writes to disk. All writes are acknowledged before they get to disk, but only after they've been logged. This means that when a single controller (non-HA) fails, any data we told the server or client we had written, but which was still held in NVRAM, is replayed to the disks when the system comes back up. That way, we guarantee what we promised when we said we'd written the data. The data is both consistent and durable.
In a cluster or HA solution, we ensure cache coherency; the contents of the NVRAM on controller 1 are mirrored to controller 2. That way, when controller 1 goes down, controller 2 can replay controller 1's writes, and take over its workload, as the disks are addressable from both controllers. Again, the data is both consistent and durable; and we've made it HA, without downtime to the application. It carries on running.
If controller 2 now fails (or both fail together), we're still consistent and durable; see above for the single controller case.
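To make the log-then-acknowledge idea concrete, here's a toy Python sketch of the behaviour described above. It is not how Data ONTAP is implemented; the `Controller` class, its method names, and the dict standing in for the shared disks are all invented for illustration.

```python
class Controller:
    """Toy model: writes are logged (and mirrored) before being acknowledged."""

    def __init__(self, disk, partner=None):
        self.disk = disk        # dict standing in for disks both controllers can reach
        self.nvram = []         # stand-in for this controller's NVRAM log
        self.partner_log = []   # mirrored copy of the partner's NVRAM log
        self.partner = partner

    def write(self, block, data):
        # Log first, mirror to the partner if there is one, then acknowledge.
        self.nvram.append((block, data))
        if self.partner is not None:
            self.partner.partner_log.append((block, data))
        return "ack"            # the data has NOT reached disk yet

    def flush(self):
        # Normal operation: commit logged writes to disk and drop the logs.
        for block, data in self.nvram:
            self.disk[block] = data
        self.nvram.clear()
        if self.partner is not None:
            self.partner.partner_log.clear()

    def replay_own_log(self):
        # Single-controller case: after a reboot, replay what NVRAM preserved.
        self.flush()

    def take_over(self):
        # HA case: replay the failed partner's writes from the mirrored log.
        for block, data in self.partner_log:
            self.disk[block] = data
        self.partner_log.clear()


# Controller 1 acknowledges a write, then fails before flushing it.
shared_disk = {}
c1 = Controller(shared_disk)
c2 = Controller(shared_disk, partner=c1)
c1.partner = c2
c1.write(7, b"payload")
c2.take_over()                  # controller 2 replays controller 1's log
```

The point of the sketch is the ordering: the acknowledgement only goes out once the write exists in a log that survives the failure, which is what lets either controller honour it later.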
Lastly, SATA drives in particular can suffer from "lost writes". Every drive has a cache where it stores data to be written. This is separate from any other protected cache, for instance NetApp's NVRAM.
As soon as an IO hits this buffer, the drive acknowledges the write. But blocks can subsequently be written in the wrong place, or not written at all, especially if there's a disk failure between acknowledgement and the physical write.
Because NetApp has the ability to control both the RAID and the file system, Data ONTAP 7G provides the unique ability to catch errors such as this and recover. Along with a block checksum, ONTAP also stores WAFL metadata (the inode # of a file containing the block) that provides the ability to verify the validity of a block being read. If the block being read does not match what WAFL expects, the data gets reconstructed, ensuring that your data is both consistent and durable.
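A minimal sketch of that read-time check, in Python. This is an illustration of the general technique (checksum plus an identity tag stored with each block), not ONTAP's actual on-disk format; `StoredBlock`, `make_block`, `verified_read`, and the `reconstruct` callback (standing in for a RAID rebuild) are all hypothetical names.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredBlock:
    data: bytes
    checksum: int   # detects corrupted contents
    inode: int      # identity tag: which file this block was written for


def make_block(data, inode):
    return StoredBlock(data, zlib.crc32(data), inode)


def verified_read(disk, addr, expected_inode, reconstruct):
    """Return the data at addr, rebuilding it if the block fails verification."""
    blk = disk.get(addr)
    valid = (
        blk is not None
        and zlib.crc32(blk.data) == blk.checksum   # contents intact?
        and blk.inode == expected_inode            # is it the block we expect?
    )
    if valid:
        return blk.data
    # Stale or misplaced block: rebuild it (e.g. from parity) and repair the disk.
    data = reconstruct(addr)
    disk[addr] = make_block(data, expected_inode)
    return data


# Address 5 still holds an old block from inode 11; the write for inode 42 was lost.
disk = {5: make_block(b"old contents", 11)}
data = verified_read(disk, 5, expected_inode=42, reconstruct=lambda addr: b"new contents")
```

Note that the stale block passes its own checksum, because its contents are intact. It's the identity check that exposes the lost write, which is why a checksum alone isn't enough.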
NetApp goes to extraordinary lengths to protect your data.
Here's the issue. I can't see this level of protection in a LeftHand SAN.
LeftHand Data Protection and HA
LeftHand systems are built from commodity servers and use a battery backed RAID controller to provide a log of writes to disk. This means that when a single node (non-HA) fails, the data in the cache is replayed to the disks when the system comes back up.
But what about lost writes? Do LeftHand SANs provide protection against failure to write data correctly from the disk's cache?
In a cluster, and using nRAID2, the IO is copied to a second node, where the same scenario as in the single node case is played out. Effectively, cache coherency is provided by mirroring the data to a second (or third, or fourth) node across the network, which is slower and adds to latency.
But what about lost writes? Do LeftHand SANs provide protection against failure to write data correctly from the disk's cache?
With nRAID2, when a node fails, you want to guarantee your writes because the other node(s) are now holding a single copy of your data. That nRAID2 is now no more than RAID5 striped across one or more nodes, writing data through disk cache.
But what about lost writes? Do LeftHand SANs provide protection against failure to write data correctly from the disk's cache?
The choices to protect your data?
- Turn on write-through (turn off the disk cache and force IO straight to disk). On all disks on all nodes.
- Choose nRAID3 so you have a second mirror.
The first option causes huge IO performance problems; drives that are forced to write directly perform very badly indeed.
The second option of nRAID3 is the only alternative.
And every read is still fraught with danger. You may read a block from node A but get completely different data from node B for the same block request -- because there's no guarantee of protection against lost writes on any of the nodes.
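The divergent-read scenario can be shown with a toy model. To be clear, this is not LeftHand's code, and I'm assuming (as argued above) that nothing verifies a block after the drive acknowledges it; the `Node` class and helper functions are invented for illustration.

```python
class Node:
    """Toy storage node whose disk cache may silently drop a write."""

    def __init__(self):
        self.blocks = {}
        self.drop_writes = False   # set True to simulate a lost write

    def write(self, addr, data):
        if not self.drop_writes:
            self.blocks[addr] = data
        return "ack"               # acknowledged whether or not it landed

    def read(self, addr):
        return self.blocks.get(addr)


def mirrored_write(nodes, addr, data):
    # The write succeeds as far as the host is concerned once every node acks.
    return all(node.write(addr, data) == "ack" for node in nodes)


def replicas_agree(nodes, addr):
    return len({node.read(addr) for node in nodes}) == 1


node_a, node_b = Node(), Node()
mirrored_write([node_a, node_b], 9, b"v1")
node_b.drop_writes = True          # node B loses the next write
acked = mirrored_write([node_a, node_b], 9, b"v2")
```

After this runs, the host has been told both writes succeeded, yet node A holds `v2` and node B holds `v1`. With only two copies and no per-block verification, a comparison can reveal that they differ but not which one is right.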
The LeftHand Triplication Calculator
Ok, let’s talk capacity. All NetApp’s customers know NetApp’s storage utilization is below 50% when using best practices.
NetApp's best practices are here. See page 20, section 7.4 Best Practice Configurations. This is the same old HP tap dancing.
But instead of re-hashing what everyone already knows, let’s do a simple calculation for a highly available multi-site SAN using MetroCluster (as a side note, you know Calvin is taking it easy on you with the MetroCluster pricing.) Let’s say a customer has 10TB of NetApp raw storage at the primary site and they replicate that 10TB to a remote site for HA and disaster protection. Storage utilization is now 50% (10TB/20TB.) Take your 63% at both sites, and we won’t bother to include things like the space taken up for NetApp’s root volume and replication log files. 63% of 10TB leaves you 6.3TB of usable capacity replicated. This means you can create 6.3TB of data out of 20TB raw. That’s 31.5%. With LeftHand’s Network RAID level 2 you can split your SAN across 2 sites for a better HA solution and the customer’s utilization, according to you, is better than NetApp’s – 35%.
Except it isn't anywhere near MetroCluster in terms of HA and data protection -- in fact it's nowhere near a single controller NetApp system in terms of data protection.
LeftHand is now down to 24% usable for an inferior solution.
24% usable
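For anyone who wants to check the arithmetic, here's a rough Python sketch. The 63% and 35% figures come from the discussion above; the function name and the idea of backing a per-copy efficiency out of the 35% figure are my own assumptions, so treat the nRAID3 result as a ballpark rather than the calculator's exact output.

```python
def usable_fraction(per_copy_efficiency, copies):
    """Usable capacity as a fraction of total raw capacity across all copies."""
    return per_copy_efficiency / copies


# John's MetroCluster example: 63% usable at each of two sites.
metrocluster = usable_fraction(0.63, 2)           # 6.3TB out of 20TB raw, i.e. 31.5%

# Assumption: the 35% figure is for two network copies (nRAID2),
# which implies roughly 70% efficiency within each copy.
per_copy = 0.35 * 2
nraid2 = usable_fraction(per_copy, 2)             # back to 35%
nraid3 = usable_fraction(per_copy, 3)             # roughly 23% with a third copy

for name, value in [("MetroCluster", metrocluster), ("nRAID2", nraid2), ("nRAID3", nraid3)]:
    print(f"{name}: {value:.1%} usable")
```

Under these assumptions nRAID3 lands at roughly 23%, close to the 24% quoted above; the small gap presumably comes from the per-node RAID overhead used in the original calculation, which isn't stated here.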
The Rest
Now it’s time for some education:
Network RAID is set at the volume level. Not all volumes require the advance data protection level of Network RAID level 2, therefore utilization is typically much better if used at a single site.
At which point, you now have a RAID5 solution with no lost write protection (or you turn on write-through on every disk and suffer huge IO penalties). Really, there's no point. Might as well buy a cheap Linux server and be done with it.
If you really want to get schooled let’s talk about dual-parity based network RAID and what that does for utilization. What we should really be talking about is cost and capacity utilization of NetApp GX vs. LeftHand, because that is the only product that comes close to LeftHand’s architecture. Does GX support block yet?
No, let's focus first on my claim that LHN means more space, more power, more cooling, more cost per usable TB for an inferior solution. Address that, and then you get bragging rights.
