There's been a bit of a discussion going on between John Spiers (HP LeftHand) and myself over a number of issues I raised about a LeftHand SAN's storage efficiency. The comments moved on to talk about HA (high availability) and data reliability, and John raised a number of questions that I thought deserved a longer answer.
John Martin of NetApp (who also blogs on NetApp's Storage Efficiency blog) has kindly provided me with more detail, and rather than post this as a comment, I thought it worthy (again) of a blog in its own right. Thanks to John for the responses.
John Spiers' points are in blue (and I think I've accurately captured them, but they've been lightly [edited] for context). They're also slightly out of order.
In summary, I think there's a need for greater clarity on LeftHand best practices. Everyone, including the user quoted below, appears to be operating in the dark. As I said in a previous post, we've had to do a little "reading between the lines" and work from first principles. If any of this is wrong, please let me know, and I'll correct it.
[Update 09 July: Many LeftHand manuals, including best practices, appeared in late June/July, but weren't there when I first looked in late May/early June. Much reading to do!]
NetApp can't deliver this level of HA with auto failover and failback [compared to a LeftHand SAN]
It depends on what you mean by "auto failback". If you mean failback initiated without permission or authorization from a responsible human being, then you'd be correct. If you mean an automated process with minimal user intervention, then you're wrong. From a NetApp perspective, and that of most storage and systems professionals I've talked to when discussing failback requirements, automated and uncontrolled failback is usually judged to be a bad idea.
NetApp also have an excellent product in addition to MetroCluster, called MultiStore. This provides similar kinds of functionality (automated failover and failback) over standard IP connections.
Can SnapMirror or MetroCluster [...] incrementally rebuild the primary site, while maintaining application state and data integrity – i.e. RPO=0 and 100% uptime? I didn’t think so.
Yes it can. As soon as the administrator believes it to be safe, the "failback" process is initiated, and an automatic incremental rebuild and resync provides seamless failback.
[quoting from a NetApp document] "Mirrored active/active configurations do not provide the capability to fail over to the partner node if one node is completely lost. For example, if power is lost to one entire node, including its storage, you cannot fail over to the other node. For this capability, use a MetroCluster"
[quotes sections about manual cluster failover and prevention of “split brain” and JS then says] LeftHand has distributed quorum management that eliminates all possibilities of a “split brain”. This allows at least one site to operate and then automatically resync the other sites when they come back online.
This is an interesting quote, which when taken out of its correct context (a practice commonly called "quote mining") makes it sound much worse than when it's put in context.
- It should be noted that this is for a VERY unusual configuration, one that I’ve never seen go into production at any site. The comment is under the heading “Mirrored Active/Active Configurations”. With this, NetApp's SyncMirror is added to a standard active/active configuration to provide a second level of mirroring across the disks, but without a MetroCluster license for local (or stretch) functionality (suitable for distances of around 500m).
- Should a customer decide that his data requires the extra protection of SyncMirror (i.e. RAID 6+1), then we would recommend the extra resilience provided for in this configuration by using MetroCluster.
The reason for this behavior is to avoid a “split brain” scenario. There is no way of automatically telling the difference between the total failure of a system and the failure of all forms of communication between systems. It's this that causes "split brain": two or more running systems each thinking the other has failed, when only the communication between them has failed.
This applies to every conceivable cluster configuration. That includes quorum disks, proxy nodes and other cluster node failure detection techniques; they are all effectively forms of communication between nodes.
Distributed quorum management requires at least three nodes. It is not possible to have a quorum based on two nodes.
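To make the two-node point concrete, here is a minimal sketch of a strict-majority vote. It is a generic illustration, not SAN/iQ's or Data ONTAP's actual quorum logic; the function names and the one-vote-per-node rule are my assumptions.

```python
def has_quorum(reachable_votes, total_votes):
    """A partition may keep serving only if it holds a strict majority of votes."""
    return 2 * reachable_votes > total_votes


def partition_outcomes(total_votes, side_a_votes):
    """Split the cluster into two isolated sides and report which side keeps quorum."""
    side_b_votes = total_votes - side_a_votes
    return (has_quorum(side_a_votes, total_votes),
            has_quorum(side_b_votes, total_votes))


# Two nodes, link cut: neither side has a majority, so neither may continue.
two_node = partition_outcomes(2, 1)

# Three voters (e.g. two arrays plus a third vote), split 2 vs 1:
# exactly one side keeps quorum.
three_node = partition_outcomes(3, 2)

print(two_node, three_node)
```

With two voters, a communication failure leaves both sides without a majority, which is why a third vote (the virtual machine the user below complains about) is needed before any automatic decision can be made safely.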
I’m not letting you brush this one under the rug. MetroCluster and SyncMirror don’t provide the same level of availability that is inherent in LeftHand’s base SAN/iQ software offering, and LeftHand requires no additional equipment.
That's interesting, because here's a LeftHand user's experience:
"So at work we decided to go with a Lefthand Implementation of iSCSI, I am rather unhappy to find out that with only 2 units you have to run a virtual machine to complete the Quorum for redundancy. I am not happy to find out that in reality you need 3 appliances to complete a Quorum for management and to ensure that you have redundancy and that everything is available."
You should have been clearer about comparing a three or more node site vs. a two node site. This HA "no additional equipment solution" is growing legs.
And while I’ll give kudos for an n-way implementation, unlike the NetApp documentation you quote, you don’t address how you handle what happens when there is a complete loss of communication only from the primary site to all other sites. None of the other nodes are able to detect whether the site is a smoldering pile of rubble, or if it’s just incommunicado, and if you make the assumption that lack of communication is a trigger for failover, then you have a recipe for split brain.
The manual “declared disaster” approach is the safest way of dealing with this. If a customer believes they have a foolproof way of detecting a true site failure, then it's trivially easy to integrate this by automating MetroCluster failover (or failback, for that matter).
Yes, [HP's LeftHand] Network RAID 2 can sustain any random 5 disk failure. In fact, a 4 array system configured in Network RAID level 2 can sustain up to 2 complete array failures, and up to 6 disk failures in each of the remaining 2 arrays, and all at the same time.
Hmmm... Picture courtesy of HP's LeftHand P4000 brochure 4AA2-5247ENW, April 2009.

Let's start with nodes. I have two copies of my data for nRAID2, and four arrays. If any two nodes fail, then I lose data. Which two arrays can I blow away in this four node system, and not lose data in my logical volume? With this simplistic 4 block LUN, 1 and 3 or 2 and 4. Unless you can arrange your node failures in advance, I'd suggest that is one node protection.
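The node argument can be checked by brute force. The sketch below assumes a layout consistent with the survivable pairs I quote above: four blocks, each with its two copies on adjacent nodes (1-2, 2-3, 3-4, 4-1). The block names and the layout table are my reading of the diagram, not something HP has confirmed.

```python
from itertools import combinations

# Assumed nRAID2 layout: each block has two copies, placed on adjacent nodes.
LAYOUT = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {3, 4},
    "D": {4, 1},
}
NODES = [1, 2, 3, 4]


def survives(failed_nodes):
    """The volume survives if every block still has a copy on a working node."""
    failed = set(failed_nodes)
    return all(copies - failed for copies in LAYOUT.values())


# Any single node failure is survivable.
single_ok = all(survives([n]) for n in NODES)

# Of the six possible two-node failures, list those that do not lose data.
survivable_pairs = [pair for pair in combinations(NODES, 2) if survives(pair)]

print(single_ok)         # True
print(survivable_pairs)  # [(1, 3), (2, 4)]
```

Only two of the six possible double failures are survivable under this layout, so unless the failures are chosen for you, the guaranteed protection is one node.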
And the statement that I can survive any random 5 disk failure is equally implausible given the LeftHand diagram above. Two disk failures in node 1 in the same RAID5 group would kill node 1. A further two disk failures in a RAID5 group kill node 2, and we now have two node failures which can't be survived. The answer for nRAID2 is 3 random disk failures, not 5.
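The arithmetic behind that figure, as a small worst-case sketch. It assumes, as above, that RAID5 tolerates one disk failure per group and that nRAID2 is only guaranteed to tolerate one node failure; the function and parameter names are mine.

```python
def guaranteed_disk_failures(disks_tolerated_per_group, nodes_tolerated):
    """Largest number of randomly placed disk failures that can never lose data.

    Worst case: the failures land in as few RAID groups and nodes as possible.
    """
    disks_to_kill_node = disks_tolerated_per_group + 1  # RAID5: the 2nd disk kills the group
    nodes_to_lose_data = nodes_tolerated + 1            # nRAID2: the 2nd (adjacent) node is fatal
    smallest_fatal_set = disks_to_kill_node * nodes_to_lose_data
    return smallest_fatal_set - 1


# RAID5 inside each node, nRAID2 across nodes: 2 * 2 = 4 failures can be fatal.
nraid2_guarantee = guaranteed_disk_failures(1, 1)
print(nraid2_guarantee)  # 3
```

More than three failures can of course be survived if they happen to fall in the right places, but that is luck, not protection.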
I'm not even going to try and work out what "up to 6 disk failures in the remaining 2 arrays" really means, although I'll bet the words "random" and "quorum" don't feature heavily.
Of course, this all assumes the HP LeftHand diagram above is correct, and not just a graphic designer's interpretation of the facts. The diagram below for the terrible usable capacity of a LeftHand system for 1 node failure protection (split brained or otherwise) is correct. I've had it double checked by HP.

LeftHand nRAID2 means 35% usable capacity