For those who are not familiar with NPIV (N_Port ID Virtualization), let me give you a short summary. Basically, NPIV is the ability for a physical N_Port to carry multiple virtual port IDs using a point-to-point connection.
Something similar called Multi-ID also accomplishes the same thing using a Loop topology. In fact, we used this early on with the FAS270 systems for FC configurations.
NPIV is seen as a "simplification" step in server virtualization deployments, with some of the benefits touted as higher availability, lower total cost of ownership, QoS, etc.
NPIV does have the potential to provide some benefits. In theory it lets you replace HBAs seamlessly, since the physical HBA WWNs should not be tied to zoning and LUN masking (this is not true for ESX with this release). But it also increases the headaches associated with zoning and LUN masking, and it can create management nightmares.
First of all, NPIV with ESX server requires RDM. That means that for each LUN mapped to a VM, you will need to configure separate switch zones as you would typically do with physical servers and zone every physical path to the VM.
Because the physical HBA and the storage ports must be visible to the VMkernel, you will end up with at least two zones, a "working" zone and a "control" zone:
Zone 1 - "working zone" will include:
- The VM's VPORTs
- The storage Target port for the LUN accessed by the VM
Zone 2 - "control zone" will include:
- The physical HBA ports on the server
- The storage Target Ports
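The two-zone layout above can be sketched as a small data structure. This is only an illustration: the function name, the zone naming scheme and every WWPN below are made up, and nothing here corresponds to a real switch CLI or API.

```python
def build_npiv_zones(host, vm, vport_wwpns, hba_wwpns, target_wwpns):
    """Return the working and control zones as {zone_name: sorted member list}."""
    return {
        # Working zone: the VM's VPORTs plus the storage target ports it uses.
        f"{vm}_working": sorted(set(vport_wwpns) | set(target_wwpns)),
        # Control zone: the server's physical HBA ports plus the storage target ports.
        f"{host}_control": sorted(set(hba_wwpns) | set(target_wwpns)),
    }


# Hypothetical WWPNs, for illustration only.
zones = build_npiv_zones(
    host="esx01",
    vm="vm01",
    vport_wwpns=["28:00:00:00:00:00:00:01", "28:00:00:00:00:00:00:02"],
    hba_wwpns=["21:00:00:00:00:00:00:0a", "21:00:00:00:00:00:00:0b"],
    target_wwpns=["50:00:00:00:00:00:00:f1"],
)

for name, members in zones.items():
    print(name, members)
```

Every new VM with its own VPORTs adds another working zone, which is where the zoning churn comes from.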
The above is a mess, and it is not going to fly in dynamic server virtualization environments where VMs are commissioned and de-commissioned regularly. How would you like to call your storage admin several times a day or a week to modify zones? What if it takes several days for a new zoning request to be satisfied, given that in some environments changes can only occur at specific times during a week or month?
One of the touted benefits of NPIV is QoS, since some vendors apply QoS policies at the port level to limit IOPS and bandwidth. We've chosen a different path and apply QoS policies via FlexShare at the volume level itself, so for NetApp arrays it provides no benefit.
Another NPIV benefit mentioned was the ability to track storage traffic per VM. Well, if your tracking tools use the Fabric as the point of discovery then that'll help but given that we're talking about RDM here and the whole LUN is assigned to a specific VM, disk array vendors are able to track I/O, reads/writes, IOPs, latency, KB/s, MB/s etc, on a per LUN basis on the array side. That's representative of the VM's I/O performance.
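To make the point concrete, here is a minimal sketch of rolling per-LUN counters up into per-VM numbers. It assumes each RDM LUN belongs to exactly one VM; the LUN names, VM names and counter values are invented, and the dictionaries stand in for whatever your array's reporting tool actually exports.

```python
# Hypothetical RDM mapping: each LUN is owned by exactly one VM.
rdm_owner = {"lun_10": "vm01", "lun_11": "vm01", "lun_20": "vm02"}

# Made-up per-LUN counters, as an array might report them.
lun_stats = {
    "lun_10": {"iops": 1200, "kb_per_s": 9600},
    "lun_11": {"iops": 300, "kb_per_s": 2400},
    "lun_20": {"iops": 500, "kb_per_s": 4000},
}


def per_vm_stats(rdm_owner, lun_stats):
    """Sum the additive per-LUN counters for each owning VM."""
    totals = {}
    for lun, stats in lun_stats.items():
        vm_total = totals.setdefault(rdm_owner[lun], {"iops": 0, "kb_per_s": 0})
        for key in vm_total:
            vm_total[key] += stats[key]
    return totals


vm_totals = per_vm_stats(rdm_owner, lun_stats)
print(vm_totals)
```

Only additive counters are summed here; latency would need a weighted average rather than a plain sum.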
Yet another NPIV benefit cited was the ability to track VM storage capacity utilization using the WWPNs. Again, if your tracking tools use the fabric to do this then this will help, but you don't need NPIV to accomplish it. In fact, this is already being done by SRM tools today without using NPIV.
Lastly I've also seen a statement that NPIV with VM zoning provides application isolation. Fooey...ESX server already guarantees VM isolation and thus application isolation.
All in all, NPIV using RDM seems to me of little value, and it appears to complicate things much more than it simplifies them. In fact, I don't see simplification anywhere. My view is that it will help the SRM vendors that depend on the fabric for discovery and reporting more than it helps the administrators.


One major benefit? Virtualized MSCS nodes on VI3 without additional dedicated HBAs.
Posted by: Jerome Crea | April 15, 2008 at 06:55 PM
Hi Jerome,
Thanks for the comment. You shouldn't need dedicated HBAs for MSCS and Windows 2003. In fact, just scanning through the VMware MSCS guide, I didn't see anything pointing to such a requirement, although I may have skipped past it. In any case, here's why you shouldn't need dedicated HBAs for Win2k3...
For MSCS, the recommended driver is the Storport driver, which uses a hierarchical reset structure, vs. the SCSIport driver used in Windows 2000. The following comes from Microsoft's whitepaper comparing the Storport and SCSIport drivers:
"When SCSIport detects certain interconnect or device errors or conditions, it will respond by using a SCSI bus reset. On parallel SCSI, there is an actual reset line; however, on serial interconnects and RAID adapters, there is no bus reset, so it must be emulated in the best way possible. Whichever way the bus reset is done, the code path always disrupts I/O to all devices and LUNs connected to the adapter, even if the problem is related to only a single device. Such disruption requires reissuing in-progress commands for all LUNs.
In contrast, Storport has the ability to instruct the HBA to only reset the afflicted LUN; no other device on that bus is impacted. If the LUN reset does not accomplish the recovery action, Storport attempts to reset the target device; and, if that doesn’t work, it emulates a bus reset. (In practice, the bus reset should not be seen except when Storport is used with parallel devices). This advanced reset capacity enables configurations that were not possible (or were unreliable) in the past with SCSIport."
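The escalation order described in that quote can be sketched in a few lines. This is purely an illustration of the sequence (LUN reset, then target reset, then emulated bus reset); the function and callback names are made up and have nothing to do with the actual Storport driver interfaces.

```python
def recover(reset_lun, reset_target, emulate_bus_reset):
    """Try resets in order of increasing scope; return the scope that worked, or None."""
    steps = [
        ("lun", reset_lun),            # affects only the afflicted LUN
        ("target", reset_target),      # affects every LUN behind that target
        ("bus", emulate_bus_reset),    # last resort, disrupts everything on the adapter
    ]
    for scope, action in steps:
        if action():
            return scope
    return None


# Example: the LUN reset fails, the target reset succeeds,
# so the bus reset is never attempted.
attempted = []


def make_step(name, succeeds):
    def step():
        attempted.append(name)
        return succeeds
    return step


result = recover(
    make_step("lun", False),
    make_step("target", True),
    make_step("bus", True),
)
print(result, attempted)
```

SCSIport, by contrast, effectively jumps straight to the widest scope, which is why every LUN on the adapter sees the disruption.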
For Windows systems that use the SCSIport driver and boot from the SAN, Microsoft has specific MSCS requirements to separate the boot LUN from the shared cluster resources because of the resets mentioned above.
Cheers
Posted by: Nick Triantos | April 15, 2008 at 09:35 PM