
November 2007

November 23, 2007

A look at the ESX I/O stack

Given the proliferation of ESX Server, I think it's worth looking into the ESX I/O stack and how I/Os get queued and processed until they reach the HBA driver queue. Then we'll look at an example of excessive I/O queuing on a LUN and the effect of increasing the LUN queue depth.

  1. When an application residing inside a Guest OS issues an I/O, that I/O gets queued to the Guest OS's Virtual Adapter driver. 
  2. The Virtual Adapter driver then passes the I/O to the LSILogic/BusLogic emulator.
  3. The LSILogic/BusLogic emulator queues the I/O to the VMkernel's Virtual SCSI layer. Depending on the configuration, the I/O either passes directly to the SCSI layer or passes through the VMFS filesystem before it gets to the SCSI layer.
  4. Regardless of the path followed in #3, ultimately all I/Os will end up at the SCSI layer.
  5. I/Os are then sent to the Host Bus Adapter driver queue. From then on, I/Os hit the Disk array's write cache and finally the back-end disk.

So looking at the above journey, we can quickly observe that:

  1. Performance can be affected at various queuing stages: by queuing at the VM level, at the VMkernel level, and at the HBA driver level.
  2. While a VMFS Datastore can have many queues because many VMs can use the same VMFS, a LUN can only have as many commands outstanding as its per-LUN queue depth setting allows.

So let's see how we can identify potential queuing issues using esxtop.

esxtop1

Figure 1.

  Press "d" for Disk Statistics:

disk stats

Figure 2.

Look at the "LOAD" entry above. This entry specifies the ratio of VMkernel active commands plus VMkernel queued commands to the adapter queue depth. In the above example almost everything appears normal except the QUED column: there are too many queued commands in the VMkernel. So let's dig a little deeper and take a look at a specific VM and the LUN it resides on, because something doesn't look right.

We'll run esxtop and select the GID for the VM Win2k3_B.

esxtop2

Figure 3.

Press "e" to expand and enter the GID.

vmstats

Figure 4.

Press "d" to display disk statistics and expand on "e", "a", "t", "l", in this order.

stats

Figure 5.

From the above it looks like the "LOAD" has hit the roof and the %USD is excessive. %USD signifies the percentage of the LUN queue depth used by active commands. LOAD, in this case, represents the ratio of active plus queued commands in the VMkernel to the LUN queue depth. The current value in the above example is 1.97. This value should be less than 1. Also keep an eye on the READS/s and QUED values. A non-zero QUED value almost always indicates a storage bottleneck.
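The two definitions above can be written out as a short Python sketch. The function names are hypothetical and the formulas are only a reading of the definitions in this post, not esxtop's actual implementation. The sample values (32 active and 31 queued commands against a queue depth of 32) are the figures this post reports for the LUN before the change.

```python
def lun_load(active: int, queued: int, queue_depth: int) -> float:
    """Ratio of VMkernel active plus queued commands to the LUN queue depth."""
    if queue_depth <= 0:
        raise ValueError("queue_depth must be positive")
    return (active + queued) / queue_depth


def pct_used(active: int, queue_depth: int) -> float:
    """Percentage of the LUN queue depth occupied by active commands."""
    if queue_depth <= 0:
        raise ValueError("queue_depth must be positive")
    return 100.0 * active / queue_depth


# Values reported in this post for the LUN at the default queue depth.
load = lun_load(active=32, queued=31, queue_depth=32)
used = pct_used(active=32, queue_depth=32)
print(f"LOAD = {load:.2f}, %USD = {used:.0f}")
```

With these inputs the sketch gives a LOAD of about 1.97, which matches the value quoted above.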

So it looks like the LUN queue depth needs to be increased from the default of 32 outstanding I/Os per LUN to 64. After rebooting the host, I can verify the new queue depth by looking at the output of "cat /proc/scsi/qla2300/0". For a single-ported HBA the port is 0. For a dual-ported HBA, look at 0 and 1.

queue_depth

Figure 6.

After verifying the new LUN queue depth I'm now ready to re-run my previous test.

lunstats2

Figure 7.

Using esxtop and following the same steps as in Figures 4 & 5, I now see that the LOAD has decreased by 50% compared to the results in Figure 5 and is now below 1. You will also notice that the READS/s have increased substantially as well, by roughly 37%.

Additionally, by increasing the LUN queue depth, we were able to:

  • Increase the number of Active commands in the VMkernel for the LUN (63, up from 32)
  • Decrease the number of Queued VMkernel commands for the LUN (0, down from 31). In fact, a non-zero value almost always contributes to excessive latency and poor performance. The idea is to be able to process as many commands as possible without having them wait in the queue to be serviced.
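The before-and-after numbers can be reproduced with a toy model, sketched below in Python. It assumes a fixed 63 outstanding commands (32 + 31 before, 63 + 0 after) and that commands become active until the LUN queue depth is full, with the remainder waiting in the VMkernel queue. The function names are hypothetical, and the model ignores everything else that happens in the real I/O stack.

```python
def split_outstanding(outstanding: int, queue_depth: int) -> tuple[int, int]:
    """Split outstanding commands into (active, queued) for a given LUN queue depth."""
    active = min(outstanding, queue_depth)  # the queue depth caps active commands
    queued = outstanding - active           # the rest wait in the VMkernel
    return active, queued


def lun_load(active: int, queued: int, queue_depth: int) -> float:
    """Ratio of active plus queued commands to the LUN queue depth."""
    return (active + queued) / queue_depth


OUTSTANDING = 63  # assumption: total commands in flight, taken from the counts above

loads = {}
for depth in (32, 64):
    active, queued = split_outstanding(OUTSTANDING, depth)
    loads[depth] = lun_load(active, queued, depth)
    print(f"depth={depth}: active={active}, queued={queued}, LOAD={loads[depth]:.2f}")

pct_change = 100.0 * (loads[64] - loads[32]) / loads[32]
print(f"LOAD change: {pct_change:.0f}%")
```

Under these assumptions, doubling the queue depth halves the LOAD and empties the VMkernel queue, which is consistent with the results described above.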

November 16, 2007

Deja Vu...all over again

Oracle jumped on the bandwagon this week with Oracle VM (Xen-based). Sun just announced xVM and a $2 billion investment, surprising its own users and partners, who thought the money could have been spent enhancing Solaris Zones rather than on another hypervisor. Then we've got VMware, which leads this race by a couple of miles right now, Xen, Virtual Iron (Xen-based), Linux KVM, Microsoft's Virtual Server and its newly announced Hyper-V for Windows 2008 (aka Viridian).

Eight offerings, and I'm sure I missed some, without counting IBM's POWER hypervisor and VIOS for their P series.

With the proliferation of all these hypervisors, and no compatibility or interoperability across them, how is IT supposed to manage them?

This reminds me of the Fibre Channel switch wars of the late 1990s and early 2000s. We had Brocade, McDATA (Brocade), Gadzoox (RIP), Vixel (Emulex), Ancor (Qlogic), Inrange (CNT/McDATA).

It wasn't until customers started screaming about interoperability and the FC-SW2 standard passed that these vendors were able to inter-operate.

A similar situation will eventually develop here. Customers will realize how disjointed this space is becoming and they will come out swinging, but that won't happen until server virtualization deployments become mainstream. So the next 2 years promise to be rather interesting, with folks starting to call for a standards-based hypervisor. In fact, a few industry observers have already started making these calls, but until the customer steps in and threatens to close his/her wallet, no progress will be made. Sad but true.

November 09, 2007

Impact of Server Virtualization on Storage

"Customers are realizing the ability to virtualize on a single machine is "just a tiny piece of what virtualization can do for you," Greene said. To achieve real reductions in capex and op-ex, companies need virtualization infrastructure: all the management bells and whistles controlling a data center full of servers, she said."

The above comment was made by Diane Greene in an article posted at TheStreet.com a few days ago. Diane's right on the money.

A lot of folks don't realize that while server virtualization technologies deliver tangible results in increasing physical server utilization rates, at the same time they significantly impact storage utilization rates.

Let's think about what's happening when we virtualize servers. At a minimum, we move 15-20GB of redundant OS data into shared storage. If you multiply this number by the consolidation factor, it gets expensive real fast in terms of the capacity required to host these duplicate images. That, coupled with the fact that administrators tend to ask for greater up-front allocation of storage than they would typically need, creates some serious storage capacity utilization issues. And if you think virtualizing 500 servers creates storage capacity utilization issues, how about virtualizing 5000 desktops?
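To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It takes the 15-20GB per image quoted above and simply multiplies it by the number of virtualized machines. The function name is hypothetical, and applying the same image size to desktops is an assumption made only for illustration.

```python
def duplicate_os_capacity_gb(vm_count: int, os_image_gb: float) -> float:
    """Shared-storage capacity consumed by one OS image per virtual machine."""
    return vm_count * os_image_gb


OS_IMAGE_GB_LOW, OS_IMAGE_GB_HIGH = 15, 20  # range quoted in the post

for vm_count in (500, 5000):
    low = duplicate_os_capacity_gb(vm_count, OS_IMAGE_GB_LOW)
    high = duplicate_os_capacity_gb(vm_count, OS_IMAGE_GB_HIGH)
    # Decimal units: 1 TB = 1000 GB.
    print(f"{vm_count} images: {low / 1000:.1f} to {high / 1000:.1f} TB of OS data")
```

This counts only the OS images themselves, before any over-allocation by administrators and before any deduplication or cloning savings.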

This is like stealing from Peter to pay Paul, as at least part of the savings realized from the physical server consolidation will now be spent on storage.

So how do you address these issues? You need to consider storage virtualization as part of your design and deployment, and take advantage of data deduplication, thin provisioning, and space-efficient cloning technologies.

In fact, these technologies lower the cost threshold for enterprises to adopt storage virtualization fully, because their benefits don't have to be traded off against their cost.

How about performance?

Server virtualization is notorious for creating high levels of small-block random I/O. This high level of randomness can translate to array cache misses, which means that performance to and from disk becomes important. That also means that arrays that stripe across a large number of spindles are better equipped to handle this potential issue than arrays that have continued to rely on small RAID groups of 5-8 drives.

How about Data protection?

What's the cost of a dual disk failure in a physical server tied to one application versus the cost of the same failure in a virtual infrastructure tied to multiple apps? How many apps do you have to recover now? So the need for RAID-6 solutions is even more pronounced. And not just RAID 6 but high-performing RAID-6.  

So to Diane's point, failure to consider storage virtualization as part of your server virtualization strategy will only shift costs from the server to the storage.

© NetApp, Inc.