
November 23, 2007

A look at the ESX I/O stack

Given the proliferation of ESX Server, I think it's worth looking into the ESX I/O stack and at how I/Os get queued and processed until they reach the HBA driver queue. Then we'll look at an example of excessive I/O queuing on a LUN and the effect of increasing the LUN queue depth.

  1. When an application residing inside a Guest OS issues an I/O, that I/O gets queued to the Guest OS's Virtual Adapter driver. 
  2. The Virtual Adapter driver then passes the I/O to the LSILogic/BusLogic emulator.
  3. The LSILogic/BusLogic emulator queues the I/O to the VMkernel's Virtual SCSI layer. Depending on the configuration, the I/O either passes directly to the SCSI layer or passes through the VMFS filesystem before it gets to the SCSI layer.
  4. Regardless of the path followed in #3, ultimately all I/Os will end up at the SCSI layer.
  5. I/Os are then sent to the Host Bus Adapter driver queue. From then on, I/Os hit the Disk array's write cache and finally the back-end disk.
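The journey above can be sketched as a small Python model. This is only an illustration: the stage names are labels taken from the list, not VMkernel identifiers, and the `uses_vmfs` flag is a hypothetical stand-in for "depending on the configuration" in step 3.

```python
def io_path(uses_vmfs: bool) -> list[str]:
    """Return the ordered stages a guest I/O crosses, per the list above."""
    stages = [
        "Guest OS Virtual Adapter driver",
        "LSILogic/BusLogic emulator",
        "VMkernel Virtual SCSI layer",
    ]
    if uses_vmfs:
        # Step 3: the I/O goes through VMFS before reaching the SCSI layer.
        stages.append("VMFS")
    # Step 4: either way, every I/O ends up at the SCSI layer.
    stages += [
        "SCSI layer",
        "HBA driver queue",
        "Disk array write cache",
        "Back-end disk",
    ]
    return stages


for stage in io_path(uses_vmfs=True):
    print(stage)
```

Each stage before the array is a place where commands can wait, which is why the rest of this post looks at queuing level by level.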

So looking at the above journey, we can quickly observe that:

  1. Performance can be affected at various queuing stages. For example, it can be affected by queuing at the VM level, at the VMkernel level, and at the HBA driver level.
  2. While a VMFS Datastore can have many queues because many VMs can use the same VMFS, a LUN only has as many queues as the per-LUN queue depth setting.

So let's see how we can identify potential queuing issues using esxtop.

Figure 1.

  Press "d" for Disk Statistics:

Figure 2.

Look at the "LOAD" entry above. This entry is the ratio of VMkernel active commands plus VMkernel queued commands to the adapter queue depth. In the above example almost everything appears normal except the QUED column: there are too many queued commands in the VMkernel. So let's dig a little deeper and take a look at a specific VM and the LUN it resides on, because something doesn't look right.

We'll run esxtop and select the GID for the VM Win2k3_B.

Figure 3.

Press "e" to expand and enter the GID.

Figure 4.

Press "d" to display disk statistics and expand on "e", "a", "t" and "l", in this order.

Figure 5.

From the above it looks like the "LOAD" has hit the roof and the %USD is excessive. %USD is the percentage of the LUN queue depth used by Active commands. LOAD, in this case, is the ratio of Active plus Queued commands in the VMkernel to the LUN queue depth. The current value in the above example is 1.97. This value should be less than 1. Also keep an eye on the READS/s and QUED values. Almost always, a non-zero QUED value indicates a storage bottleneck.
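As a rough check of these two definitions, here is a minimal Python sketch. The function and parameter names are mine, not esxtop's, and the inputs are the "before" figures given later in this post (32 Active commands, 31 Queued commands, LUN queue depth of 32).

```python
def lun_metrics(actv: int, qued: int, queue_depth: int) -> tuple[float, float]:
    """Return (%USD, LOAD) for one LUN, following the definitions in the text."""
    if queue_depth <= 0:
        raise ValueError("queue_depth must be positive")
    # %USD: share of the LUN queue depth used by Active commands.
    pct_used = 100.0 * actv / queue_depth
    # LOAD: Active plus Queued commands relative to the LUN queue depth.
    load = (actv + qued) / queue_depth
    return pct_used, load


pct_used, load = lun_metrics(actv=32, qued=31, queue_depth=32)
print(f"%USD={pct_used:.0f} LOAD={load:.2f}")  # prints: %USD=100 LOAD=1.97
```

With the queue completely full and another 31 commands waiting behind it, the ratio works out to 63/32, which is the 1.97 reported above.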

So it looks like the LUN queue depth needs to be increased from the default of 32 outstanding I/Os per LUN to 64. After rebooting the host, I can verify the new queue depth by looking at the output of "cat /proc/scsi/qla2300/0". For a single-ported HBA the port is 0. For a dual-ported HBA, look at 0 and 1.

Figure 6.

After verifying the new LUN queue depth, I'm now ready to re-run my previous test.

Figure 7.

Using esxtop and following the same steps as in Figures 4 & 5, I now see that the LOAD has decreased by 50% compared to the results in Figure 5 and is now below 1. You will also notice that the READS/s have increased substantially, by roughly 37%.

Additionally, by increasing the LUN queue depth, we were able to:

  • Increase the number of Active commands in the VMkernel for the LUN (to 63, from 32)
  • Decrease the number of Queued VMkernel commands for the LUN (to 0, from 31). In fact, a non-zero value almost always contributes to excessive latency and poor performance. The idea is to be able to process as many commands as possible, without having them in the queue waiting to be serviced.
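The before and after numbers can be reproduced with a simple model. It is a simplification on my part: it assumes a steady 63 outstanding commands for the LUN, and that anything beyond the LUN queue depth waits in the VMkernel. The function name is hypothetical.

```python
def split_commands(outstanding: int, queue_depth: int) -> tuple[int, int]:
    """Split a LUN's outstanding commands into (active, queued)."""
    # At most queue_depth commands can be active at once.
    active = min(outstanding, queue_depth)
    # The remainder, if any, waits in the VMkernel queue.
    queued = max(0, outstanding - queue_depth)
    return active, queued


for depth in (32, 64):
    active, queued = split_commands(63, depth)
    load = (active + queued) / depth
    print(f"depth={depth}: active={active} queued={queued} LOAD={load:.2f}")
# prints:
# depth=32: active=32 queued=31 LOAD=1.97
# depth=64: active=63 queued=0 LOAD=0.98
```

Under that assumption, doubling the queue depth halves the LOAD and empties the queue, which matches what esxtop showed.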


Comments

Thanks for this article. I do have a question about the performance of the storage system if the LUN queue depth were changed on 10 ESX hosts. Would performance be negatively affected on the storage system?

Sorry for the delayed response Don, but I've succumbed to the flu the last 4 days.

Anyway, to address your question, any properly laid out array will not have any issues whatsoever.

What about queuing when using NFS? Does it take the VMFS/RAW layer out and go from the VMkernel to the NFS driver and out?

You're correct Steven. As you look at the drawing, imagine NFS as another "green box" next to VMFS/RAW.

This was very helpful -- thanks much for putting it together.

However, as I'm actually migrating from iSCSI to NFS for VMware ESX usage, if you happened to have the time/desire, a similar breakdown of the storage stack for NFS shares would be very helpful (i.e. storing VMs on a NetApp via NFS instead of iSCSI to avoid file locking/easier maintenance/etc.).

Thanks.

© NetApp, Inc.  |  "Safe Harbor" Statement