Is NFS a Form of SAN? (NFS for Enterprise Apps)
I recently wrote about how both Oracle and VMware support NFS for situations that would have traditionally used block-based storage, like SAN or iSCSI.
Here is my favorite reaction to Oracle's Direct NFS Client:
This is so wrong, but I guess they're so far down the path it no longer matters.
-- Wes Felter
I disagree that it’s “wrong”, but it’s important to ask why Oracle over NFS gives some people the heebie-jeebies. Perhaps people worry that the NFS protocol might be “chatty” – somehow less efficient at transporting data across the network. It’s true that NFS has lots of fancy requests that iSCSI and Fibre Channel SAN do not, to create directories, set permissions, move files, and so on, but none of that matters when all you do is read and write blocks of data.
At the protocol level, block-reads over NFS and block-reads over iSCSI are almost identical. The main difference is that NFS asks for a certain number of bytes, starting at a given byte offset, and iSCSI asks for a certain number of blocks, starting at a given block offset. With NFS you must divide by 512 to convert bytes to blocks. So what! The other difference is that iSCSI and FC-SAN use a LUN to identify the container holding the blocks, and NFS uses a file handle. I’m over-simplifying, but you get the point. At the protocol level, for block traffic, there is almost no difference between NFS and iSCSI or SAN.
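To make the comparison concrete, here is a minimal Python sketch of the translation described above. It is an illustration only, not real wire formats: the `NfsRead` and `BlockRead` types, their field names, and the helper functions are hypothetical, and the 512-byte block size is simply the figure used in the text.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # bytes per block; assumed, matching the figure in the text


@dataclass(frozen=True)
class NfsRead:
    """Hypothetical, simplified NFS-style read: file handle plus byte range."""
    file_handle: bytes
    offset_bytes: int
    count_bytes: int


@dataclass(frozen=True)
class BlockRead:
    """Hypothetical, simplified iSCSI/FC-style read: LUN plus block range."""
    lun: int
    lba: int          # starting block offset
    num_blocks: int


def nfs_to_block(req: NfsRead, lun: int) -> BlockRead:
    """Convert a block-aligned byte-range read into a block-range read."""
    if req.offset_bytes % BLOCK_SIZE or req.count_bytes % BLOCK_SIZE:
        raise ValueError("request is not aligned to the block size")
    return BlockRead(
        lun=lun,
        lba=req.offset_bytes // BLOCK_SIZE,
        num_blocks=req.count_bytes // BLOCK_SIZE,
    )


def block_to_nfs(req: BlockRead, file_handle: bytes) -> NfsRead:
    """Convert a block-range read back into a byte-range read."""
    return NfsRead(
        file_handle=file_handle,
        offset_bytes=req.lba * BLOCK_SIZE,
        count_bytes=req.num_blocks * BLOCK_SIZE,
    )


if __name__ == "__main__":
    nfs_req = NfsRead(file_handle=b"datafile-01", offset_bytes=1_048_576, count_bytes=8192)
    print(nfs_to_block(nfs_req, lun=0))  # BlockRead(lun=0, lba=2048, num_blocks=16)
```

The only real work is the division by the block size and the swap of one container identifier (file handle) for another (LUN), which is the point of the paragraph above.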
A LUN and a file are both just containers that hold blocks of data, so get over it. It was this similarity that originally convinced me NetApp could “unify” SAN, NAS and iSCSI into a single storage appliance. (This is the key idea behind unified storage.)
There are significant differences in how the protocols connect to the host operating system. FC-SAN and iSCSI both plug in at the block layer, while NFS plugs in at the filesystem layer. Those have very different paths through the OS. Historically, bugs in the filesystem layer slowed down NFS. For instance, NFS in Solaris used to be single-threaded, so multiprocessor systems only got one CPU worth of NFS performance. Those bugs have been fixed, but in most operating systems, the block layer is still a slightly lighter-weight interface.
This is why it is so significant that both VMware and Oracle have now built NFS directly into their software as if it were a block interface. When Oracle generates NFS packets directly, it completely bypasses the OS filesystem layer, and it reduces the difference between NFS and iSCSI to the minimal protocol differences that I described above. VMware takes block level requests out of virtual machines and converts them into NFS requests, again blurring the distinction between NFS-as-NAS and NFS-as-blocks (or SAN).
Why bother converting blocks to NFS? After all, I just argued that NFS and iSCSI are almost identical for block traffic, so why not just use iSCSI if you want Ethernet, and FC-SAN if you prefer Fibre Channel?
Those extra NFS operations link to filesystem capabilities that are valuable for managing large numbers of containers. If you only had one LUN, or just a handful, it probably wouldn’t make any difference, but if you have hundreds or thousands of LUNs, it’s very convenient to give them meaningful names, group them into directories, and back them up with filesystem tools designed to handle thousands (or even millions) of separate objects.
NFS also lets you virtualize the data path. NFS provides an abstraction of the path from Application Server to Storage that customers really like and leverage. A whole industry of Fabric virtualization products is coming to market for Fibre Channel (think IBM SVC) to solve a problem already solved by NFS and TCP/IP. Path virtualization is why NFS does so well in grids. It’s the key to simple application mobility, and enables all kinds of resilience techniques.
To be clear, I’m not arguing that NFS is always the best solution for traditional block-based applications. We have many happy customers running Fibre Channel SAN and iSCSI, and we are investing aggressively in those protocols. My point is that many people simply reject NFS, mostly based on old data and misunderstandings. NFS definitely deserves consideration. Sometimes – not always – NFS is the best solution for mission critical enterprise applications.




Hello,
Well, another major difference is the existence of a free iSCSI client (for SAN) for Windows. The NFS client is going to be EOL'ed any time soon, AFAIK. My question is: Are there any efforts to build a free NFS client for Windows that will support pNFS? Why doesn't NetApp do it and sell NAS boxes?
Thanks
Posted by: Dennis | October 05, 2007 at 06:43 AM
@Dave: so why not just use iSCSI if you want Ethernet, and FC-SAN if you prefer Fibre Channel?
FCoE? :)
This goes back to your prior post on the Direct NFS client -- let's please see an updated TR3495 and finally answer where the bottlenecks lie in the NFS layer.
As you laid it out, we're requesting a block from a filehandle or a LUN. Conceptually, this _should_ tilt the advantage to a NAS system since instead of dealing with an opaque stream of IO, the storage system can leverage the metadata (files, offsets within them, etc) for additional optimization.
Data ONTAP is cleanest in its NAS roots and FC can't come near NFS in terms of data integrity and reliability. Hopefully we can finally have the performance-minded _ask_ for NFS rather than being convinced that it's an acceptable trade-off.
Posted by: Kevin Graham | October 08, 2007 at 01:23 PM
Dave,
I am a software developer doing some self-study about NAS. My comment might not be accurate.
According to some other articles I read, a typical server can easily drive an FC HBA to near its maximum wire speed, and the FC vendors are going to release 8 Gbps products. When using FCP, the data is DMA-ed by the HBA and not much CPU gets involved. With direct NFS, the pipe is usually 1 Gbps Ethernet and the data has to go through the TCP/IP stack. I have difficulty believing that the performance is almost identical.
I read John Elliott's article comparing performance. I found the FCP data was measured using a Qlogic HBA against a NetApp storage system. Did he do any measurements using another vendor's storage system specifically designed for FC SAN?
Thanks,
Shibin
Posted by: Shibin Zhang | October 16, 2007 at 12:46 AM
A key advantage (perhaps THE big advantage) that NFS holds over iSCSI and FC-SAN for these uses is its standard, in-band means for provisioning storage: file creation, extension, truncation, and deletion. Provisioning and mapping LUNs is a heavier-weight, less standardized, out-of-band process. Oracle and VMware leverage NFS's in-band file provisioning to simplify and accelerate common tasks for DBAs and server administrators.
Posted by: Jeff Kimmel | October 16, 2007 at 08:34 AM
Jeff,
If I am a DBA, my priority list will be:
1. Data integrity (I have my job)
2. Performance (I do a good job)
2.1. I will prepare for the worst cases. I.e. if I want my system to have five-nines availability, I will prepare for the once-in-one-year worst case. If I want six-nines, I will prepare for the once-in-ten-year worst case. I won't use the common case to define the performance requirement.
2.2. The trends in servers are 1) more and more CPUs, cores and hyper-threads in one server and 2) one server running several instances of an OS. When I was at school, my teacher said that IO was always the performance bottleneck. With these server trends, the problem could become more serious. So, I will be conservative when choosing the IO pipe. I don't want this scenario to happen: when running 2 instances of the OS there is no problem, but the performance is throttled by IO when running 3 instances of the OS.
3. Easy to use (I have an easy life).
Again, I have never been a DBA. I am just pretending that I am one. So, I could be wrong.
Thanks,
Shibin
Posted by: Shibin Zhang | October 17, 2007 at 12:25 AM
Shibin --
If your priorities are in that order, you want NFS. Take an NFS storage head and an FC storage head, take them down to the beach for some sun, and then put them back in the racks [not recommending this, but consider a SAN/network outage in place of a day trip]. You'll likely want to restart the database in either case, but I'll wager every time that only one of them will come back cleanly. The resiliency of NFS is an attribute that NetApp doesn't sell hard enough.
The Direct NFS client is promising for performance in the short term and really only needs to bridge the gap until NetApp is ready to start pushing NFSoRDMA and pNFS beyond niche markets. My only concern is that Oracle won't retire it when that time comes.
Posted by: Kevin Graham | October 17, 2007 at 06:36 AM
Kevin,
Most SAN disks are RAID-ed. SAN can also do data protection tricks like journaling, snapshots, remote mirroring and so on at wire speed (why am I saying so many good things about SAN?). I think Oracle itself has those tricks too.
I read a book called "Storage Security - Protecting SANs, NAS and DAS" by John Chirillo and Scott Blaul. On page 62, it says "General NAS weaknesses are
. File based versus block based, which may not be good for database
. Unless a separate IP network is created, NAS over IP can be nondeterministic
. Even if a database can leverage a file-sharing device, it may not be able to handle the nondeterministic nature of IP
. Typically not as robust in features and functionality as a SAN."
Shibin
Posted by: Shibin Zhang | October 18, 2007 at 12:38 AM
Hi Dave, I'm glad you liked my comment and took it in good spirit. Actually, I love NFS and at the office we don't touch FC or iSCSI.
What I disagree with is bypassing the operating system. The VFS abstraction is located in the OS for a good reason, and IMO speaking NFS directly from an application indicates that either the app or the OS is mis-designed. If an OS can't provide high-performance NFS, then the correct solution is to fix the OS, not bypass it. Of course, I realize that Oracle will probably never agree with me.
Posted by: Wes Felter | October 22, 2007 at 08:41 PM
In one of my comments, I mentioned “data integrity” and I would like to explain further. According to the dictionary, “integrity” has two meanings. One is “adherence to a code of especially moral”. For example, if someone always gives people a hand when he sees that they have fallen down, he has integrity. If he steps on them (unlikely in real life), he does not. For another example, if an engineer treats other people with engineerliness (always face-to-face, bowing first before challenging another engineer, etc.), he has integrity. The opposite (face-to-back, etc.) does not. Most people are familiar with this meaning. The other meaning of “integrity” is “soundness, completeness”. This is what “data integrity” means.
I did not mean that data have personalities :)
Shibin
Posted by: Shibin Zhang | October 22, 2007 at 10:48 PM
The biggest of the big customers have tried NFS and found that a properly tuned FC array, from OS to platters, cannot be matched, period.
NetApp never sees these people.
Posted by: 945w | October 24, 2007 at 04:52 PM
Not sure why you'd say NetApp never sees "these" people. We are a Fortune 100 company and use NetApp for both SAN/NAS. Not exclusively - we are big and frequently purchase other companies - but our #1 storage provider is NetApp hands down.
I know this is true for several of our peer companies as well. It certainly may (I'd even go so far as to say it's likely) be faster in extreme circumstances to use FC instead of NFS when you throw the highest end hardware you can find at a problem...but for 99.9% of shops that's not required, and having snapshot technology is a massive boon to the administrators.
We are currently in the process of moving our ESX environment from FC to NFS.
Posted by: JeremyinNC | October 25, 2007 at 11:19 AM
One of the reasons that MSFT is #1 is that the CEO (Gates) was a geek at heart. NetApp is also on the rise for the same reason. This is one of the best technical descriptions of NFS vs FC/iSCSI, and who’d think it would be from an Executive…
Posted by: Dan Pancamo | October 26, 2007 at 10:32 AM
> There are significant differences in how
> the protocols connect to the host
> operating system. FC-SAN and iSCSI both
> plug in at the block layer, while NFS
> plugs in at the filesystem layer. Those
> have very different paths through the OS.
It is fair to note that with things like HighRoad and parallel NFS, the difference in paths through the OS between NFS and FC/iSCSI gets blurry when using a blocks protocol as the data protocol.
Posted by: Mike Eisler | October 27, 2007 at 12:58 PM
> If an OS can't provide high-performance
> NFS, then the correct solution is to fix
> the OS, not bypass it.
In principle I agree. However, look at it from Oracle's perspective: if the OS vendor is ambivalent about the NFS client, this can make an NFS client bug an Oracle bug (and from the customer's perspective, it often becomes a filer bug too).
Posted by: Mike Eisler | October 27, 2007 at 01:09 PM
My shop has used FC for VMware since 2.0.0. With our recent decision to kick EMC to the curb and acquire a pair of 6070s for our storage core, we have been evaluating NFS for VMware.
After exhaustive testing the bottom line is this - we will never go back to fibre channel. Ever.
Flexclones, ASIS, and snapshots are simply transformational over NFS. Once you see it you will never plug in another piece of fibre for VMware again.
I'm more than willing to share my experience - email me if you're interested @ rich.barlow (A.T)vacu.org
Posted by: Rich | November 01, 2007 at 06:36 PM
> If an OS can't provide high-performance
> NFS, then the correct solution is to fix
> the OS, not bypass it.
One of my favorite phrases is "Keep It Simple & Straightforward" (KISS), so I think direct NFS is a good idea.
Rich: Can I share your experience? I left the storage industry more than a year ago but still remember that EMC does support similar data protection tricks. Among those tricks, I think snapshot is the most important to a database. I would like to use snapshot as an example to share some of my thoughts on how to judge whose tricks are better. I guess it takes 4 steps to take a snapshot: 1) pause the database service, 2) flush the database's internal buffer, 3) execute the snap command and 4) resume the database service. The time interval between 1) and 4) is called the database interruptive time. The shorter the interruptive time, the better the snapshot, especially if you want to run peak-time backups. Another parameter is how many snaps you can take. The third one is how the snaps are taken. For example, I can create another temporary buffer within the same server. When the database flushes its internal buffer, I simply copy the content to my temporary buffer and then tell the database service that the snapshot is done. In the background, I then slowly flush the temporary buffer to the disk. With my approach, you might worry about the “soundness”.
Thanks,
Shibin
Posted by: Shibin Zhang | November 03, 2007 at 06:35 PM