Oracle Optimizes Its Database for NFS
NFS has become critical to data center grid environments. As a result, Oracle has optimized its code specifically for NFS. Instead of relying on the operating system, Oracle’s Direct NFS Client generates NFS requests directly from the database.
Direct NFS was inspired by experience at Oracle’s Austin Data Center. Oracle uses NFS to run its applications on tens of thousands of Linux servers accessing many petabytes of NetApp storage. In 2005 they had 12,000 Linux servers and 3 petabytes of NetApp storage. Today’s numbers aren’t public, but they are much larger.
When an operating system capability becomes sufficiently important, Oracle pulls it into the database. Memory management became critical, so Oracle said, “Just give me the raw pages, and I’ll manage them myself.” Disk caching became critical, and Oracle said, “Just give me the raw disk blocks, and I’ll cache them myself.” Now NFS has become critical, so Oracle says, “Just give me a raw TCP/IP socket, and I’ll generate NFS requests myself.”
Steve Kleiman has argued that as Oracle becomes more sophisticated, the operating system becomes little more than a device driver framework that gives the database raw access to the hardware. That sheds new light on Oracle’s Unbreakable Linux program.
What exactly does Oracle gain from Direct NFS? The primary benefits are simplicity and performance.
It’s simpler because you don’t have to worry about how to configure NFS. What timeouts should you use? What caching options? It doesn’t matter. Oracle looks at how you have NFS configured to figure out where the data lives, but aside from that, your settings don’t matter. Oracle takes control.
It even works with Windows. Just mount the data that Oracle needs using a CIFS share, and Oracle figures out the location of the data and accesses it via NFS. (CIFS is great for home directory sharing, but it isn’t designed for database workloads.)
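To make the "figure out where the data lives" step concrete, here is a minimal sketch in Python. It is not Oracle's code. It assumes a Linux-style mount table (the format of /etc/mtab or /proc/mounts), and the function name `find_nfs_mount` and the sample server and paths are hypothetical. The point is only that the mount table tells you the server and export behind a path; the mount options can be ignored.

```python
def find_nfs_mount(mount_table_text, path):
    """Return (server, export, relative_path) for the NFS mount holding `path`.

    Hypothetical helper: it reads only the device, mount point and type
    fields of each mount-table line and ignores the mount options.
    Returns None if `path` is not on an NFS mount.
    """
    best = None
    for line in mount_table_text.splitlines():
        fields = line.split()
        # Expect: device  mount_point  fs_type  options ...
        if len(fields) < 3 or not fields[2].startswith("nfs"):
            continue
        device, mount_point = fields[0], fields[1]
        if ":" not in device:
            continue
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest (most specific) matching mount point.
            if best is None or len(mount_point) > len(best[2]):
                server, export = device.split(":", 1)
                best = (server, export, mount_point)
    if best is None:
        return None
    server, export, mount_point = best
    relative = path[len(mount_point):].lstrip("/")
    return server, export, relative


# Made-up mount table for illustration.
sample = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "filer1:/vol/oradata /u02/oradata nfs rw,hard,timeo=600 0 0\n"
)
print(find_nfs_mount(sample, "/u02/oradata/system01.dbf"))
```

On a real Linux host you could pass in the contents of /proc/mounts instead of the sample string.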
Performance is better because Oracle bypasses the operating system and generates exactly the requests it needs. Data is cached just once, in user space, which saves memory – no second copy in kernel space. Oracle also improves performance by load balancing across multiple network interfaces, if they are available.
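What does "generating the requests yourself" look like? The following is a rough sketch in Python, not Oracle's implementation. It builds an ONC RPC call message for the NFS version 3 program in user space and frames it for a TCP stream. The constants come from the public RPC and NFSv3 specifications; the function names, the use of AUTH_NONE, and the choice of the NULL procedure (a no-op ping) are my assumptions for illustration.

```python
import itertools
import struct

# Values from the ONC RPC and NFSv3 specifications.
RPC_VERSION = 2
MSG_CALL = 0
NFS_PROGRAM = 100003
NFS_VERSION = 3
AUTH_NONE = 0
PROC_NULL = 0  # NFSv3 procedure 0: does nothing, useful as a ping

# The client picks its own transaction IDs (XIDs) to match replies to calls.
_xid_counter = itertools.count(1)


def build_rpc_call(xid, proc, args=b""):
    """Build an RPC call message: header, empty credential, empty verifier."""
    header = struct.pack(
        ">6I", xid, MSG_CALL, RPC_VERSION, NFS_PROGRAM, NFS_VERSION, proc
    )
    cred = struct.pack(">2I", AUTH_NONE, 0)  # flavor, body length 0
    verf = struct.pack(">2I", AUTH_NONE, 0)
    return header + cred + verf + args


def frame_for_tcp(message):
    """Prefix the record marker used for RPC over TCP.

    The top bit marks the last fragment; the low 31 bits carry the length.
    """
    marker = 0x80000000 | len(message)
    return struct.pack(">I", marker) + message


def next_request(proc=PROC_NULL):
    """Return (xid, bytes ready to write to a connected TCP socket)."""
    xid = next(_xid_counter)
    return xid, frame_for_tcp(build_rpc_call(xid, proc))


xid, frame = next_request()
print(xid, len(frame), frame.hex())
```

A real client would write these bytes to a socket connected to the NFS server, read the reply, and match it to the outstanding call by XID. Real requests also carry credentials and XDR-encoded arguments such as file handles, which this sketch leaves out.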
For more technical details on Direct NFS, check out this article by Kevin Closson. He works for PolyServe, which is a NetApp competitor, but technically speaking, he talks good sense. I also recommend this article by NetApp’s John Elliott, comparing Oracle performance over Fibre Channel, NFS and iSCSI.
NetApp has been closely involved in Direct NFS from the very beginning. Peter Schay came up with the idea while he worked for Oracle’s “Linux Program Office”. He wanted to simplify things for Oracle customers running on Linux, many of whom were hosted on Oracle’s On-Demand environment at the Austin Data Center. He worked closely with NetApp engineers to prototype and test the idea. The Oracle ST team used his functional specification to develop the production version of Direct NFS now shipping in 11g. (Today Peter works for NetApp.)
I love how NFS has evolved over the past couple of decades. Twenty years ago, it provided file sharing to small engineering workgroups; today it provides the data backbone for some of the world’s largest data centers. What is it about NFS that has allowed it to make this transition? What is it about NFS that made Oracle choose to build it directly into their database? That’s the topic for another post!





Dave,
You're a touch of class! I appreciate you jumping across "competitive boundaries" like you did with your reference to one of my blog entries.
As a side note, I'm not PolyServe these days, since HP bought us. And, in fact, I'm not HP much longer either as I'll be taking a role in the Oracle Server Technologies Group after Labor Day weekend.
By the way, say "Hi" to Pete for me. I wondered where he landed...
Posted by: Kevin Closson | August 21, 2007 at 03:00 PM
You've been able to do direct IO on NFS for a while now (even 2.4 kernels), so with an ordinary NFS mount you could do IO directly to userspace and do all the caching there. So the caching thing is irrelevant (perhaps not so for Windows platforms...)
Posted by: Stewart Smith | August 21, 2007 at 06:28 PM
Direct NFS Client doesn't seem to bypass the OS completely, because it still needs to use the OS TCP stack. In any case, it looks like it does bypass the Linux NFS and RPC layers in the client (underneath the VFS), and that's a major gain. For example, see the NFS client perf figures at:
http://gelato.unsw.edu.au/IA64wiki/NFSPerformance
Posted by: Shehjar | August 23, 2007 at 07:34 PM
Shehjar,
DNFS eliminates the overhead of entering the kernel with libC or libaio calls that must vector to RPC via the VFS layering. In short, DNFS is RPC. Oracle generates and tracks their own XIDs and just shoots RPC straight from the server. You can see the overhead reduction if you go to the URL Dave provided to my site and get the paper I wrote with Oracle on the matter.
Posted by: Kevin Closson | August 24, 2007 at 10:11 AM
After reading up on this on technet.oracle.com last month, I was surprised not to see anything from NetApp about this exciting development until now (only an HP press release seemed to mention it).
Oracle's benchmarks seemed a bit misguided (focusing far too heavily on interface load balancing instead of the real meat of the technology); how soon could we hope to see a NetApp TR to dive into this? In particular it'd be interesting to see how this skews the results from TR3496.
Though certainly not NetApp's problem, it was disappointing to see this not done under NFSv4. Particularly in a RAC environment one would think directly exposing the database to v4 delegations could be huge...
Posted by: Kevin | August 30, 2007 at 12:46 PM
Folks,
The last post by "Kevin" (8/30/07 12:46PM) wasn't me. I've gotten a barrage of email from folks asking me about NFSv4. To be perfectly honest, I'm not expecting much gain out of NFSv4 vis a vis Oracle throughput. Folks, Oracle is a seek, read/write workload. It all comes down to payload on the wire...pure grunt work. It is quite simple to get Oracle driving I/O at GbE wire capacity. NFSv4 can't make the wire fatter.
Posted by: Kevin Closson | August 31, 2007 at 10:52 AM
Posted by: Antonio Phelipe | October 15, 2007 at 06:54 AM
I'm using two NICs, one connected to the Ethernet for application requests, and the other connected directly to the SAN switch just for mounting the database file systems.
The performance is great!!!
Posted by: M. Sousa | October 15, 2007 at 08:10 AM