December 22, 2008

SuperComputing 2008 Slides from Multivendor presentation posted

NetApp, CITI (University of Michigan), EMC, IBM, LSI, Panasas, StorSpeed, and Sun co-hosted a BOF at SC08 last month.

Our presentation has been posted at http://pnfs.com/docs/sc08_pnfs_bof_slides.pdf .

December 19, 2008

It's Official: NFSv4.1 Approved for Proposed Standard

IESG just sent the announcement. IESG also approved three of the companion documents (the XDR description of NFSv4.1, the blocks-based pNFS layout, and the objects-based pNFS layout).


 

From: The IESG <iesg-secretary@ietf.org>
To: IETF-Announce <ietf-announce@ietf.org>
Message-Id: <20081219154356.CF25D28C101@core3.amsl.com>
Date: Fri, 19 Dec 2008 07:43:56 -0800 (PST)
Cc: nfsv4 chair <nfsv4-chairs@tools.ietf.org>,
     Internet Architecture Board <iab@iab.org>,
     nfsv4 mailing list <nfsv4@ietf.org>,
     RFC Editor <rfc-editor@rfc-editor.org>
Subject: [nfsv4] Protocol Action: 'NFS Version 4 Minor Version 1'
     to Proposed Standard
The IESG has approved the following document:

- 'NFS Version 4 Minor Version 1 '
   <draft-ietf-nfsv4-minorversion1-29.txt> as a Proposed Standard

This document is the product of the Network File System Version 4
Working Group.

The IESG contact persons are Lars Eggert and Magnus Westerlund.

A URL of this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-minorversion1-29.txt

Technical Summary

This Internet-Draft describes NFS version 4 minor version
one, including features retained from the base protocol and
protocol extensions made subsequently. Major extensions
introduced in NFS version 4 minor version one include:
Sessions, Directory Delegations, and parallel NFS (pNFS).

Working Group Summary

This document is the result of long construction, review, and
prototyping. While not all features of NFSv4.1 have been
prototyped or implemented the mainline features have received
reasonable prototyping.

Document Quality

The NFSv4.1 specification was subjected to a series of formal
reviews or walk-throughs that resulted in close review and
resultant issues and resolutions. As a result, the NFSv4.1
documents are complete and of reasonably high quality.
Note to RFC Editor

Personnel

Brian Pawslowski (beepy@netapp.com) is the document shepherd.
Lars Eggert (lars.eggert@nokia.com) reviewed the document for the
IESG.

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www.ietf.org/mailman/listinfo/nfsv4

IESG's Review of NFSv4.1 is Done

The IESG (Internet Engineering Steering Group) has completed its review of the NFSv4.1 specification. The Co-Chair of the NFSv4 Working Group, Spencer Shepler, just sent out this announcement to the Working Group:

The IETF announcement of their approval is pending and then
they will move on to the RFC editor queue for final publication.

We are DONE!

Regarding the last sentence: in my very recent experience, the RFC Editor can make lots of editorial changes even to a small document. Given that this is a 600+ page document, my work is not quite done, but nonetheless this is a great milestone.

Watch this space for the official IETF announcement.

November 14, 2008

Two Conferences Covering pNFS Next Week

SuperComputing 2008 (SC08) in Austin and IETF in Minneapolis are next week. At SC08, several storage vendors supporting pNFS will host a Birds of a Feather (BOF) meeting. The BOF is scheduled for November 19, 2008, 5:30-7:00 pm in Ballroom F of the Austin Convention Center. In addition, NetApp will have a booth at SC08.

Unfortunately, IETF and other obligations prevent me from attending SC08. Joshua Konkle is scheduled to represent NetApp at the pNFS BOF.

At IETF I am scheduled to present an update on Federated FS (standing in for James Lentini), and my proposals for de-dupe awareness and pNFS metadata striping.

Parallel File Systems and HPC in the News

I read an article on Search Storage entitled "Parallel file systems become requirement for HPC environments."

I agree with the author, Deni Connor, that parallel file systems are critical to HPC. It is hard to find a storage vendor selling into the HPC market without a parallel file system story. pNFS will be the mechanism that lets customers normalize access to parallel file systems, so that they are not forced to deploy a clustered file system on every node in their network, whether or not they want that node to serve storage. pNFS will give customers the flexibility to choose among parallel file systems and the whole product around them (including Data Protection).

There are a couple of points I want to clarify.

First, the article notes:

But the NFS protocol has high overhead, which limits its use with I/O-intensive applications.

In 2006, in a Technical Report written by John Elliot, NetApp disclosed raw I/O performance numbers comparing NFS with iSCSI. Figure 3 of the report is reproduced below.

 

[Figure 3 of the technical report: raw I/O performance of NFS compared with iSCSI]

While the numbers would no doubt be faster today for all storage access protocols, the point is that 10% (as compared to software iSCSI) is not high overhead.

Second, the article states:

pNFS, which is set to be approved by the IETF, draws on Panasas' DirectFlow parallel storage protocol.

The pNFS specification, which is part of the NFSv4.1 specification, was developed by employees (especially those with experience in parallel and clustered file systems) of several companies within IETF's NFSv4 Working Group. (Indeed, these companies will be hosting a BOF at SC08 next week.)

October 28, 2008

Proposals for De-Duplication and Metadata Striping posted to IETF

I've written a couple of Internet-Drafts (documents that are not RFCs, but eventually might end up as RFCs) that propose some ideas for de-duplication and metadata striping.

What does NFS have to do with de-duplication? After all, it isn't as if a client is affected when a server's storage array finds that 90% or so of its storage is redundant and acts accordingly. The answer is that if an NFS client is caching files, knowing that two cached files have a block of data in common accomplishes two things:

  • Only one block has to be cached
  • Only one NFS READ request has to be sent

Thus, just as de-duplication provides greater efficiencies that lower the requirements on physical space and energy, allowing NFS clients to be aware of de-duplication provides greater efficiencies in the form of better utilization of memory and network links (potentially reducing capital costs). The classic use case for de-duplication awareness is a hypervisor. A hypervisor that is switching among hundreds of guest operating systems, each cloned from the same template operating system install image, obviously has a very significant de-duplication factor (the percentage of data that is common among all the guests). I am aware that at least one hypervisor de-duplicates its cache by scanning the cache for duplicate blocks and de-allocating redundant data. This does provide better cache utilization, but at the cost of sending redundant READ operations.
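
As a rough illustration of the two savings listed above, here is a minimal Python sketch of a client cache that is aware of de-duplication. It assumes the client has already learned an identifier for each file block that is equal whenever the server knows two blocks to be duplicates; how that information is conveyed is not shown. The names `DedupAwareCache`, `block_ids`, and `fetch` are hypothetical and are not taken from the Internet-Drafts.

```python
from typing import Callable, Dict, Tuple


class DedupAwareCache:
    """Toy client cache keyed by shared-block identity, not by (file, block)."""

    def __init__(self, block_ids: Dict[Tuple[str, int], str],
                 fetch: Callable[[str, int], bytes]) -> None:
        # block_ids maps (file name, block index) to an identifier that is the
        # same for blocks the server has de-duplicated (hypothetical input).
        self.block_ids = block_ids
        # fetch stands in for sending an NFS READ to the server.
        self.fetch = fetch
        self.cache: Dict[str, bytes] = {}
        self.reads_sent = 0

    def read(self, name: str, index: int) -> bytes:
        block_id = self.block_ids[(name, index)]
        if block_id not in self.cache:
            # Only the first reference to a shared block costs a READ.
            self.cache[block_id] = self.fetch(name, index)
            self.reads_sent += 1
        return self.cache[block_id]


# Two guest images cloned from one template: block 0 is shared, block 1 differs.
block_ids = {
    ("guest1.img", 0): "tmpl-0", ("guest1.img", 1): "g1-1",
    ("guest2.img", 0): "tmpl-0", ("guest2.img", 1): "g2-1",
}
server = {
    ("guest1.img", 0): b"kernel", ("guest1.img", 1): b"config-1",
    ("guest2.img", 0): b"kernel", ("guest2.img", 1): b"config-2",
}
cache = DedupAwareCache(block_ids, lambda name, index: server[(name, index)])
for key in block_ids:
    cache.read(*key)
print(cache.reads_sent, len(cache.cache))  # 3 3
```

Four logical blocks are read, but only three READs are sent and three blocks are held in memory. A cache that de-duplicates by scanning after the fact would still hold three blocks, but would have sent all four READs.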

The logic for metadata striping follows from the logic for data striping as introduced by pNFS. The pNFS protocol that is pending at IETF today only specifies data striping. Metadata striping provides two benefits:

  • Greater efficiencies due to spreading metadata like directories across several storage nodes.
  • Reduced latency by telling NFS clients where to send metadata operations like LOOKUP, OPEN, CREATE, and READDIR, versus having a node in a cluster forward the operation to another node. As with pNFS for data striping, moving the switch to the client provides the most benefit.

With both of these proposals, the part I am pleased about is that thus far I see no reason to revise the NFSv4 protocol to support either metadata striping or de-duplication awareness. Instead, both proposals use the pNFS framework. The pNFS protocol has the concept of a "layout" as its foundation. The three types of layouts the NFSv4 working group is standardizing are striping patterns for storage clusters accessed via NFSv4.1, SCSI (iSCSI and FC), and OSD. The NFSv4.1 protocol allows additional layout types to be specified. The metadata striping and de-duplication proposals are expressed as new layout types.

October 23, 2008

Summary of the SNW pNFS Talk Posted

Joshua Konkle has posted a summary of the talk on pNFS that he gave at SNW earlier this month.

October 22, 2008

Tech OnTap Article on pNFS

Joshua Konkle and I co-authored an article on pNFS for NetApp's Tech OnTap journal. It is now posted.

October 04, 2008

One Point on pNFS and Latency from the NFSv4.1 SNIA Developers Presentation

I mentioned over a week ago that Spencer Shepler and I presented NFSv4.1 at the SNIA Developers Conference.

I want to discuss one of the slides I presented which points out a subtle value that pNFS brings to storage.

The pNFS protocol is known for its capability to accelerate I/O operations per second. What it might not be known for is that it reduces the I/O latency caused by storage clusters.

There are several storage cluster products out there, including Data ONTAP GX. Many of these have the capability to stripe data across multiple network storage controllers in the cluster, much in the same way RAID0 stripes data across multiple storage devices connected to a single storage controller. There are arguably just two ends of the spectrum of storage clusters that stripe data: the share nothing architecture, which Data ONTAP GX is near to, and the cache coherent architecture.

[Figure before_pnfs: an NFS client mounting one node of a storage cluster]

 

The picture above illustrates an NFS client (not pNFS) that mounts one node of the cluster. Assuming the data of a striped file or file system is evenly distributed across the cluster, in both architectures there is going to be significant traffic among the storage nodes.

In the share nothing architecture, two-thirds of the traffic sent to a three-node cluster will go over the cluster interconnect.
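
The two-thirds figure follows from the even-distribution assumption: with N nodes, only 1/N of the data lives on the node the client mounted, so (N-1)/N of the requests must be forwarded. A small Python sketch, assuming round-robin striping (the function name and the striping rule are illustrative, not taken from any product):

```python
from fractions import Fraction


def forwarded_fraction(num_nodes: int, num_stripes: int,
                       mounted_node: int = 0) -> Fraction:
    """Fraction of stripe requests that cross the cluster interconnect when
    every request arrives at mounted_node and stripes are placed round-robin."""
    forwarded = sum(1 for stripe in range(num_stripes)
                    if stripe % num_nodes != mounted_node)
    return Fraction(forwarded, num_stripes)


print(forwarded_fraction(3, 300))  # 2/3
```

The fraction grows with the size of the cluster, so under this assumption adding nodes increases the share of traffic that crosses the interconnect.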

In a cache coherent architecture, the proportion of traffic that goes over the cluster interconnect depends on the hit rate of the caches, the size of the caches, and, for writes, whether the caches are write through or write back. Regardless, caches have to be primed to fill reads, and even if the caches are write back, eventually they fill up, and data has to be written out to other nodes. There are also issues with the cache coherency protocol itself. For example, say one node is caching data for satisfying read requests, and the origin node receives a write request. The write request is delayed until the node caching the data acknowledges the notification to invalidate its cache.

It would be better if the NFS client just sent its data to the right node. Enter pNFS.

 

[Figure after_pnfs: a pNFS client sending its I/O directly to the nodes of a storage cluster]

 

 

The above picture shows that the pNFS client can direct its I/O to the optimal node in the storage cluster. No intra-cluster I/O. No additional latency.
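
To make the idea concrete, here is a minimal Python sketch of how a client holding a striping layout could pick the node for each request. It assumes a simple round-robin layout described only by a stripe unit size and an ordered node list; `StripeLayout`, `node_for`, and the node names are hypothetical, and real pNFS layouts carry considerably more information than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeLayout:
    stripe_unit: int   # bytes per stripe unit
    nodes: List[str]   # storage nodes, in striping order

    def node_for(self, offset: int) -> str:
        """Return the node that holds the byte at the given file offset."""
        return self.nodes[(offset // self.stripe_unit) % len(self.nodes)]


layout = StripeLayout(stripe_unit=64 * 1024,
                      nodes=["node-a", "node-b", "node-c"])
targets = [layout.node_for(off) for off in (0, 64 * 1024, 200 * 1024)]
print(targets)  # ['node-a', 'node-b', 'node-a']
```

Because the client computes the target itself, each request goes straight to the node that holds the data, and nothing needs to be forwarded over the cluster interconnect.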

October 01, 2008

I Left Out One Detail from the September 2008 pNFS Bake-A-Thon Report

I mentioned there were six pNFS server implementations tested last month in Austin.

What I did not mention is that NetApp tested its pNFS server for Data ONTAP!

I held that bit of information back because we were waiting to unveil it in an internal NetApp gathering of our field folks, including a live demonstration which went extremely well this morning.

Congratulations to NetApp's Pranoop Erasani, who is leading our Data ONTAP pNFS server project, and to the rest of the Data ONTAP NFS team.

Now the hard work of producing a finished product starts.

© NetApp, Inc.  |  "Safe Harbor" Statement