Coming from me, that is a rhetorical question.
But lately I've been fielding this question from folks far less immersed in NFS. So it seems worthwhile, for me and hopefully for whoever is reading this, to give both the short answer and the long answer.
The short answer is no. In fact, the NFSv4 protocol is more robust than NFSv3. The long answer follows. This will be a three-part discussion: part I is this blog posting, and the other parts will compare NFSv4 with CIFS and explore the implications for applications.
When NFS was invented in 1984, it was termed a "stateless" protocol, because a server or client restart did not require any recovery.
When byte-range locking (to support the UNIX fcntl() API) was added to NFS clients and servers, it came in the form of two "sideband" protocols: the Network Lock Manager (NLM) and Network Status Monitor (NSM) protocols.
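For readers who have not used it, here is roughly what a byte-range lock request looks like from an application, sketched with Python's `fcntl.lockf()`, which wraps the fcntl() locking calls. The helper names `lock_range` and `unlock_range` are hypothetical. On an NFSv3 mount, a request like this is what the client typically turns into NLM traffic; on NFSv4, locking is part of the NFS protocol itself.

```python
import fcntl
import os
import tempfile


def lock_range(path: str, start: int, length: int) -> int:
    """Take an exclusive, non-blocking POSIX byte-range lock and return the fd.

    Hypothetical helper. Closing the fd or exiting the process releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Raises OSError (EAGAIN/EACCES) if another process holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
    except OSError:
        os.close(fd)
        raise
    return fd


def unlock_range(fd: int, start: int, length: int) -> None:
    """Release the byte range and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
    os.close(fd)


# Usage: lock bytes 0..99 of a scratch file, "modify" them, then unlock.
with tempfile.TemporaryDirectory() as scratch:
    demo_fd = lock_range(os.path.join(scratch, "data"), 0, 100)
    unlock_range(demo_fd, 0, 100)
```

POSIX locks are owned by the process, so a second `lock_range` call on the same range from the same process succeeds; a conflict only shows up between different processes or, over NFS, different clients.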
NLM was the "steady state" protocol for granting and releasing locks. NSM was the recovery protocol for dealing with an NLM client or server restart. The issue with a client restart is that all the processes that held locks no longer exist, so the client no longer needs those locks, and the server needs to know that. The NSM protocol requires that a client that restarts notify all NLM servers that it no longer needs the locks it held, and so each NLM server releases that client's locks. The issue with a server restart is that, because the NLM protocol did not mandate that lock state survive a server restart, the server can forget which locks each NLM client held. Instead, the NLM protocol mandates that the server keep track of the list of NLM clients that held locks, and the NSM protocol is used to notify all such clients when the server restarts. The clients must then reclaim all the locks they held before the server restarted. The server holds off granting new, non-reclaim locks until the grace period expires.
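The bookkeeping described above can be sketched as a toy model. This is a minimal sketch, not the NLM or NSM wire protocols: the class `NlmServer`, its methods, and the 45-second grace period are hypothetical, and time is a plain number passed in by the caller.

```python
from __future__ import annotations


class NlmServer:
    """Toy model of NLM lock state with NSM-style recovery (hypothetical names)."""

    def __init__(self, grace_period: float):
        self.grace_period = grace_period
        self.grace_ends = 0.0
        self.locks: dict[str, str] = {}     # path -> holding client; lost on restart
        self.monitored: set[str] = set()    # clients with locks; NSM keeps this on stable storage

    def lock(self, client: str, path: str, now: float, reclaim: bool = False) -> bool:
        if now < self.grace_ends and not reclaim:
            return False                     # new locks wait for the grace period to end
        holder = self.locks.get(path)
        if holder is not None and holder != client:
            return False                     # conflicting lock held by another client
        self.locks[path] = client
        self.monitored.add(client)
        return True

    def client_rebooted(self, client: str) -> None:
        # NSM notification from a restarted client: its old locks are no longer needed.
        self.locks = {p: c for p, c in self.locks.items() if c != client}
        self.monitored.discard(client)

    def restart(self, now: float) -> list[str]:
        # The lock table is forgotten, but the monitored list survives and tells
        # NSM which clients to notify so they can reclaim during the grace period.
        self.locks.clear()
        self.grace_ends = now + self.grace_period
        return sorted(self.monitored)


# Usage
server = NlmServer(grace_period=45.0)
server.lock("A", "/export/f", now=0.0)
server.client_rebooted("A")                  # A restarted and told the server
server.lock("B", "/export/f", now=2.0)
clients_to_notify = server.restart(now=10.0) # ["B"]
server.lock("B", "/export/f", now=12.0, reclaim=True)
```

Note that every recovery step depends on a message actually being delivered: the client's reboot notification in one direction, the server's restart notification in the other.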
Can you spot the holes in these recovery protocols? The first problem: what if the client crashes but never restarts? The client now holds locks on the server that it will never release, and manual intervention on the server is required to clear them. The second problem: what if the NSM notification from the server fails to reach a client, and after the grace period expires another client requests the same lock the first client held? Now two clients think they hold the same lock on the same file, and if locking is being used to synchronize modifications, data corruption can result.
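Both holes can be reproduced with a very small model that also tracks what each client believes it holds. All names here (`Server`, `Client`, `stranded_lock`, `lost_notification`) and the timings are illustrative assumptions, not part of NLM or NSM.

```python
from __future__ import annotations


class Server:
    """Minimal NLM-style lock table with a grace period (hypothetical names)."""

    def __init__(self, grace: float):
        self.grace, self.grace_ends = grace, 0.0
        self.locks: dict[str, str] = {}          # path -> holder; lost on restart

    def lock(self, client: str, path: str, now: float, reclaim: bool = False) -> bool:
        if now < self.grace_ends and not reclaim:
            return False                         # only reclaims during grace
        if self.locks.get(path, client) != client:
            return False                         # held by someone else
        self.locks[path] = client
        return True

    def restart(self, now: float) -> None:
        self.locks.clear()
        self.grace_ends = now + self.grace


class Client:
    def __init__(self, name: str):
        self.name = name
        self.held: set[str] = set()              # locks this client believes it holds

    def acquire(self, server: Server, path: str, now: float) -> bool:
        if server.lock(self.name, path, now):
            self.held.add(path)
            return True
        return False


def stranded_lock() -> bool:
    """Hole 1: a client crashes and never restarts, so no NSM notification arrives."""
    server, a, b = Server(45.0), Client("A"), Client("B")
    a.acquire(server, "/f", now=0.0)
    # A is gone for good; nothing on the server ever releases its lock.
    return b.acquire(server, "/f", now=1_000_000.0)


def lost_notification() -> bool:
    """Hole 2: the server restarts, but its NSM notification never reaches A."""
    server, a, b = Server(45.0), Client("A"), Client("B")
    a.acquire(server, "/f", now=0.0)
    server.restart(now=10.0)                     # grace period ends at 55.0
    # A never hears about the restart, so it never reclaims yet keeps using the lock.
    b.acquire(server, "/f", now=60.0)
    return "/f" in a.held and "/f" in b.held


print(stranded_lock())       # False: B can never get the lock
print(lost_notification())   # True: A and B both believe they hold /f
```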
The addition of locking, even as sideband protocols, made NFS as a whole less robust, albeit only if processes on the client acquired locks.
Enter NFSv4, which fixes these problems by dispensing with the NSM protocol and making locks subject to expiration if the client fails to maintain contact with the server (in other words, locks are "leased" to clients).
The NFSv4 client is required to renew its lease within a fixed, known period, before the lease term expires. If the client fails to renew in time, the locks it holds are subject to revocation. So let's say the client crashes and never restarts. Then the locks it held will automatically expire, allowing other clients to acquire them.
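A lease-based lock table can be sketched in a few lines. This is an illustrative model with hypothetical names and an assumed 90-second lease; it treats a lock request as also renewing the requester's lease, which simplifies how NFSv4 actually renews leases. A lock held by a client whose lease has lapsed is treated as revoked and can be granted to another client.

```python
from __future__ import annotations


class LeasedLockServer:
    """Toy NFSv4-style lock server: each client holds a lease of fixed length.

    Hypothetical names; times are plain floats in seconds supplied by the caller.
    """

    def __init__(self, lease_time: float):
        self.lease_time = lease_time
        self.lease_expiry: dict[str, float] = {}   # client -> time its lease runs out
        self.locks: dict[str, str] = {}            # path -> holding client

    def renew(self, client: str, now: float) -> None:
        self.lease_expiry[client] = now + self.lease_time

    def _expired(self, client: str, now: float) -> bool:
        return now >= self.lease_expiry.get(client, float("-inf"))

    def lock(self, client: str, path: str, now: float) -> bool:
        holder = self.locks.get(path)
        if holder is not None and holder != client and not self._expired(holder, now):
            return False                # held by a client whose lease is still live
        self.renew(client, now)         # simplification: a lock request renews the lease
        self.locks[path] = client       # free, or revoked from a holder whose lease lapsed
        return True


# Usage: A takes a lock and renews once, then crashes and never renews again.
server = LeasedLockServer(lease_time=90.0)
server.lock("A", "/data/f", now=0.0)
server.renew("A", now=60.0)             # A's lease now runs until 150.0
server.lock("B", "/data/f", now=120.0)  # refused: A's lease is still live
server.lock("B", "/data/f", now=151.0)  # granted: A's lease expired, its lock is revoked
```

No notification from the crashed client is needed; the passage of time alone frees the lock, which is exactly the hole NSM could not close.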
When an NFSv4 server restarts, there is, as with NLM, a grace period for reclaiming locks. But there are no NSM-like notifications to the client, nor are any needed. Because the client has to renew its lease, it will eventually find out about the server restart on a renewal attempt. The grace period is at least as long as the lease period, so any client that held locks has sufficient time either to learn of the restart and reclaim its locks, or, if it cannot reach the server, to see its own lease expire and know it has lost its locks. Thus it is not possible for two clients to think they hold the same lock on the same file.
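The argument that a grace period at least as long as the lease rules out two holders can be checked with a small simulation. In this sketch all names are hypothetical, messages are instantaneous, client A renews every 30 seconds, and `STALE` merely stands in for the error a restarted server returns to a renewal (in NFSv4.0, for example, NFS4ERR_STALE_CLIENTID). A client trusts its locks only until its own view of the lease runs out.

```python
from __future__ import annotations

OK, STALE = "OK", "STALE_CLIENTID"   # STALE stands in for "server has restarted"


class Server:
    """Toy NFSv4-style server: volatile lock state plus a reclaim grace period."""

    def __init__(self, lease_time: float, grace: float | None = None):
        self.grace = lease_time if grace is None else grace   # default: grace == lease
        self.grace_ends = 0.0
        self.clients: set[str] = set()       # client IDs known since the last boot
        self.locks: dict[str, str] = {}      # path -> holder

    def restart(self, now: float) -> None:
        self.clients.clear()
        self.locks.clear()
        self.grace_ends = now + self.grace

    def renew(self, client: str) -> str:
        return OK if client in self.clients else STALE

    def lock(self, client: str, path: str, now: float, reclaim: bool = False) -> bool:
        # Reclaims are accepted only during grace; new locks only after it.
        if (now < self.grace_ends) != reclaim:
            return False
        if self.locks.get(path, client) != client:
            return False
        self.clients.add(client)
        self.locks[path] = client
        return True


class Client:
    def __init__(self, name: str, lease_time: float):
        self.name, self.lease_time = name, lease_time
        self.valid_until = 0.0               # the client's own view of its lease
        self.held: set[str] = set()

    def trusts(self, path: str, now: float) -> bool:
        # A lock may be relied on only while the client knows its lease is live.
        return path in self.held and now < self.valid_until

    def renew(self, server: Server, now: float) -> None:
        if server.renew(self.name) == STALE:
            # Learned of the restart: re-register and try to reclaim every lock.
            server.clients.add(self.name)
            self.held = {p for p in self.held if server.lock(self.name, p, now, reclaim=True)}
        self.valid_until = now + self.lease_time


def both_hold(restart_at: float, partition=None, grace=None, lease: float = 90.0) -> bool:
    """Tick once per second; report whether A and B ever both trust a lock on /f.

    partition: optional (start, end) window during which A cannot reach the server.
    """
    server, a, b = Server(lease, grace), Client("A", lease), Client("B", lease)
    server.lock("A", "/f", 0.0)
    a.held.add("/f")
    a.valid_until = lease
    for step in range(1, 601):
        t = float(step)
        if t == restart_at:
            server.restart(t)
        cut_off = partition is not None and partition[0] <= t < partition[1]
        if step % 30 == 0 and not cut_off:
            a.renew(server, t)
        if "/f" not in b.held and server.lock("B", "/f", t):
            b.held.add("/f")
        b.valid_until = t + lease            # simplification: B's lease is always fresh
        if a.trusts("/f", t) and b.trusts("/f", t):
            return True
    return False


print(both_hold(100.0))                                         # False
print(both_hold(100.0, partition=(95.0, 300.0)))                # False
print(both_hold(100.0, partition=(95.0, 300.0), grace=30.0))    # True
```

In the second call, A is cut off from the server and its own lease view lapses at 180 seconds, before the grace period ends at 190, so it has stopped trusting the lock by the time B is granted it. Shrinking the grace period below the lease (the third call) recreates the NLM-style hazard: A still trusts its lock while B holds the same one.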
So when you take into consideration a client that performs both file access and lock access, NFSv4 is more robust.
A follow-up blog posting will discuss robustness issues for applications.