
2. Understanding File Handles

The file handle is a central part of all NFS and related operations. A file handle is an opaque string of bits which is used to uniquely identify a file or other filesystem object. In version 2, the file handle is 256 bits long (32 bytes). In version 3, it is variable in length, up to 512 bits (64 bytes) long.

The fact that the file handle is opaque means that the client should not attempt to understand anything about the file from inspecting the contents of the file handle. The only operations that the client should perform on the file handle are to copy it, and to compare it for equality with another file handle.
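The two operations a client is allowed to perform can be sketched in C as follows. This is only an illustration, not code from any NFS client: the type fhandle and the functions fh_copy and fh_equal are hypothetical names, and the 64-byte bound is simply the version 3 maximum mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FH_MAXSIZE 64   /* 512 bits, the version 3 upper bound */

/* Hypothetical client-side container for an opaque handle. */
struct fhandle {
        size_t        len;              /* number of valid bytes */
        unsigned char data[FH_MAXSIZE]; /* opaque; never interpreted */
};

/* Copy a handle byte for byte; refuse lengths that cannot be valid. */
static bool fh_copy(struct fhandle *dst, const struct fhandle *src)
{
        if (src->len > FH_MAXSIZE)
                return false;
        dst->len = src->len;
        memcpy(dst->data, src->data, src->len);
        return true;
}

/* Two handles are equal only if the length and every byte match. */
static bool fh_equal(const struct fhandle *a, const struct fhandle *b)
{
        return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Note that neither function looks inside data; anything beyond this would be making assumptions about the server.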

This leaves the server free to encode information about the file into the file handle in whatever way it sees fit. As Unix allows files to be moved around the filesystem without changing their intrinsic identity, it is important that the NFS server encodes only information about the file itself, and not about its location in the filesystem hierarchy; otherwise confusion can result.

The traditional contents of a file handle are:

The server is free to use this sort of information, or anything else that will serve the same purpose. The important thing for the server is that it must be able to generate a unique file handle for each file (preferably one that does not change across restarts), that it must be able to reliably find a file given the file handle, and that it must be able to determine whether a given file handle is still valid (not old and not fake).

With this context, we can now look at the file handles used by knfsd in Linux.

From linux/include/linux/nfsd/nfsfh.h we find that the file handle is built from a structure containing:


struct nfs_fhbase {
        struct dentry * fb_dentry;      /* dentry cookie */
        __u32           fb_ino;         /* our inode number */
        __u32           fb_dirino;      /* dir inode number */
        __u32           fb_dev;         /* our device */
        __u32           fb_xdev;
        __u32           fb_xino;
        __u32           fb_generation;
};

and some NUL padding. (Note to code readers: each of these is actually referenced with a fh prefix rather than the fb shown here. See nfsfh.h for the reason. fb_dentry doesn't only have a different prefix; it is in fact spelt fh_dcookie.)
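To make the layout concrete, here is a user-space sketch of how such a structure fills a 32-byte version 2 handle. It is not the kernel code: the type fhbase, the function fh_pack, and the choice to model the dentry pointer as a 32-bit cookie are assumptions made for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define NFS_FHSIZE 32   /* version 2 handle size in bytes */

/* User-space stand-in for the structure above. The dentry pointer is
 * modelled as a 32-bit cookie (an assumption of this sketch). */
struct fhbase {
        uint32_t fb_dcookie;
        uint32_t fb_ino;
        uint32_t fb_dirino;
        uint32_t fb_dev;
        uint32_t fb_xdev;
        uint32_t fb_xino;
        uint32_t fb_generation;
};

/* Fill a 32-byte buffer: zero it first so the tail is NUL padding,
 * then copy the seven fields in native byte order. */
static void fh_pack(unsigned char out[NFS_FHSIZE], const struct fhbase *fb)
{
        memset(out, 0, NFS_FHSIZE);
        memcpy(out, fb, sizeof(*fb));
}
```

With seven 32-bit fields the structure occupies 28 bytes, so in this model the last 4 bytes of the handle are the padding.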

The fields have the following usage:

fb_dentry

A previous version stored the address of a kernel structure related to the file in this entry. This is not stable over reboots and turns out not to be needed. The current version always sets this to the value 0xfeebbaca. This value is not checked, only set.

fb_ino

This field stores the inode number of the file, cast to a 32-bit number if needed. This is stored in native byte order (as are all values, since the client never looks at them).

fb_dirino

This field stores the inode number of the directory that contains the file. Given that files can be in multiple directories and can move between directories, this is neither unique nor stable. It is used to help locate the file within the dcache, as will be explained later.

fb_dev

This field contains the device that the file system was mounted from, which within Linux is a reasonably unique identifier of the filesystem. A number of file systems, such as procfs and smbfs, do not have associated devices, so a unique anonymous device is allocated at mount time. For such file systems, this field is not guaranteed to be stable across restarts. However, these file systems are not normally exported (and knfsd actually refuses to provide access to some of them; possibly it should reject all of them).

fb_xdev

This field is the device number of the directory that was exported.

fb_xino

This stores the inode number of the directory that was exported (which may not be the directory that was mounted). This field, together with fb_xdev and the IP address of the client, is used to determine whether the file handle is actually valid for that client.

fb_generation

The generation number is used to differentiate between an inode before and after it has been deleted and reused. The generation number changes in a non-predictable way whenever the inode is reused.
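The role of the generation number can be shown with a small sketch. It assumes a simplified toy_inode type and a hypothetical helper fh_check_generation; the real check lives inside the knfsd lookup code and is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an in-core inode (hypothetical). */
struct toy_inode {
        uint32_t i_ino;
        uint32_t i_generation;  /* changes whenever the inode is reused */
};

/* Return 0 if the handle still refers to this incarnation of the
 * inode, or -ESTALE if the inode is missing or has been reused. */
static int fh_check_generation(const struct toy_inode *inode,
                               uint32_t fb_ino, uint32_t fb_generation)
{
        if (inode == NULL || inode->i_ino != fb_ino)
                return -ESTALE;
        if (inode->i_generation != fb_generation)
                return -ESTALE;
        return 0;
}
```

A handle issued before the file was deleted carries the old generation number, so it is rejected even though the inode number matches.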

The most complicated part of dealing with a file handle, for Linux, is in finding the file given the file handle. This is in part due to the presence of the dcache.

In order to read or write to a file within Linux, a struct file structure is needed. These structures do not refer to the inode directly, but do so through an entry in the dcache, a struct dentry. Also, directory operations like lookup and create require a dentry.

As the dcache always contains a prefix of the filesystem directory structure, finding a dentry requires making sure that the parent directory (or a parent directory, as some files have multiple parents) and all of its ancestors are also in the dcache. It is for this reason that the file handle contains the inode number of a containing directory.

2.1 Interpreting a file handle - OLD

The filehandle code, in linux/fs/nfsd/nfsfh.c, jumps through some hoops to get hold of an appropriate dentry, as will now be described.

When a new handle arrives in a request, a struct svc_fh is created and passed to fh_verify to check that it is valid and to find the dentry. fh_verify calls find_fh_dentry, which does the real work. This goes through several stages to try to find the dentry.

Firstly it looks in a cache of recent file handles using find_dentry_in_fcache. This is currently a no-op as it depended on the fb_dentry field of the file handle, which is now deprecated.

The next stage involves checking to see if the inode is currently in the inode cache, using iget_in_use. If it is, then a valid dentry should also be in the dcache, and can be found because each inode points to a list of dentries that refer to it. The code loops through all of these dentries until it finds one whose parent has an inode number the same as fb_dirino. Why it cares, as long as it has a dentry, I'm not sure.
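The loop over an inode's dentries can be sketched with toy structures. The types toy_inode and toy_dentry and the function find_alias_in_dir are hypothetical simplifications; the kernel's real struct dentry and its alias list look different.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_inode;

/* Simplified dentry: points at its inode, its parent dentry, and the
 * next dentry that refers to the same inode. */
struct toy_dentry {
        struct toy_inode  *d_inode;
        struct toy_dentry *d_parent;
        struct toy_dentry *d_next_alias;
};

struct toy_inode {
        uint32_t           i_ino;
        struct toy_dentry *i_dentry;    /* head of the alias list */
};

/* Return the first dentry of `inode` whose parent directory has inode
 * number `dirino`, or NULL if none of the aliases lives there. */
static struct toy_dentry *find_alias_in_dir(const struct toy_inode *inode,
                                            uint32_t dirino)
{
        struct toy_dentry *d;

        for (d = inode->i_dentry; d != NULL; d = d->d_next_alias) {
                if (d->d_parent != NULL && d->d_parent->d_inode != NULL &&
                    d->d_parent->d_inode->i_ino == dirino)
                        return d;
        }
        return NULL;
}
```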

If no appropriate dentry is found, it proceeds to look in the rename cache. Every time a file is renamed into a different directory, a record is kept in a cache of the file's inode number and the old and new directory inode numbers. Of course, this cache does not survive a restart. If the inode/dir pair is found in the rename cache, then the list of dentries is rechecked, looking for the new directory inode number.
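A rename cache of this kind might look roughly like the following. This is a sketch under my own assumptions (a small fixed-size ring with oldest-first replacement, and the hypothetical names rename_cache_add and rename_cache_lookup), not the structure knfsd actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define RENAME_CACHE_SIZE 8     /* arbitrary size for the sketch */

struct rename_rec {
        bool     used;
        uint32_t ino;           /* inode number of the renamed file */
        uint32_t old_dirino;    /* directory it was moved out of */
        uint32_t new_dirino;    /* directory it was moved into */
};

struct rename_cache {
        struct rename_rec rec[RENAME_CACHE_SIZE];
        unsigned          next; /* slot to overwrite next */
};

/* Record a cross-directory rename, overwriting the oldest slot. */
static void rename_cache_add(struct rename_cache *c, uint32_t ino,
                             uint32_t old_dirino, uint32_t new_dirino)
{
        struct rename_rec *r = &c->rec[c->next];

        r->used = true;
        r->ino = ino;
        r->old_dirino = old_dirino;
        r->new_dirino = new_dirino;
        c->next = (c->next + 1) % RENAME_CACHE_SIZE;
}

/* Look up the inode/dir pair from a file handle; on a hit, store the
 * directory to recheck in *new_dirino and return true. */
static bool rename_cache_lookup(const struct rename_cache *c, uint32_t ino,
                                uint32_t old_dirino, uint32_t *new_dirino)
{
        unsigned i;

        for (i = 0; i < RENAME_CACHE_SIZE; i++) {
                const struct rename_rec *r = &c->rec[i];

                if (r->used && r->ino == ino && r->old_dirino == old_dirino) {
                        *new_dirino = r->new_dirino;
                        return true;
                }
        }
        return false;
}
```

Because the cache lives only in memory, a zero-initialised cache after a restart answers every lookup with a miss, which matches the limitation noted above.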

If it still hasn't found an appropriate dentry, it continues to stage 3, which tries to find the full path name of the file, and then creates the dcache entry from that path name. The path name is found in much the same way that the Unix pwd command finds a path name.

  1. It fakes up a temporary dentry for the directory and reads through looking for an entry for the given file inode. At the same time it records the inode number for the parent of the directory from the ".." entry.
  2. Then it fakes up a dentry for the parent directory, for which it now has an inode number, and repeats the process.

One reason that these faked-up temporary entries cannot be used more generally is that the VFS level will not look up ".." through them properly, as they don't have a valid d_parent entry. (They could be used for files, though.)
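The pwd-style walk in the two steps above can be sketched against an in-memory model of a directory tree. Everything here is hypothetical scaffolding (toy_dir, toy_dirent, build_path); the real code reads directories through the filesystem rather than from arrays, and the root is recognised here by its ".." pointing at itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct toy_dirent { const char *name; uint32_t ino; };

struct toy_dir {
        uint32_t                 ino;
        uint32_t                 dotdot; /* inode number in the ".." entry */
        const struct toy_dirent *ents;
        size_t                   nents;
};

static const struct toy_dir *dir_by_ino(const struct toy_dir *dirs,
                                        size_t ndirs, uint32_t ino)
{
        size_t i;

        for (i = 0; i < ndirs; i++)
                if (dirs[i].ino == ino)
                        return &dirs[i];
        return NULL;
}

/* Read through a directory looking for the entry with inode `ino`. */
static const char *name_in_dir(const struct toy_dir *dir, uint32_t ino)
{
        size_t i;

        for (i = 0; i < dir->nents; i++)
                if (dir->ents[i].ino == ino)
                        return dir->ents[i].name;
        return NULL;
}

/* Build the path of inode `ino`, known to live in directory `dirino`,
 * by repeatedly finding the name and moving up through "..".
 * Returns 0 on success, -1 if an entry is missing or buf is too small. */
static int build_path(const struct toy_dir *dirs, size_t ndirs,
                      uint32_t ino, uint32_t dirino,
                      char *buf, size_t buflen)
{
        char tmp[256];

        if (buflen == 0)
                return -1;
        buf[0] = '\0';
        for (;;) {
                const struct toy_dir *dir = dir_by_ino(dirs, ndirs, dirino);
                const char *name = dir ? name_in_dir(dir, ino) : NULL;
                int n;

                if (name == NULL)
                        return -1;
                /* Prepend "/name" to what has been built so far. */
                n = snprintf(tmp, sizeof(tmp), "/%s%s", name, buf);
                if (n < 0 || (size_t)n >= sizeof(tmp) || (size_t)n >= buflen)
                        return -1;
                strcpy(buf, tmp);
                if (dir->dotdot == dir->ino)
                        return 0;       /* reached the root */
                ino = dir->ino;
                dirino = dir->dotdot;
        }
}
```

If the file has since been renamed out of the directory named in the handle, name_in_dir finds nothing and the walk fails, which is the case the next stage tries to handle.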

If this fails, then the file must have been renamed to a different directory a long time ago.

Stage 4 uses a path name cache to try to find the path name of the parent inode. I'm not really sure of the point of this, and won't go into the detail yet. The name cache is maintained by add_to_path_cache and get_path_entry.

Stage 5 is simply to fail.

2.2 Interpreting a file handle - NEW

Note: This section assumes my patches to 1.4.7

As knfsd looks up a file handle for every request, and will often receive a number of requests on the same file handle (e.g. several read requests on the one file), it is well worthwhile doing some caching to improve lookup speed. Fortunately the dcache and icache provide adequate caching, and knfsd does not need to do any of its own.

When a file handle arrives, it is passed to fh_verify to check that it is valid, and that the user has appropriate access. After some simple consistency checks (fb_xdev must equal fb_dev, and the fb_xdev,fb_xino pair must be exported to this host), find_fh_dentry is called to find an entry in the dcache for the inode referred to in the handle.
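The two consistency checks can be written down as a short sketch. The export table here is a plain array of hypothetical toy_export records keyed by an IPv4 address, and fh_passes_basic_checks is an invented name; knfsd's real export lookup is organised differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One export granted to one client (hypothetical layout). */
struct toy_export {
        uint32_t client_addr;   /* IPv4 address of the client */
        uint32_t ex_dev;        /* device of the exported directory */
        uint32_t ex_ino;        /* inode number of the exported directory */
};

/* Only the handle fields that the checks look at. */
struct toy_fh {
        uint32_t fb_dev;
        uint32_t fb_xdev;
        uint32_t fb_xino;
};

static bool fh_passes_basic_checks(const struct toy_fh *fh,
                                   uint32_t client_addr,
                                   const struct toy_export *tab, size_t n)
{
        size_t i;

        /* The file must be on the same device as the export point. */
        if (fh->fb_xdev != fh->fb_dev)
                return false;

        /* The (fb_xdev, fb_xino) pair must be exported to this host. */
        for (i = 0; i < n; i++) {
                if (tab[i].client_addr == client_addr &&
                    tab[i].ex_dev == fh->fb_xdev &&
                    tab[i].ex_ino == fh->fb_xino)
                        return true;
        }
        return false;
}
```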

Provided that the underlying filesystem supports the read_inode operation, the inode with the given inode number is found using iget_in_use. (If read_inode is not supported, then the filesystem simply cannot be NFS-exported.)

If this inode was already in the icache, then it will have a pointer to a valid dcache entry, and the lookup is complete.

If this inode was not in the cache (either it has been flushed to make room or the server has been restarted since the inode was last used), then a dcache entry needs to be created.

Here we need to worry about the NFSMNT_SUBTREECHECK export option. If this option is set, then we want to make sure that every file accessed is a descendant of the export point. When exporting whole filesystems, this checking is unnecessary and can be avoided by clearing this option.

If this option is not set, and the inode being searched for is not a directory, then the dcache entry that is returned does not need to be located in the dcache tree, as its parent and child pointers will never be checked. So for non-directories with no SUBTREECHECK requirement, we simply create a dcache entry with d_alloc_root and return it.

For other objects we need to find a valid location in the dcache tree. This requires being able to find the (or a) parent for this object, and then a parent for that parent, and so on until an object in the tree is found. For directories we can always find a parent by reading the directory and looking for the '..' entry. For non-directories we need the fb_dirino entry in the file handle.
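The three cases just described reduce to a small decision, sketched here. The enum lookup_plan and the function choose_plan are names invented for this illustration and do not appear in the knfsd source.

```c
#include <stdbool.h>

/* How the dcache entry for an uncached inode is obtained. */
enum lookup_plan {
        PLAN_DETACHED_DENTRY,   /* d_alloc_root, not linked into the tree */
        PLAN_PARENT_FROM_DOTDOT,/* directory: read its '..' entry */
        PLAN_PARENT_FROM_DIRINO /* non-directory: use fb_dirino */
};

static enum lookup_plan choose_plan(bool subtree_check, bool is_directory)
{
        /* Directories must always be placed in the tree, and can always
         * find their parent themselves. */
        if (is_directory)
                return PLAN_PARENT_FROM_DOTDOT;

        /* A non-directory needs a place in the tree only if we must
         * verify that it lies below the export point. */
        if (!subtree_check)
                return PLAN_DETACHED_DENTRY;

        return PLAN_PARENT_FROM_DIRINO;
}
```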

Given this parent inode number, find_fh_dentry walks up the directory tree, building a dcache path as it goes, until it finds an ancestor that is already in the dcache. It then splices the path into that ancestor, and returns the base of the path which is the dcache entry of the file that is wanted.
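The walk up the tree and the final splice can be modelled in a few lines. This is a toy: toy_node stands in for an inode together with its possible dcache entry, connect_to_dcache is a hypothetical name, and the model assumes a proper tree whose root is always in the dcache.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
        struct toy_node *dotdot;    /* parent directory on disk */
        bool             in_dcache; /* already has a connected dentry */
        struct toy_node *d_parent;  /* dcache parent once spliced in */
};

/* Connect `target` to the dcache tree. Returns the number of dcache
 * entries that had to be created, or -1 if no cached ancestor exists. */
static int connect_to_dcache(struct toy_node *target)
{
        struct toy_node *anc = target;
        struct toy_node *n;
        int created = 0;

        /* Walk up until an ancestor already in the dcache is found. */
        while (!anc->in_dcache) {
                if (anc->dotdot == NULL || anc->dotdot == anc)
                        return -1;
                anc = anc->dotdot;
        }

        /* Splice the newly built path onto that ancestor. */
        for (n = target; n != anc; n = n->dotdot) {
                n->d_parent = n->dotdot;
                n->in_dcache = true;
                created++;
        }
        return created;
}
```

A second lookup of the same object creates nothing, since the whole path is now cached; that is the sense in which the dcache does knfsd's caching for it.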

If SUBTREECHECKing is required, and a file is renamed to a different directory, then accessing it with the old file handle will only work as long as the entry for the file is still in the dcache. Once it expires, access from that filehandle will no longer work. It would be possible to encourage entries for renamed files to stay in the dcache longer, but we would need some data on how long entries tend to stay anyway, and how often moved files are accessed by their old filehandle, to see if there was any value in this.

2.3 Using the File Handle

knfsd stores active file handles in a struct svc_fh which looks like:


typedef struct svc_fh {
        struct knfs_fh          fh_handle;      /* FH data */
        struct dentry *         fh_dentry;      /* validated dentry */
        struct svc_export *     fh_export;      /* export pointer */
        size_t                  fh_pre_size;    /* size before operation */
        time_t                  fh_pre_mtime;   /* mtime before oper */
        time_t                  fh_pre_ctime;   /* ctime before oper */
        unsigned long           fh_post_version;/* inode version after oper */
        unsigned char           fh_locked;      /* inode locked by us */
        unsigned char           fh_dverified;   /* dentry has been checked */
} svc_fh;

The first three entries are (hopefully) fairly obvious. The fh_handle is the raw handle that came over the wire and fh_dentry and fh_export are the dcache entry and export entry that have been derived from it.

fh_pre_size, fh_pre_mtime and fh_pre_ctime are intended for encoding Weak Cache Consistency (WCC) data for NFS version 3. They do not seem to be set to anything meaningful at present. fh_post_version is used to determine whether this WCC data needs to be returned or not.

fh_locked is set when the i_sem semaphore on the inode is taken down, to make sure that it gets put back up when the file handle is released. fh_dverified records that the fh_dentry is valid, to make sure that it is released when the filehandle is released.
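The purpose of the two flags is easiest to see in a release routine. The sketch below is a user-space model, not the kernel's code: toy_svc_fh, toy_fh_put, and the use of plain integers for the semaphore count and the dentry reference count are all assumptions made for illustration.

```c
/* Minimal model of the bookkeeping part of struct svc_fh. */
struct toy_svc_fh {
        int          *i_sem;        /* semaphore count: 0 held, 1 free */
        int          *d_count;      /* reference count on the dentry */
        unsigned char fh_locked;    /* inode locked by us */
        unsigned char fh_dverified; /* dentry has been checked */
};

/* Release everything the handle holds. The flags make this safe to
 * call on any handle, however far its processing got. */
static void toy_fh_put(struct toy_svc_fh *fhp)
{
        if (fhp->fh_locked) {
                (*fhp->i_sem)++;        /* put the semaphore back up */
                fhp->fh_locked = 0;
        }
        if (fhp->fh_dverified) {
                (*fhp->d_count)--;      /* drop our dentry reference */
                fhp->fh_dverified = 0;
        }
}
```

Clearing each flag after acting on it means a second release is harmless, so error paths can release unconditionally.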

