The file handle is a central part of all NFS and related operations. A file handle is an opaque string of bits which is used to uniquely identify a file or other filesystem object. In version 2, the file handle is 256 bits long (32 bytes). In version 3, it is variable in length, up to 512 bits (64 bytes) long.
The fact that the file handle is opaque means that the client should not attempt to understand anything about the file from inspecting the contents of the file handle. The only operations that the client should perform on the file handle are to copy it, and to compare it for equality with another file handle.
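As an illustration of how little a client is meant to do with a handle, here is a minimal sketch of the two permitted operations. The type and function names (opaque_fh, opaque_fh_copy, opaque_fh_equal) are made up for this example and do not come from any NFS client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FH_MAX_BYTES 64  /* version 3 upper bound: 512 bits */

/* Hypothetical client-side container: a length plus uninterpreted bytes. */
struct opaque_fh {
    size_t len;
    unsigned char data[FH_MAX_BYTES];
};

/* Copy a handle without looking inside it. */
void opaque_fh_copy(struct opaque_fh *dst, const struct opaque_fh *src)
{
    assert(src->len <= FH_MAX_BYTES);
    dst->len = src->len;
    memcpy(dst->data, src->data, src->len);
}

/* Two handles are equal only if the length and every byte match. */
bool opaque_fh_equal(const struct opaque_fh *a, const struct opaque_fh *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Nothing here interprets the bytes; a version 2 client would simply always use a length of 32.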
This leaves the server free to encode information about the file into the file handle in whatever way it sees fit. As Unix allows files to be moved around the filesystem without changing their intrinsic identity, it is important that the NFS server only encodes information about the file itself, and not about its location in the filesystem hierarchy; otherwise confusion can result.
The traditional contents of a file handle are:
The server is free to use this sort of information, or anything else that will serve the same purpose. The important thing for the server is that it must be able to generate a unique file handle for each file that preferably does not change across restarts, that it must be able to reliably find a file given the file handle, and that it must be able to determine if a given file handle is still valid (not old and not fake).
With this context, we can now look at the file handles used by
knfsd in Linux.
From linux/include/linux/nfsd/nfsfh.h we find that the file
handle is built from a structure containing:
    struct nfs_fhbase {
        struct dentry * fb_dentry;      /* dentry cookie */
        __u32           fb_ino;         /* our inode number */
        __u32           fb_dirino;      /* dir inode number */
        __u32           fb_dev;         /* our device */
        __u32           fb_xdev;
        __u32           fb_xino;
        __u32           fb_generation;
    };
and some NUL padding.
(Note to code readers: each of these is actually referenced with an
fh prefix rather than the fb shown here. See nfsfh.h
for the reason. fb_dentry doesn't only have a different prefix; it is
in fact spelt fh_dcookie.)
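To make the "structure plus NUL padding" layout concrete, here is a small user-space sketch. It is not the kernel code: the structure fhbase_sketch and the function fh_pack_sketch are invented for this example, and the dentry cookie is modelled as a 32-bit value (an assumption) so that the structure occupies 28 bytes on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS2_FHSIZE 32  /* version 2 handles are 256 bits */

/* User-space stand-in for nfs_fhbase. The cookie is a plain 32-bit
 * value here (an assumption), not a kernel pointer. */
struct fhbase_sketch {
    uint32_t fb_dentry;
    uint32_t fb_ino;
    uint32_t fb_dirino;
    uint32_t fb_dev;
    uint32_t fb_xdev;
    uint32_t fb_xino;
    uint32_t fb_generation;
};

/* Fill a 32-byte handle: the structure first, NUL padding after it. */
void fh_pack_sketch(unsigned char out[NFS2_FHSIZE],
                    const struct fhbase_sketch *fb)
{
    assert(sizeof *fb <= NFS2_FHSIZE);
    memset(out, 0, NFS2_FHSIZE);   /* padding bytes stay zero */
    memcpy(out, fb, sizeof *fb);   /* native byte order, no conversion */
}
```

Zeroing the whole buffer first matters: since clients compare handles byte for byte, two handles for the same file must not differ in their padding.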
The fields have the following usage:
fb_dentry: A previous version stored the address of a kernel structure related to
the file in this entry. This is not stable over reboots and turns out
not to be needed. The current version always sets this to the value
0xfeebbaca. This value is not checked, only set.
fb_ino: This field stores the inode number of the file, cast to a 32-bit number if needed. It is stored in native byte order (as are all values, since the client never looks at them).
fb_dirino: This field stores the inode number of the directory that contains the file.
Given that files can be in multiple directories and can move between
directories, this is neither unique nor stable. It is used to help
locate the file within the dcache, as will be explained later.
fb_dev: This field contains the device that the file system was mounted from,
which within Linux is a reasonably unique identifier of the
filesystem. A number of file systems, such as procfs and
smbfs, do not have associated devices, so a unique anonymous
device is allocated at mount time. For such file systems, this field
is not guaranteed to be stable across restarts. However, these file
systems are not normally exported (and knfsd actually refuses to
provide access to some of them; possibly it should reject all of
them).
fb_xdev: This field is the device number of the directory that was exported.
fb_xino: This stores the inode number of the directory that was exported (which
may not be the directory that was mounted). This field, together with
fb_xdev and the IP address of the client, is used to determine
whether the file handle is actually valid for that client.
fb_generation: The generation number is used to differentiate between an inode before and after it has been deleted and reused. The generation number changes in a non-predictable way whenever the inode is reused.
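The role of the generation number can be sketched as a simple comparison. The structure inode_sketch and the function fh_is_stale_sketch below are hypothetical stand-ins for illustration only; they are not how knfsd represents inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode, reduced to what the check needs. */
struct inode_sketch {
    uint32_t i_ino;
    uint32_t i_generation;
    bool     in_use;        /* false once the file has been deleted */
};

/* A handle is stale if the inode it names is gone, or if the inode has
 * been reused since the handle was issued (the generation differs). */
bool fh_is_stale_sketch(const struct inode_sketch *inode,
                        uint32_t fb_ino, uint32_t fb_generation)
{
    assert(inode != NULL);
    if (!inode->in_use || inode->i_ino != fb_ino)
        return true;
    return inode->i_generation != fb_generation;
}
```

A handle issued for inode 12 at generation 7 stops matching as soon as inode 12 is freed and handed out again with a new generation, even though the inode number is unchanged.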
The most complicated part of dealing with a file handle, for Linux, is finding the file given the file handle. This is in part due to the presence of the dcache.
In order to read or write to a file within Linux, a struct file
structure is needed. These structures do not refer to the inode
directly, but do so through an entry in the dcache, a struct
dentry. Also, directory operations like lookup and create require a
dentry.
As the dcache always contains a prefix of the filesystem directory structure, finding a dentry requires making sure that the parent directory (or a parent directory, as some files have multiple parents) and all of its ancestors are also in the dcache. It is for this reason that the file handle contains the inode number of a containing directory.
The filehandle code, in linux/fs/nfsd/nfsfh.c, goes through
some hoops to get hold of an appropriate dentry, as will now be
described.
When a new handle arrives in a request, a struct svc_fh is
created and passed to fh_verify to check that it is valid, and to
find the dentry. fh_verify calls find_fh_dentry, which does
the real work. This goes through several stages to try to find the
dentry.
Firstly, it looks in a cache of recent file handles using
find_dentry_in_fcache. This is currently a no-op, as it depended
on the fb_dentry field of the file handle, which is now
deprecated.
The next stage involves checking to see if the inode is currently in
the inode cache, using iget_in_use. If it is, then a valid
dentry should also be in the dcache, and can be found because each
inode points to a list of dentries that refer to it. The code loops
through all of these dentries until it finds one with a parent that has
the same inode number as fb_dirino. Why it cares as long as it has a
dentry, I'm not sure.
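The loop over an inode's dentries can be pictured with the following user-space model. The structures dentry_model and inode_model and the function find_alias_in_dir are simplified inventions for this sketch; the real kernel structures and list handling are different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct inode_model;

/* Simplified dentry: its inode, its parent, and a link to the next
 * dentry that refers to the same inode. */
struct dentry_model {
    struct inode_model  *d_inode;
    struct dentry_model *d_parent;
    struct dentry_model *d_next_alias;
};

struct inode_model {
    uint32_t             i_ino;
    struct dentry_model *i_aliases;   /* head of the dentry list */
};

/* Return the first dentry of 'inode' whose parent directory has inode
 * number 'dirino', or NULL if there is none. */
struct dentry_model *find_alias_in_dir(struct inode_model *inode,
                                       uint32_t dirino)
{
    assert(inode != NULL);
    for (struct dentry_model *d = inode->i_aliases; d; d = d->d_next_alias) {
        if (d->d_parent && d->d_parent->d_inode &&
            d->d_parent->d_inode->i_ino == dirino)
            return d;
    }
    return NULL;
}
```

For a file with hard links in two directories, the value of fb_dirino selects which of the two dentries is returned.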
If no appropriate dentry is found, it proceeds to look in the rename cache. Every time a file is renamed into a different directory, a record is kept in a cache of the file's inode number together with the old and new directory inode numbers. Of course, this cache does not survive a restart. If the inode/dir pair is found in the rename cache, then the list of dentries is rechecked, looking for the new directory inode number.
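The rename cache described above can be pictured as a small table keyed by inode number and old directory. The following is a user-space sketch only: the structures, the fixed size of eight slots and the round-robin replacement are all assumptions made for illustration, not a description of what knfsd does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RENAME_CACHE_SLOTS 8   /* arbitrary size for the sketch */

struct rename_rec {
    bool     valid;
    uint32_t ino, old_dir, new_dir;
};

struct rename_cache {
    struct rename_rec slot[RENAME_CACHE_SLOTS];
    unsigned next;              /* next slot to overwrite */
};

/* Record that 'ino' moved from directory 'old_dir' to 'new_dir'. */
void rename_cache_add(struct rename_cache *c, uint32_t ino,
                      uint32_t old_dir, uint32_t new_dir)
{
    struct rename_rec *r = &c->slot[c->next];
    r->valid = true;
    r->ino = ino;
    r->old_dir = old_dir;
    r->new_dir = new_dir;
    c->next = (c->next + 1) % RENAME_CACHE_SLOTS;
}

/* Return true and set *new_dir if a rename of (ino, old_dir) is cached. */
bool rename_cache_lookup(const struct rename_cache *c, uint32_t ino,
                         uint32_t old_dir, uint32_t *new_dir)
{
    assert(c != NULL && new_dir != NULL);
    for (unsigned i = 0; i < RENAME_CACHE_SLOTS; i++) {
        const struct rename_rec *r = &c->slot[i];
        if (r->valid && r->ino == ino && r->old_dir == old_dir) {
            *new_dir = r->new_dir;
            return true;
        }
    }
    return false;
}
```

On a hit, the caller would repeat the search through the inode's dentries using the returned directory inode number in place of fb_dirino.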
If it still hasn't found an appropriate dentry, it continues to stage
3, which tries to find the full path name of the file, and then create
the dcache entry from that path name.
The path name is found in much the same way that the Unix pwd
command finds the path name of the current directory.
.." entry..." through
them properly as they don't have a valid d_parent entry.
(They could be used for files though).
If this fails, then the file must have been renamed to a different directory a long time ago.
Stage 4 uses a path name cache to try to find the path name of the
parent inode. I'm not really sure of the point of this, and won't go
into the detail yet. The name cache is maintained by
add_to_path_cache and get_path_entry.
Stage five is simply to fail.
Note: This section assumes my patches to 1.4.7
As knfsd looks up a file handle for every request, and will often receive a number of requests on the same file handle (e.g. several read requests on the one file), it is well worthwhile doing some caching to improve lookup speed. Fortunately the dcache and icache provide adequate caching, and knfsd does not need to do any of its own.
REMOVE THIS?
When a file handle arrives, it is passed to fh_verify to check
that it is valid, and that the user has appropriate access. After
some simple consistency checks (fb_xdev must equal fb_dev,
and the fb_xdev,fb_xino pair must be exported to this host),
find_fh_dentry is called to find an entry in the dcache for the
inode referred to in the handle.
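The two consistency checks just mentioned can be sketched as follows. The export table layout (export_entry_sketch) and the function fh_precheck_sketch are hypothetical; knfsd keeps its export information in its own structures, and a linear scan over an array is used here only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical export table entry: one exported directory, one client. */
struct export_entry_sketch {
    uint32_t ex_dev;      /* device of the exported directory */
    uint32_t ex_ino;      /* inode number of the exported directory */
    uint32_t ex_client;   /* client IPv4 address */
};

/* Check 1: the export device must equal the file's device.
 * Check 2: the (fb_xdev, fb_xino) pair must be exported to this client. */
bool fh_precheck_sketch(uint32_t fb_dev, uint32_t fb_xdev, uint32_t fb_xino,
                        uint32_t client,
                        const struct export_entry_sketch *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    if (fb_xdev != fb_dev)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].ex_dev == fb_xdev && tab[i].ex_ino == fb_xino &&
            tab[i].ex_client == client)
            return true;
    }
    return false;
}
```

Only a handle that passes both checks would go on to the dentry lookup.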
Provided that the underlying filesystem supports the read_inode
operation, the inode with the given inode number is found using
iget_in_use. (If read_inode is not supported, then the
filesystem simply cannot be NFS-exported.)
If this inode was already in the icache, then it will have a pointer to a valid dcache entry, and the lookup is complete.
If this inode was not in the cache (either it has been flushed to make room, or the server has been restarted since the inode was last used), then a dcache entry needs to be created.
Here we need to worry about the NFSMNT_SUBTREECHECK export
option.
If this option is set, then we want to make sure that every file
accessed is a descendant of the export point. When exporting whole
filesystems, this checking is unnecessary and can be avoided by
clearing this option.
If this option is not set, and the inode being searched for is not a
directory, then the dcache entry that is returned does not need to be
located in the dcache tree, as its parent and child pointers will never
be checked. So for non-directories with no SUBTREECHECK
requirement, we simply create a dcache entry with d_alloc_root
and return it.
For other objects we need to find a valid location in the dcache
tree. This requires being able to find the (or a) parent for this
object, and then a parent for that parent, and so on until an object in
the tree is found. For directories we can always find a parent by
reading the directory and looking for the '..' entry. For
non-directories we need the fb_dirino entry in the file handle.
Given this parent inode number, find_fh_dentry walks up the
directory tree, building a dcache path as it goes, until it finds an
ancestor that is already in the dcache. It then splices the path into
that ancestor, and returns the base of the path which is the dcache
entry of the file that is wanted.
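The walk up the tree can be modelled in user space as below. This is only a sketch under simplifying assumptions: inodes are small integers indexing two arrays, the parent array plays the role of reading '..', the root is its own parent, and "splicing" is reduced to marking the chain as cached. The names fs_model and splice_path_sketch are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INO 16

struct fs_model {
    uint32_t parent[MAX_INO];     /* what '..' would give for each directory */
    bool     in_dcache[MAX_INO];  /* is this inode already in the dcache? */
};

/* Walk from 'ino' upwards, starting with the parent taken from the file
 * handle ('dirino'), until an inode already in the dcache is found, then
 * mark the whole chain as cached. Returns the number of new entries, or
 * -1 if no cached ancestor could be reached. */
int splice_path_sketch(struct fs_model *fs, uint32_t ino, uint32_t dirino)
{
    uint32_t chain[MAX_INO];
    uint32_t cur = ino, up = dirino;
    int n = 0;

    assert(ino < MAX_INO && dirino < MAX_INO);
    while (!fs->in_dcache[cur]) {
        if (n == MAX_INO || up == cur)
            return -1;            /* loop, or reached an uncached root */
        assert(up < MAX_INO);
        chain[n++] = cur;
        cur = up;
        up = fs->parent[cur];
    }
    for (int i = 0; i < n; i++)   /* "splice" the new path into the tree */
        fs->in_dcache[chain[i]] = true;
    return n;
}
```

For a file with inode 5 in directory 3, where 3 is inside 2 and 2 is inside a cached root 1, the first call creates three entries and a repeated call creates none.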
If SUBTREECHECKing is required, and a file is renamed to a
different directory, then accessing it with the old file handle will
only work as long as the entry for the file is still in the dcache.
Once it expires, access from that filehandle will no longer work. It
would be possible to encourage entries for renamed files to stay in
the dcache longer, but we would need some data on how long entries
tend to stay anyway, and how often moved files are accessed by their
old filehandle, to see if there was any value in this.
knfsd stores active file handles in a struct svc_fh which looks
like:
    typedef struct svc_fh {
        struct knfs_fh      fh_handle;        /* FH data */
        struct dentry *     fh_dentry;        /* validated dentry */
        struct svc_export * fh_export;        /* export pointer */
        size_t              fh_pre_size;      /* size before operation */
        time_t              fh_pre_mtime;     /* mtime before oper */
        time_t              fh_pre_ctime;     /* ctime before oper */
        unsigned long       fh_post_version;  /* inode version after oper */
        unsigned char       fh_locked;        /* inode locked by us */
        unsigned char       fh_dverified;     /* dentry has been checked */
    } svc_fh;
The first three entries are (hopefully) fairly obvious. The
fh_handle is the raw handle that came over the wire and
fh_dentry and fh_export are the dcache entry and export
entry that have been derived from it.
fh_pre_size, fh_pre_mtime and fh_pre_ctime are intended
for encoding Weak Cache Consistency data for NFS
version 3. The values do not seem to be set significantly at present.
fh_post_version is used to determine if this wcc data
needs to be returned or not.
fh_locked is set when the i_sem semaphore on the inode is
taken down, to make sure that it gets put back up when the
file handle is released. fh_dverified records that the
fh_dentry is valid, to make sure that it is released when the
filehandle is released.
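The way these two flags drive cleanup can be sketched in user space as follows. Everything here is a stand-in: the semaphore is a plain integer, the dentry reference is a counter, and the names inode_stub, dentry_stub, svc_fh_sketch, fh_lock_sketch and fh_put_sketch are invented for this example rather than taken from knfsd.

```c
#include <assert.h>
#include <stddef.h>

struct inode_stub  { int sem_held; };     /* stands in for i_sem */
struct dentry_stub { int refcount; struct inode_stub *d_inode; };

struct svc_fh_sketch {
    struct dentry_stub *fh_dentry;
    unsigned char       fh_locked;        /* inode locked by us */
    unsigned char       fh_dverified;     /* dentry has been checked */
};

/* Take the inode semaphore once and remember that we hold it. */
void fh_lock_sketch(struct svc_fh_sketch *fhp)
{
    assert(fhp->fh_dverified);            /* need a valid dentry first */
    if (!fhp->fh_locked) {
        fhp->fh_dentry->d_inode->sem_held = 1;
        fhp->fh_locked = 1;
    }
}

/* Release the handle: undo whatever the flags say we still hold. */
void fh_put_sketch(struct svc_fh_sketch *fhp)
{
    if (fhp->fh_locked) {
        fhp->fh_dentry->d_inode->sem_held = 0;
        fhp->fh_locked = 0;
    }
    if (fhp->fh_dverified) {
        fhp->fh_dentry->refcount--;
        fhp->fh_dentry = NULL;
        fhp->fh_dverified = 0;
    }
}
```

Because the release path is driven by the flags alone, it can be called on any exit path without the caller having to remember whether the lock was taken.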