Directory hard links

Why aren’t they supported in Unix?

A week ago I posted on my experimental file system project. I mentioned that I’d implemented support for hard links to directories, but that, since they’re not supported by the rest of the system, there was no way to test it! I’m pretty sure it worked anyway — at least as much as anything else in the program. ;-) But I’ve had some musings on why that is.

Hard links are just normal directory entries to an object. We call them “hard links” when we want to emphasise that an object can have more than one link pointing to it (i.e. more than one directory entry) and to differentiate them from symlinks (soft links).

Hard links to ordinary files are allowed. Symlinks to ordinary files and to directories are allowed. Hard links to directories are not. So what are the salient differences between directories and files, or between hard links and symlinks?

The first is that hard links are implemented in the file system itself, so all traversal of them is at a very deep layer. In fact, it’s exactly the same as normal traversal of a directory entry to its object, because there’s no difference. A symlink, on the other hand, is really just a file containing a path, marked with a special file type. Traversal can be implemented at a higher level, because it works the same way for every file system that supports that type. On most Unix-like systems it is handled by the kernel’s generic path-resolution code rather than by each individual file system. And because symlinks are different from ordinary files and directories, there is an established interface and set of approaches to dealing with them in the application layer (lstat and readlink, for example).
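To make the distinction concrete, here’s a minimal sketch in Python (my choice for illustration; the function name describe_links and the file names are made up). It assumes a POSIX system and creates a file, a hard link and a symlink, then compares what lstat reports for each.

```python
import os
import stat
import tempfile

def describe_links(workdir):
    """Create a file, a hard link and a symlink to it; report how each one looks."""
    target = os.path.join(workdir, "target.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(target, "w") as f:
        f.write("hello\n")
    os.link(target, hard)            # a second directory entry for the same object
    os.symlink("target.txt", soft)   # a separate object whose content is a path

    t, h, s = os.lstat(target), os.lstat(hard), os.lstat(soft)
    return {
        "same_inode": (t.st_dev, t.st_ino) == (h.st_dev, h.st_ino),
        "link_count": t.st_nlink,
        "symlink_flag": stat.S_ISLNK(s.st_mode),
        "hard_link_flag": stat.S_ISLNK(h.st_mode),
        "symlink_content": os.readlink(soft),
    }

with tempfile.TemporaryDirectory() as d:
    for key, value in describe_links(d).items():
        print(key, "=", value)
```

The hard link is indistinguishable from the original name: same inode, no flag. The symlink is its own object, carries a flag, and can be read back as a path.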

Part of what the OS does in its symlink support is count how many have been traversed while resolving the path for any individual file operation. There’s a limit to the number of symlink indirections allowed. On my system there’s a constant in sys/param.h that defines this:

#define MAXSYMLINKS 20

When a file is opened (e.g. using fopen or opendir), each symlink encountered in the specified path is read, and resolution continues with the path it contains. This is repeated up to MAXSYMLINKS times until an ordinary file or directory is found; beyond that, the call fails with the error ELOOP. The limit matters because it’s possible to have a cycle of links. Without it, opening any one of them would iterate through them forever, in search of an ordinary file that isn’t there.
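The bounded resolution described above can be mimicked in user space. The following is only a sketch of the idea, not how any real system implements it: resolve_final_symlinks and MAX_HOPS are hypothetical names, the limit of 20 simply mirrors the constant quoted above, and only symlinks in the final path component are handled.

```python
import errno
import os
import tempfile

MAX_HOPS = 20  # illustrative only; mirrors the MAXSYMLINKS value quoted above

def resolve_final_symlinks(path, max_hops=MAX_HOPS):
    """Follow a chain of symlinks in the last path component, giving up after max_hops."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_hops:
            raise OSError(errno.ELOOP, "too many levels of symbolic links", path)
        link_text = os.readlink(path)
        # A relative link is interpreted relative to the directory holding the link;
        # os.path.join discards the first argument if link_text is absolute.
        path = os.path.join(os.path.dirname(path), link_text)
        hops += 1
    return path, hops

with tempfile.TemporaryDirectory() as d:
    # A finite chain: a -> b -> real
    open(os.path.join(d, "real"), "w").close()
    os.symlink("real", os.path.join(d, "b"))
    os.symlink("b", os.path.join(d, "a"))
    print(resolve_final_symlinks(os.path.join(d, "a")))

    # A cycle: x -> y -> x
    os.symlink("y", os.path.join(d, "x"))
    os.symlink("x", os.path.join(d, "y"))
    try:
        resolve_final_symlinks(os.path.join(d, "x"))
    except OSError as e:
        print("gave up:", e.strerror)
```

Without the hop counter, the while loop would never exit on the x/y cycle.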

It’s possible to have a symlink pointing to its parent directory. What happens when a program like find recursively examines that directory? Well, by default, find will not follow symlinks at all. But even when the option to follow them is passed, the traversal stays bounded: the number of links followed along any one branch is limited, so the descent stops when that limit is reached. This is possible because links can be distinguished from ordinary directories.
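The same behaviour can be observed with Python’s os.walk, used here as a stand-in for find (an assumption made for illustration; find itself isn’t being tested). The sketch creates a symlink pointing to its own parent directory and counts how many directories a walk visits with and without following links. The helper name count_visited and the safety cap are made up, and the exact count when following links depends on the system’s symlink limit.

```python
import os
import tempfile

def count_visited(root, follow, cap=1000):
    """Count the directories os.walk visits under root, stopping early at cap."""
    visited = 0
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=follow):
        visited += 1
        if visited >= cap:
            break  # safety net in case a system imposes no symlink limit
    return visited

with tempfile.TemporaryDirectory() as root:
    # "loop" is a symlink to the directory that contains it.
    os.symlink(".", os.path.join(root, "loop"))
    print("not following links:", count_visited(root, follow=False))
    print("following links:    ", count_visited(root, follow=True))
```

When links are followed, the walk descends through loop/loop/loop/... until path resolution fails with too many symlinks, at which point os.walk skips the unreadable directory and finishes.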

But what if a hard link were created pointing to its parent directory? Each time find follows that link, it will treat it as just another subdirectory. It cannot tell that it points back to an ancestor. It could conceivably keep track of ancestors it has seen and avoid infinite recursion. It would need to use the inode number (together with the device number) to identify directories, since there is no other unique characteristic. But this requires additional programming effort and runtime cost, all to deal with the rather obscure (and administratively dubious) situation of cyclic hard links.
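Here’s a rough sketch of what that ancestor tracking might look like. Since a directory hard link can’t actually be created, a symlink that is followed unconditionally stands in for one; safe_walk is a hypothetical helper, not something find provides. Each directory is identified by its (device, inode) pair, and a directory already on the current branch is skipped.

```python
import os
import tempfile

def safe_walk(top, ancestors=frozenset()):
    """Yield directories under top, following every link, but never re-entering an ancestor."""
    st = os.stat(top)                 # follows links, just as a hard link would be followed
    key = (st.st_dev, st.st_ino)      # the only reliable identity for a directory
    if key in ancestors:
        return                        # cycle: this directory is already on the current branch
    yield top
    with os.scandir(top) as entries:
        children = sorted(e.path for e in entries if e.is_dir(follow_symlinks=True))
    for child in children:
        yield from safe_walk(child, ancestors | {key})

with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    os.symlink("..", os.path.join(sub, "back"))  # sub/back leads back up to root
    for directory in safe_walk(root):
        print(directory)
```

The cost the paragraph above mentions is visible here: an extra stat call per directory and a set of ancestors carried down every branch.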

                        Symlink                             Hard link
                        (distinguishable)                   (not distinguishable)

File                    Works trivially                     Works trivially
(not recursive)

Directory               Works by counting traversed links   Does not work as could result
(recursive)                                                 in infinite recursion!

To summarise, hard links to directories are not supported due to the conjunction of two properties:

  1. Hard links cannot be distinguished from other entries.
  2. Directories can contain other directories.

The absence of directory hard links means that directory traversal is guaranteed to terminate: either by the absence of links back to the parent; or, in the case where symlinks do point back to a parent, by placing a maximum on the number of links to be traversed.

And this makes programs like find (and numerous others that operate on directory trees) simpler and safer.

Aside. In my file system, directory links are in fact used internally. When a file or directory is moved, a hard link is created in the target directory, and the original link is removed. This is intended to be an atomic operation, so at no time can user programs arrive at the directory by more than one link.
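The link-then-unlink sequence can be imitated from user space on an ordinary file, which at least shows the moment where the object has two names. This is only an analogy under stated assumptions: move_by_relinking is a made-up helper, it works on a regular file rather than a directory, and unlike the file system’s internal operation it is not atomic (os.rename is the usual way to move a name atomically).

```python
import os
import tempfile

def move_by_relinking(src, dst):
    """Move a regular file by adding a new name, then removing the old one."""
    os.link(src, dst)                       # the object now has two directory entries
    links_during = os.stat(dst).st_nlink
    os.unlink(src)                          # drop the original entry
    links_after = os.stat(dst).st_nlink
    return links_during, links_after

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "old_name")
    dst = os.path.join(d, "new_name")
    with open(src, "w") as f:
        f.write("payload\n")
    print(move_by_relinking(src, dst))
```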
