The UNIX file system as a recursive function

Ignoring “write” operations, for the moment, a file system is just an implementation, a concrete manifestation, of a map from file names to file contents. A UNIX style file system complicates the story because file names are “paths” through a somewhat compromised tree structure. You could try to express the relationships between nodes in this tree and paths by brute force axioms. For example: if the path “p” can be resolved to a file, then every proper prefix of “p” resolves to a file that is a directory. But, at least to me, it’s more intuitively compelling to describe the map as a recursive function – capturing the structure algorithmically. In fact, the axiom approach gets pretty damn ugly when we add things like “mounted file systems”.

Begin with a recursive definition of a path: EMPTY is a path and if p is a path and n is a local file identifier then the sequence obtained by appending n to p (on the left) is a path. And there are no other paths than the ones that can be built from a finite number of applications of this rule. We can write “n/p” for the path obtained by appending n to p (appending on the left). Let’s require of course that the separator symbol “/” is not in any local file identifier. Other than that, there is no need to say anything at all about the set of file identifiers now – you can think of it as the set of strings that are “/” free if that helps, but they could be anything at all that is “/” free and can be connected into sequences. And “/” is just the name we use for the separator, it can be anything at all too.

UNIX style file systems are nearly always constructed on top of flat file systems. That is, we suppose we have a primitive set of “flat file names” of some sort and a map F that maps flat file names to the corresponding file contents. Flat file names can be the same as local file identifier or different – it doesn’t matter. Right now we don’t have to say much about these flat file names at all but we have some demands on file contents. In particular, files need to be able to “encode” lookup tables that relate local file identifiers to flat file names. Since we will require that a local file identifier appear at most once in such a table, we can think of the file contents as encoding a function- the “directory function”. If “i” is a flat file name and F(i) is defined, we need to be able to associate F(i) with a type: plain or directory (to start). Plain is just some data. If F(i) is defined and F(i) has type directory, there is some function d= D(F(i)) we can decode from F(i) so that for some finite set of local file identifiers, “d” maps those identifiers to flat file names.

The idea is that we can start with some flat file name “i” and a path, say, p=a/b/c and follow p from “i” as follows. First get i’=D(F(i))(a) if F(i) is a directory. Then get i”= D(F(i’))(b) and i”’= D(F(i”))(c) – assuming that the intermediate flat files are directories and they encode functions that map the flat file names “a”, “b” and “c”. There’s some complexity because we’re not assured that everything will work out – we may only be able to follow a path partway before coming to a dead end.

So define a function R to take a flat file function F, an initial flat file name “i” and a path p and produces either a terminal flat file name or FAIL if the path cannot be followed all the way. First we do the easy case R(F,i,EMPTY)= i. The recursive case is

R(F,i,n/p)=

R(F,D(F(i))(n),p) if F(i) is a directory and D(F(i))(n) is definedFAIL otherwise.

Associate F with a special flat file “r” – the root file. Then we can define U(p) = F(R(F,r,p)) if R(F,r,p) is not FAIL and let U(p)= FAIL otherwise.

And that’s it – except that now the flexible nature of the file system becomes clear. Let’s add a new file type “indirect” so that if F(i) is indirect, then we can extract a flat file map and a root flat file name from the contents. That is, if F(i) is an indirect file we can produce (F’,r’) = X(F(i)) where F’ is a flat file map and F'(r’) is a directory under F’. Say “i” is a properly defined indirect file under F if and only if F(i) is indirect and for (F’,r)=X(F(i)) we have F'(r) as a directory. We can now redefine R as follows to switch flat file systems on an indirect file.

R(F,i,n/p)=

R(F,D(F(i))(n),p) if F(i) is a directory and D(F(i))(n) is definedR(F’,r,p) if F(i) is an indirect file and X(F(i))= (F’,r) and F'(r) is a directory

FAIL otherwise.

Note that the first definition of R is obviously primitive recursive – because we lose one element of the path sequence on each recursive step – but it’s not obvious that the version of R with indirect calls must terminate. The final condition on the indirect case assures us that at least every second step must remove one element from the path.

The UNIX file system as a recursive function

Tagged on: specification