Formal methods for doing what?

John Regehr’s question below gets to one of the basic problems I see in the field of “formal methods” - the general failure of researchers in the field to look at experimental data and the operation of actual systems.  The normal operation of science in, say physics or biology or even analysis of algorithms, involves interplay between the development of theory and practice. Theoretical developments that conflict with experimental data are considered to be problems for the theory and strenuous efforts are made to fix the theory. “Formal methodists” have not been, in general, willing to dig into the data enough to follow this approach. The question of “liveness” provides a good example.

The informal concepts of “liveness” and “safety” seem interesting for understanding programs. The usual definitions [ e.g. from Manna and Pnueli’s “Temporal Verification of Concurrent Programs”] are something like:

  • A liveness property says some assertion  about system state is eventually true – for example, a program terminates or a program advances.
  • A safety property says some assertion about system state is never true – for example, the system operating system deadlocks.

These are possibly useful concepts with some intuitive appeal and we should be able to represent these intuitive concepts mathematically. For me, the next step is to look in detail at the data and to try to formalize what engineers find pertinent about these properties in actual systems.  But 25 years of follow-on research (M&P’s paper is 1981) involves all sorts of wacky but clever and technically difficult attempts to find a mathematical interpretation in which “liveness” is a property of infinite state sequences. For example, there is an influential paper by Alpern & Schneider (the tech report is 1986) which uses Büchi automata and infinite state sequences to model liveness properties.  Motivation from empirical data is entirely absent from that work. I can’t imagine a “liveness” property of an operating system that has no bound. There are properties where we don’t really need to know the bound or where the bound depends on some partially known factors (e.g. how much memory will be available), but there is still a bound. When programmers say that every process makes progress, the natural liveness property of a time-shared system, they mean eventually to be within some time frame. Scheduling processes in the limit is not an interesting property of operating system. Perhaps there are infinite state properties of interest, but I have not seen them described in the literature.

If a mathematical method involves counter-intuitive concepts or strange infinities or recourse to non-euclidean geometries we don’t normally consider part of the operation of programs, but it helps solve a problem then it would be justifiable to use it.  Nobody runs  algorithms in the limit but analyzing them in the limit gives us a rough idea of performance and, in many cases, a good intuition about how real implementations may perform. Here’s Knuth

The O-notation is a big help in approximation work, since it describes briefly a concept that occurs often and it supresses detailed information that is usually irrelevant. Furthermore it can be manipulated algebraically in familiar ways … [ Volume 1, 3rd edition. Page 108]

We can look in vain for credible arguments of this sort in the formal methods literature. Instead all sorts of exotic concepts primarily from meta-mathematics are brought in without justification from the field of study – because formal methods became its own field of study only superficially related to actual computer engineering.

Real computer systems are  finite state. Either the system can loop without ever satisfying the assertion or it cannot and must enter a state where the assertion is true. (And calling a loop an “infinite path” only makes something simple into something complicated.) Boundedness is a property of the actual objects we are attempting to model. A reasonable argument for using Büchi automata or well-founded sets or lattice-theoretical methods would need to show how some properties of software systems that are important to engineers could be represented with suppression of “detailed information that is usually irrelevant” and advantages in algebraic manipulation. Instead, formal methods researchers generally justify these methods on the simplicity of the mathematics needed to formalize the concepts. The argument that some unbounded notion of eventuality can be captured by a certain simplified but complete logical construct that only needs 5 axioms and 3 deductive rules may be an interesting exercise, but it is a result about formal logic, not a result about computer science – unless that notion of eventuality can be shown to be important in actual systems design.

So when I looked at liveness properties of scheduling in the second chapter, the implicit bound is a one of the primary issues to analyze and it is interesting to examine what happens when our high level constraint that “all processes must eventually progress” comes in contact with resource constraints that cost microseconds. Liveness properties remain usefully distinct from safety properties even if we discard unbounded liveness. “Every process eventually makes progress” is a perfectly sensible abstraction of one property of a time-shared operating system and it is different from safety properties like “the system never deadlocks”.  We can use the bound to transform liveness into safety – “there is no reachable state where more than T nanoseconds have passed since a process has executed for any process P for some T that we do not specify now.” But I think this just reflects the way systems actually work and it is useful for designers to understand that there may be underlying real-time constraints even for time-shared processors.  In fact, I believe that many of the most difficult problems of practical system design (e.g. the “fork bomb” and “OOM” in UNIX systems) are harder to solve because our design methods “supress detailed information that is usually irrelevant” at the high level, but do not properly unsuppress this information when we need it.  [This last paragraph may be particularly obscure, but I will try to clarify it more later]