Operating system design and specification: Part 1.

[Note: This is the first in a series of “chapters”. I’ll be revising as I see errors and in response to comments. As usual, this material is copyright Victor Yodaiken and rights are given to make, but not sell, verbatim copies only (in addition to fair use rights) ]

Update 1: Evening June 6.

Update 2: Morning June 22 Link to chapter 2.

Even the most obvious and central properties of operating systems are poorly understood and are widely assumed to be out of the reach of the kinds of mathematical approaches used in other engineering disciplines. Design specifications for operating systems tend to feature API definitions, but only the vaguest gestures to describe scheduling properties, timing, i/o capacity, and what operations do (semantics). The first part of this note develops some “informal” (english) specifications of interesting properties and shows why these would help but don’t provide the right level of precision. The second part will introduce some mathematical tools for describing operating systems and their components in terms of state machines. The mathematics is not the usual “formal method” symbol-fest and prerequisites are just familiarity with recursive functions and state machines.

The necessity for some mathematical approach is illustrated by the vagueness of security specifications which often reduce to “more secure” or “very secure” . As a result, software security, in practice, is obtained by adding mechanisms and hoping. But this is the general condition. It would be remarkable if we had solid security specifications, given that supposedly simple properties, like the desired behavior of a scheduler are elusive. General purpose operating systems are often assumed to provide “live” scheduling – so that every process eventually makes progress. Here’s an attempt at sketching what this means:

Live1: There is some N and some E so that for every process P, if P is ready to run at time T, then by time T+N nanoseconds, P will have either executed E additional instructions or P will have executed more than 0 additional instructions and requested an I/O operation which caused it to suspend.

Live1 is a useful property: assuring that the background page cleaning process and the process that checks for new hardware will eventually run even if the main data base application is under heavy load. But many constraints on system design follow from this requirement. For example, we better never have a situation where both P and Q have been waiting to run for N – NanosecondsToCompute(E) nanoseconds unless the OS somehow knows that both will have time to execute enough instructions to request an I/O operation. As usual in computer science, people spend endless hours trying to optimize the performance of schedulers, and little time trying to understand exactly what it is they are supposed to do. The result is a mishmash of conflicting local optimizations.

As I’ve noted elsewhere, one common conflict is due to attempts to graft live schedulers to real-time schedulers that aim at minimizing the delay for the highest priority process.

Real1: There is some R so that the highest priority runnable process is never delayed more than R nanoseconds.

Real1” is much too vague, for example it allows the system to run the highest priority task for one instruction and then delay R nanoseconds repeatedly, but it still causes difficulties. Real1 has a hard time with Live1. What happens if low priority process P is assured run-time due to Live1 but high priority process Q is assured run-time by Real1 and there is only one processor? It would be nice if there were rules that could help deduce conflicts between Live1 and Real1 just as we use rules to show that there is no integer solution to (a > b AND b > a). But even at this level of specification we can seem that there may be a problem.

Consider something a little more complex: what is involved in process switching. Of course process switching is something we have been doing for 50 years, at least. At a high level, the state of the switched out process must be saved so that when or if it is eventually resumed it will start in exactly the same place – sort of. The “sort of” is there because the contents of process memory pages can change while the process is suspended and this is the common situation when the process suspends waiting for input. Even when the process is suspended waiting for output, the process I/O structures will change to mark the new write location pointer (seek pointer). Process state must be defined in such a way as to ignore the movement of pages to backing store from memory. And we need to account for shared data pages and files, asynchronous I/O, blocked signals that remain pending, and more.

ContextSave: If P is active at time T and the next instruction completes saving the state of P, then a resume of P at time T+K will restore the processor state so that the next instruction will appear to be the completion of the instruction at time T, and and changes to the P process state in the time between T and T+K should be legal suspend state transformations of P’s state.

The ContextSave specification is going to need a lot of fleshing out to mean anything solid. We have to defined “process state” is and what legal suspend state transformations are and might need some thought on that instruction execution as the boundary between saved and unsaved states.

In the next installment, I will introduce methods of constructing and composing the gigantic, partially determined, multi-level state machines we need to describe how operating systems work.