process algebras

Failure to communicate

milenr 001

All is made clear in the brand new (Update again May 7 2010) Reducing Process Algebra paper.

From the point of view of someone who wants to understand programs and digital circuits at the RTL level, the “process algebra” line of research typified by Milner’s CCS [MILNER] begins with three rather peculiar assumptions.

(1) Programs are non-deterministic  or at least should be thought of that way,

(2) Processes communicating by synchronous message exchange are good “primitive” building blocks.

(3) Classical automata theory is too limited to express the semantics of programs and can be usefully replaced with a string algebra.

Some process algebra research appears to be concerned with generic “systems”, not mere programs and digital devices but I’m going to restrict myself to the smaller domain. These assumptions may  be reasonable  for business processes or astronomy, but for computer engineering, they appear to me to be unnecessary complications that don’t have any countervailing gains. Let’s begin with non-determinism.

Nondeterminism

Here we are not talking about “non-deterministic algorithms” which are mathematical functions that “guess” at a solution. A program implementing such an algorithm executes deterministically. For example the program “x= Get_Random(); SearchFrom(x);” may get “x” as an input or may compute it as a pseudo-random variable, but starting from the same state, receiving the same input will do exactly the same computation.  What the “process algebra” researchers mean by “non-deterministic” is that the next state of a system is not a function (or even a probabilistic function) of the current state and the current input, but may depend on something else.

Milner asks us to suppose we have two devices D1 and D2, so that  when powered up D1 has a possible “a” action followed by a choice of a “b” or “c” action. The D2 device starts the same, but after the “a” action the state with “b” and “c” actions enabled can also non-deterministically jump to a state where only “c” is possible.

We do not know what ‘determines’ the outcome for [D2], perhaps it is the weather”[MILNER]

One could, however,  leave the initial conditions of the device unspecified to get the same result without resorting to non-determinism.  Suppose we have, in addition to the inputs {a,b,c} a set U of unknown contents that contains environmental information. Define a class of  state machines that in the initial state input something from U, then an “a” and then either “b OR c” or just “b”.  Machines in class D1 cannot reach the “just b” input state, machines in class D2 can. Elements of U may be random numbers or can encode the date and a weather prediction algorithm or whatever.  This relocates the indeterminism to the environment and, to me, seems to make the problem statement more faithful to how digital devices work. The same procedure can be used for the “delays”  that Hoare argues indicate non-determinism.

The only solution is to allow the existence and duration of the delay to influence the course of the computation. For example the programmer can use the [ combinator of CSP to allow the earliest possible selection of the first possible event that can occur. Since delays are essentially unpredictable, we are almost forced to accept non-determinism as an inherent property of programs and algorithms, and even of computer hardware.[HOARE]

Those delays may be unpredictable, but how does that make the programs “non-deterministic”? A program does a “load x” operation and depending on caches, memory management, disk operation, bus width, contention, error correcting codes and so on, many things can happen and they may take some time. But given the same response by the OS and hardware, the program will always do the same thing.  Obviously, we have a choice between saying P is deterministic with input that e.g. can vary by system timing or saying P contains fundamental non-determinism.  And this is true as well for  multi-threaded programs. In one sense, there’s no argument: if we run a multithreaded program twice we may get different results. That non-determinism results from the OS selection of which process to run, the contents of cache, the level of network congestion and so on.  But is the program non-deterministic or does its environment provide it with inputs that can vary in time and order? If we have a program P made up of multiple threads and contents of all registers and memory is given by S, then the event of the operating system choosing to run the next instruction in some thread T causes a completely determined transition to the next state that only depends on S. In fact, the kinds of things we want to prove about multi-threaded programs are often assertions about what must or must not happen, given various possibilities about which externally generated events will happen next.

Coming back to Milner’s D1 and D2, a program is not going to make a non-deterministic choice about whether to enable the “c” action, the program will read a seresistornsor or examine a memory location or in some other way get an input that will then lead to a completely deterministic choice.  When we use Kirchoff’s current laws to analyze a circuit containing switches, we do not need to incorporate the “non-deterministic” operation of the switches in our circuit model. Similarly, when we evaluate the behavior of a multi-threaded web server, we do not need to incorporate the “non-deterministic” operation of networks in our web server model.

The remaining argument for including non-determinism in the mathematics we use to model programs is purely pragmatic and is based on supposed mathematical advantages and increased abstraction.  And this brings us to the question of automata.

Automata

I’m not going to  argue that just because something can be expressed in terms of classical automata, it should be expressed in that way,  but the process algebra literature is replete with weak arguments about what cannot be expressed in terms of classical automata. For the D1/D2 example, Milner writes, that while both machines accept the same regular language  “a(b+c)“, they act differently in a way that can be observed by an experimenter. Milner writes: “in standard automata theory, an automaton is interpreted as a language”, and so he argues that  the distinction he wants to draw between D1 and D2 devices cannot be expressed. But  “standard automata” theory in no way requires that machines that accept the same language be treated as identical – an automaton is  not “interpreted as a language” even though some automata may be used to recognize languages. Similarly , two resistors in series can be understood as a single resistance on the line, but they are not “interpreted” as a single resistor in a way that makes it impossible to distinguish two resistors from one. The original use of state machines was to model the behavior of switches in the telephone network and non-combinatorial logic elements in circuits. There is an entire field devoted to minimizing state machines where it is important to distinguish between the original and the reduced machines  even though they are intended to have identical behavior.  So, if we want to, we are certainly entitled to distinguish between the two state machines.  It is true that if we convert a non-deterministic D2 machine to a deterministic one, we  produce a machine that recognizes the same language but behaves differently: the D2 machines can enable two events and then change state to disable one of them. The theorem [RABINSCOTT] just says the converted deterministic machine and the original non-deterministic machine recognize the same language. That is, L(M1) = L(M2) does not mean it is impossible to distinguish M1 and M2 some other way.  For example, it’s easy enough to define Reject(M) for a state machine as the language of strings that can be rejected by M. If M is deterministic then L(M) = A* – Reject(M). But if M is non-deterministic, there may be strings in both sets.  Automata theory allows us to preserve this difference should we find it significant.

Another common argument is that automata cannot express interaction. Here’s a relatively complete argument.

The simplest model of behaviour is to see behaviour as an input/output function. A value or input is given at the beginning of the process, and at some moment there is a(nother) value as outcome or output. This model was used to advantage as the simplest model of the behaviour of a computer program in computer science, from the start of the subject in the middle of the twentieth century. It was instrumental in the development of (finite state) automata theory. In automata theory, a process is modeled as an automaton. An automaton has a number of states and a number of transitions, going from one state to a(nother) state. A transition denotes the execution of an (elementary) action, the basic unit of behaviour. Besides, there is an initial state (sometimes, more than one) and a number of final states.A behaviour is a run, i.e. a path from initial state to final state. Important from the beginning is when to consider two automata equal, expressed by a notion of equivalence.

On automata, the basic notion of equivalence is language equivalence: a behaviour is characterized by the set of executions from the initial state to a final state. An algebra that allows equational reasoning about automata is the algebra of regular expressions (see e.g. [58]). Later on, this model was found to be lacking in several situations. Basically, what is missing is the notion of interaction: during the execution from initial state to final state, a system may interact with another system. This is needed in order to describe parallel or distributed systems, or so-called reactive systems. When dealing with interacting systems, we say we are doing concurrency theory, so concurrency theory is the theory of interacting, parallel and/or distributed systems.[BAETTEN]

We see the same argument about language equivalence discussed above and a second argument about interaction that is actually more difficult to understand. Here is a product of automata in which each automaton can communicate with (interact with) each other automaton from a paper published in 1962.

Definition 1. Let {M1, M2.. ., Mn} be a set of Moore type machines in which the outputs of any machine Mi i = 1, 2,…, n, may be used as inputs to other machines. We shall say that this set of interconnected machines is concurrently operating if the next state (state at time t + 1) of each machine Mi depends on the present state of Mi, the present outputs (which depend only on the present states) of the machines to which it is connected, and the present external input. The ordered n-tuple (or “configuration”) of the present states of the machines M1 M2 . . ., Mn will be referred to as the state of the interconnected machine.[LOOP]

This product is certainly flexible enough to describe any type of communication. In fact, it can easily accommodate the type of synchronous message passing considered fundamental in the process algebras. Suppose we have Moore type state machines that output requests to send and receive messages on channels. Put n of them together in a product of the type Hartmanis defines and add another state machine to act as a scheduler. The input to the scheduler is the set of requests of the other machines. The output of the scheduler is a prioritization of the other factor machines. The highest priority machine then can act by either using the external input or the output of a communicating factor machine – which will also make a transition. I’ll return to this below, but first let’s look at the communicating process idea a little more.

Message passing processes

The  assumption  that “processes communicating via synchronous message transfer” are good building blocks for programs seems false.  Generations of programmers have repeatedly rediscovered that threads using only synchronous message passing are excellent building blocks for  creating deadlock.  Suppose process P does

loop: Get x from channel 1; Send y to channel 2; send  to channel 3

and process Q does

loop: Get x from channel 2; Send y to channel 1; send z to channel 4

In isolation the two processes are deadlock free. But when you connect them they lock up immediately. When processes are non-trivial, the failures are just more elusive. In order to preserve deadlock freedom under composition for such components you need a rule that components can be ordered in such a way that each component only “sends” to components of higher order or to the external network. Otherwise cycles of waiting processes can appear. Beginning an investigation of  composite communicating systems using components where

[NOT A RULE]      (DeadlockFree(A) AND DeadlockFree(B) )  IMPLIES  DeadlockFree(Compose(A,B))

is not a rule  or even a guideline leads to serious problems [VENKMAN].   In fact, this model of programming is terrible – there is a reason why Occam never caught on and real “threads” there are timeouts available for practical message passing systems, message brokers are popular, and “select” is a standard part of most network control loops.  QNX which is an interesting operating system designed around synchronous message passing implements asynchronous signaling so that message passing systems can be designed without loops: threads should send synchronous messages only “up” and use asynchronous signals to communicate “down”.

But if synchronous message passing is primitive, timeouts or asynchronous signals are hard to model.  The structure of synchronization and information flow in complex computer programs is tough issue and one on which there has been a good bit of interesting research recently. From lock-free data structures to data flow processing to NUMA memory, there are all sorts of interesting ways of sharing data that are, at best, awkward to discuss if synchronous message exchange is the only means of communication.

Semantics

“What exactly is meant by the behavior of nondeterministic or concurrent programs is far from clear”[HENNESSY]

The discussion in [HENNESSY] is fascinating in that it reveals, among other things,  that the topic of whether observers can see “intermediate” states is one that needs investigation 20 years into the research program. State machines, on the other hand, have very simple ideas of state that do not depend on the mentality of the observer or the weather. In fact, the question [HENNESSY] addresses was solved by Moore who provided an output map from states to outputs to distinguish clearly between the visible state (output) and the internal state. Myhill and Nerode then showed that these distinctions are robust in the sense that if you treat a Moore machine as a map from input sequences to output values, a left congruence recovers the state set up to homomorphism and a full congruence similarly recovers the underlying monoid. So it’s worth looking at how “communicating processes” could be modelled with this more restrictive and more sharply defined structure.

A deterministic Moore machine can be given as

M = (A, S, st, D:SxA TO S, G:STO X)

A is the alphabet set, S is a state set,  st in S is the start state,  D is the transition function and G is the output function.

Let’s define a class of “process” state machines that have A = { tau, in(c), out(c): c in C} for some set C of “channels”. The idea here is that tau represents an internal state change with no i/o. The outputs are subsets of A where each output x subset A defines the set of possible events for the state machine in the next step.  Then let’s define a slightly more sophisticated version of Hartmanis’s product like this

M = PROCESS[ M1 ... Mn, N ]

where each Mi is a “process” state machine, N is an “operating system” machine to be defined. The state of M are the state tuples (s1 … sn, q) where each si is an element of the state set of Mi and q is an element of the state set of N.  The input alphabet to N is the set of  (n+1) tuples (x1, … xn, a) where each xi is in X (the output set of the process machines) and “a” is in A. The outputs of N are maps phi1 … phik where each phij:Xn cross A cross {1 … n} TO   A UNION {null} and the idea is that phij(x1 … ,xn,a,i) is the input to be generated for machine Mi when the outputs of the factor machines are as given and the input to the composite system is “a”. If phij(x1 … ,xn,a,i)=null then Mi should not move. More formally if the state is s=(s1 … sn, q) then D(s,a) = (s1′ … sn’, q’) where  q’ the next state of N computed from “q” and from (x1 … xn, a) by the transition map of N, and each si’ is the next state of Mi  either si=si’ when phij(x1 … ,xn,a,i)=null and si’ is computed from si with input  phij(x1 … ,xn,a,i) otherwise.  Our rule is that N must compute only inputs that the component machines will accept (as determined by xi) and N must produce input in(c) for component k only if either the “a” is “in(c)” and no other “process” machine advances or there is some i so “out(c)” can be generated for “i“.  This could all be done more simply using the methods decscribed [RECURSION]

We got rid of all of the “non-determinism” by underspecifying N which decides on which process to run next based on some algorithm we don’t say anything about.  Process machines might “accept” identical languages, yet differ in what inputs they enabmilner 001le in which states. Perhaps there is something one can express in process algebra which cannot be accomodated by this model and maybe such things would be interesting in the context of  program verification. It would be interesting to find out. But what has been shown is that the basic idea of the process algebra is relatively easy to represent with automata products – using methods known for a half century or more. And without requiring anywhere near the volume of notation and axioms that seem to be essential parts of process algebras or losing our bearings as to what is state.

References

[MILNER] A calculus of communicating systems, Robin Milner. Springer (LNCS 92), 1980. ISBN 3-540-10235-3

[MILNER2] The Polyadic pi-Calculus: a Tutorial (1991)

[RABINSCOTT] M. O. Rabin and D. Scott, “Finite Automata and their Decision Problems”, IBM Journal of Research and Development, 3:2 (1959) pp.115-125.

[HARTMANIS] Hartmanis, J.; and Stearns, R. E. (1966), Algebraic Structure Theory of Sequential Machines (Prentice Hall)

[LOOP] Juris Hartmanis: Loop-Free Structure of Sequential Machines Information and Control 5(1): 25-43 (1962)

[HALGEBRA] Hoare, C. A. 1993. Algebra and models. In Proceedings of the 1st ACM SIGSOFT Symposium on Foundations of Software Engineering (Los Angeles, California, United States, December 08 – 10, 1993). D. Notkin, Ed. SIGSOFT ’93. ACM, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/256428.167053

[HENNESSY] Hennessy, M. and Milner, R. 1985. Algebraic laws for nondeterminism and concurrency. J. ACM 32, 1 (Jan. 1985), 137-161. DOI= http://doi.acm.org/10.1145/2455.2460

[BAETEN] Baeten, J. C. 2005. A brief history of process algebra. Theor. Comput. Sci. 335, 2-3 (May. 2005), 131-146. DOI= http://dx.doi.org/10.1016/j.tcs.2004.07.036

[RECURSION] Yodaiken, 2009, Primitive Recursive Transducers. http://yodaiken.com/papers/recursivetransducer.pdf

[VENK] Dr. Peter Venkmann, Columbia University.  Personal communication.

A calculus of communicating systems, Robin Milner. Springer (LNCS 92), 1980. ISBN 3-540-10235-3

2 Responses to process algebras

  1. Pingback: The source of error at keeping simple

  2. Pingback: Process algebra at keeping simple

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>