# Failure to communicate

From the point of view of someone who wants to understand programs and digital devices, the “process algebra” line of research typified by Milner’s CCS [MILNER] begins with 3 dubious assumptions.

1.  Programs and digital devices are non-deterministic  or at least should be thought of that way,
2. Processes communicating by synchronous message exchange are fundamental “primitive” building blocks.
3. Classical automata theory is too limited to express the semantics of programs.

## Nondeterminism

Here we are not talking about “non-deterministic algorithms” which are mathematical functions that “guess” at a solution.  In practice, a program implementing such an algorithm executes deterministically. For example the program:

x= Get_Random(); SearchFrom(x);

may get “x” as an input or may compute it as a pseudo-random variable, but starting from the same state, receiving the same input it will do exactly the same computation.  What the “process algebra” researchers mean by “non-deterministic” is that the next state of a system is not a function (or even a probabilistic function) of the current state and the current input, but may depend on something else.

Milner asks us to suppose we have two devices A and B, so that  when powered up A has a possible “a” action followed by a choice of a “b” or “c” action. The B device starts the same, but after the “a” action the  “b” action may or may not become impossible.  That is, B can accept an “a” action and then either enable both “b” and “c” actions or choose to enable only “c” actions.

We do not know what ‘determines’ the outcome for [B], perhaps it is the weather” [MILNER]

Consider a plain old deterministic state machine as map

f: E* → X

where E is the set of finite sequences over a set E called an “alphabet” of events and X is the set of outputs of the system.  For some s in E, say f(s) is “the output of the system represented by f in the state reached by the sequence of events s“. The empty sequence, of course, leaves the system in the initial state so f(empty) is the initial state output.  How can we accommodate Milner’s weather in such a model? One method is to use WE* in place of E* where W is an alphabet of weather observations. Then for weather observations A and B let f(As) = fA(s) and f(Bs) = fB(s) .

We can say W consists of all 1010000 bit encodings of a simulation of Earth weather or even make W infinite or just leave it unspecified except for the properties asserted.   This relocates the indeterminism to the environment and, to me, seems to make the problem statement more faithful to how digital devices work. After all, computer scientists have a hard enough time understanding the behavior of a few transistors, we don’t need to model the weather as well. The same procedure can be used for the “delays”  that Hoare argues indicate non-determinism.

The only solution is to allow the existence and duration of the delay to influence the course of the computation. For example the programmer can use the [ combinator of CSP to allow the earliest possible selection of the first possible event that can occur. Since delays are essentially unpredictable, we are almost forced to accept non-determinism as an inherent property of programs and algorithms, and even of computer hardware.[HOARE]

Those delays may be unpredictable, but how does that make the programs “non-deterministic”? A program does a “load x” operation and depending on caches, memory management, disk operation, bus width, contention, error correcting codes and so on, many things can happen and they may take some time. But given the same response by the OS and hardware, the program will always do the same thing.  Obviously, we have a choice between saying P is deterministic with input that e.g. can vary by system timing or saying P contains fundamental non-determinism.  And this is true as well for  multi-threaded programs.

In one sense, there’s no argument: if we run a multithreaded program twice we may get different results. That non-determinism results from the OS selection of which process to run, the contents of cache, the level of network congestion and so on.  But is the program non-deterministic or does its environment provide it with inputs that can vary in time and order? If we have a program P made up of multiple threads and contents of all registers and memory is given by S, then the event of the operating system choosing to run the next instruction in some thread T causes a completely determined transition to the next state that only depends on S. In fact, the kinds of things we want to prove about multi-threaded programs are often assertions about what must or must not happen, given various possibilities about which externally generated events will happen next.

Coming back to Milner’s A and B, a program is not going to make a non-deterministic choice about whether to enable the “c” action, the program will read a sensor or examine a memory location or in some other way get an input that will then lead to a completely deterministic choice.  When we use Kirchoff’s current laws to analyze a circuit containing switches, we do not need to incorporate the “non-deterministic” setting of the switches in our circuit model. Similarly, when we evaluate the behavior of a multi-threaded web server, we do not need to incorporate the “non-deterministic” operation of networks in our web server model.

The remaining argument for including non-determinism in the mathematics we use to model programs is purely pragmatic and is based on supposed mathematical advantages and increased abstraction.  And this brings us to the question of automata.

## Automata

I’m not going to  argue that just because something can be expressed in terms of classical automata, it should be expressed in that way,  but the process algebra literature is replete with dubious arguments about what cannot be expressed in terms of classical automata. For the A/B example, Milner writes, that while both machines accept the same regular language  “a(b+c)“, they act differently in a way that can be observed by an experimenter.

### The language recognized by a state machine is not the state machine

Now in standard automata theory, an automaton is interpreted as a language,” Milner writes “i.e. as a set of strings over the alphabet.” That’s not at all correct. There is a whole field of state machine minimization in Electrical Engineering concerning the differences between automata that accept the same language – and Moore and Mealy machines are also important in automata theory. There is an entire algebraic theory of automata concerning the factorization of state machines that has nothing to do with interpreting a state machinee as a language. Standard automata theory does not “interpret” state machines  “as a language”.  There is a theorem that the class of languages recognized by finite state binary output deterministic state machines is the same as the class of languages recognized by finite state non-deterministic state machines but Milner’s characterization of that theorem is absurd.  And even ignoring that, Milner’s argument is based on a simple error.

Consider two state machines A and B with an alphabet of events  {a,b,c,d}  and A has states {A,A1,A2, A3} and B has states {B,B’,B1,B2,B3}. The state machine transitions can be given by ordered triplets (state1,input, state2) that show the input label  on a directed edge between state1 and state2.  For Milner’s example:

state machine A has transitions { (A,a,A1), (A1,b,A2), (A1,c,A3), (A3,d,A) },

state machine B has transitions: { (B,a,B1) (B,a,B’), (B1,b, B2), (B’,c,B3), (B3,d,B)}.

B is non-deterministic because there are 2 “a” transitions from state B. Milner points out that if we consider A2 and B2 to be accept states, both machines accept the same language (acd)*ab. So far so good. At this point Milner asks us to think of {a,b,c,d} as “ports” or maybe buttons that can be pushed. The button “is unlocked if the agent can perform the associated action and can be depressed to make it do the action, otherwise the button is locked and cannot be depressed.”  Then: “after the a-button is depressed a difference emerges between A and B. For A – which is deterministic – b and c will be unlocked, while for B – which is non-deterministic – sometimes only b will be unlocked and sometimes only c will be unlocked.” This is a basic error involving a change of conditions. Milner has begun the discussion with binary valued state machines that accept or reject words over some alphabet, but then shifts to state machines that output values telling us which buttons are “unlocked”.

An experimenter or external agent trying to push these buttons will discover a difference between the two machines eventually because some times after an initial “a” input on the second state machine a “b” is possible and sometimes not, although on the first state machine after an “a” the “b” is always possible.  But how does the external agent determine that B will not perform a “b” action sometimes? The external agent “attempt[s] to depress b” and fails – the locked/unlocked state of each button is visible to the external agent. So Milner has changed definitions in the middle of the argument.  At the start, finite state machines were language recognizers with, as the classical text on automata theory explains: “output limited to a binary signal: ‘accept/don’t accept’ ” [Hopcroft and Ullman].  Those automata will not tell us  anything else about a word other than that binary condition – is it in the language or not.  But Milner’s button state machines tell us also what buttons are locked and what are unlocked in the terminal state reached by the word.  So Milner’s state machines distinguish words that a recognizer state machine does not.

State machines that have outputs beyond accept/reject (sometimes called “transducers”) are essentially descriptions of maps from input strings to output strings or from input strings to output value in the terminal state. Standard automata theory would say Milner’s two machines accept the same language of strings, but produce different languages of strings as outputs.

Standard automata theory, as far as I know, has never really considered the case of non-deterministic Moore machines but the extension is trivial. Milner’s transition systems are just labeled directed graphs with a root vertex. Consider a labeled directed graph G with labels E, a distinguished root vertex (start state)  s0, the set of triples R= { (s1,a,s2) if there is an edge labeled a from s1 to s2}. The set of vertices V is the set of states. We can define a relation R* subset E* x V so that R* is the smallest set containing only (null,s0) and (wa,s’) whenever (w,s) is in R* and (s,a,s’) is in R – where wa is the string obtained by appending “a” to “w” on the right.  For any vertex “s” define Q(s) to be the subset of A containing every “a” so that (s,a,s’) is in R for some s’ in V. Then let G* be the set of pairs (w,Q(s)) for every (w,s) in R*.  As far as I can tell on a quick look, Milner’s bisimulation between G1 and G2 is simply equality of G1* and G2*.

Milner writes: “in standard automata theory, an automaton is interpreted as a language”, and so he argues that  the distinction he wants to draw between A and B devices cannot be expressed. But  “standard automata” theory in no way requires that machines that accept the same language be treated as identical – an automaton is  not “interpreted as a language” even though some automata may be used to recognize languages. Similarly , two resistors in series can be understood as a single resistance on the line, but they are not “interpreted” as a single resistor in a way that makes it impossible to distinguish two resistors from one.  It is true that if we convert a non-deterministic B machine to a deterministic one, we  produce a machine that recognizes the same language but behaves differently: the B machines can enable two events and then change state to disable one of them. The theorem [RABINSCOTT] just says the converted deterministic machine and the original non-deterministic machine recognize the same language. That is, L(M1) = L(M2) does not mean it is impossible to distinguish M1 and M2 some other way.  For example, it’s easy enough to define Reject(M) for a state machine as the language of strings that can be rejected by M. If M is deterministic then L(M) = E* – Reject(M). But if M is non-deterministic, there may be strings in both sets.  Automata theory allows us to preserve this difference should we find it significant.

### State machines represent composition and communication in terms of automata products

Another common argument is that automata cannot express interaction. Here’s a relatively complete argument.

The simplest model of behaviour is to see behaviour as an input/output function. A value or input is given at the beginning of the process, and at some moment there is a(nother) value as outcome or output. This model was used to advantage as the simplest model of the behaviour of a computer program in computer science, from the start of the subject in the middle of the twentieth century. It was instrumental in the development of (finite state) automata theory. In automata theory, a process is modeled as an automaton. An automaton has a number of states and a number of transitions, going from one state to a(nother) state. A transition denotes the execution of an (elementary) action, the basic unit of behaviour. Besides, there is an initial state (sometimes, more than one) and a number of final states.A behaviour is a run, i.e. a path from initial state to final state. Important from the beginning is when to consider two automata equal, expressed by a notion of equivalence.

On automata, the basic notion of equivalence is language equivalence: a behaviour is characterized by the set of executions from the initial state to a final state. An algebra that allows equational reasoning about automata is the algebra of regular expressions (see e.g. [58]). Later on, this model was found to be lacking in several situations. Basically, what is missing is the notion of interaction: during the execution from initial state to final state, a system may interact with another system. This is needed in order to describe parallel or distributed systems, or so-called reactive systems. When dealing with interacting systems, we say we are doing concurrency theory, so concurrency theory is the theory of interacting, parallel and/or distributed systems.[BAETTEN]

We see the same argument about language equivalence discussed above and a second argument about interaction that is actually more difficult to understand. Here is a product of automata in which each automaton can communicate with (interact with) each other automaton from a paper published in 1962.

Definition 1. Let {M1, M2.. ., Mn} be a set of Moore type machines in which the outputs of any machine Mi i = 1, 2,…, n, may be used as inputs to other machines. We shall say that this set of interconnected machines is concurrently operating if the next state (state at time t + 1) of each machine Mi depends on the present state of Mi, the present outputs (which depend only on the present states) of the machines to which it is connected, and the present external input. The ordered n-tuple (or “configuration”) of the present states of the machines M1 M2 . . ., Mn will be referred to as the state of the interconnected machine.[LOOP]

This product is certainly flexible enough to describe any type of communication. In fact, it can easily accommodate the type of synchronous message passing considered fundamental in the process algebras. Suppose we have Moore type state machines that output requests to send and receive messages on channels. Put n of them together in a product of the type Hartmanis defines and add another state machine to act as a scheduler. The input to the scheduler is the set of requests of the other machines. The output of the scheduler is a prioritization of the other factor machines. The highest priority machine then can act by either using the external input or the output of a communicating factor machine – which will also make a transition. I’ll return to this below, but first let’s look at the communicating process idea a little more.

## Message passing processes

The  assumption  that is useful to treat  “processes communicating via synchronous message transfer” as building blocks  seems false.  Generations of programmers have repeatedly rediscovered that threads using only synchronous message passing are excellent building blocks for  creating deadlock.  Suppose process P does

loop: Get x from channel 1; Send y to channel 2; send  to channel 3

and process Q does

loop: Get x from channel 2; Send y to channel 1; send z to channel 4

In isolation the two processes are deadlock free. But when you connect them they lock up immediately. When processes are non-trivial, the failures are just more elusive. In order to preserve deadlock freedom under composition for such components you need a rule that components can be ordered in such a way that each component only “sends” to components of higher order or to the external network. Otherwise cycles of waiting processes can appear. Beginning an investigation of  composite communicating systems using components where

is not a rule  or even a guideline leads to serious problems [VENKMAN].   In fact, this model of programming is terrible – there  are reasons why Occam never caught on, event programming is popular, there are timeouts available for practical message passing systems, message brokers are popular, and “select” is a standard part of most network control loops.  QNX which is an interesting operating system designed around synchronous message passing implements asynchronous signaling so that message passing systems can be designed without loops: threads should send synchronous messages only “up” and use asynchronous signals to communicate “down”.

But if synchronous message passing is primitive, timeouts or asynchronous signals are hard to model.  The structure of synchronization and information flow in complex computer programs is tough issue and one on which there has been a good bit of interesting research recently. From lock-free data structures to data flow processing to NUMA memory, there are all sorts of interesting ways of sharing data that are, at best, awkward to discuss if synchronous message exchange is the only means of communication.

## Semantics

“What exactly is meant by the behavior of nondeterministic or concurrent programs is far from clear”[HENNESSY]

The discussion in [HENNESSY] is fascinating in that it reveals, among other things,  that the topic of whether observers can see “intermediate” states is one that needs investigation 20 years into the research program. State machines, on the other hand, have very simple ideas of state that do not depend on the mentality of the observer or the weather. In fact, the question [HENNESSY] addresses was solved by Moore who provided an output map from states to outputs to distinguish clearly between the visible state (output) and the internal state. Myhill and Nerode then showed that these distinctions are robust in the sense that if you treat a Moore machine as a map from input sequences to output values, a left congruence recovers the state set up to homomorphism and a full congruence similarly recovers the underlying monoid. So it’s worth looking at how “communicating processes” could be modeled with this more restrictive and more sharply defined structure.

A deterministic Moore machine can be given as

M = (E, S, st, X, D:SxE TO S, G:S TO X)

A is the alphabet set, S is a state set,  st in S is the start state,  X is the set of outputs, D is the transition function and G is the output function.

Now define a class of “process” state machines that act like the communicating process machines of process algebra. Say M is a C,V  process machine where C is a set of “channels” and V is a set of values if the input alphabet of C includes for each c in C some  elements in[c,v]  or out[c,v] for v in V.   So a process state machine can have a local transition or it can receive or send a value on a channel. The output of a process state machine will be a subset of the input alphabet with the property that a in G(s) if and only if D(s,a) is defined. That is, the outputs define the possible next operations.

Then let’s define a slightly more sophisticated version of Hartmanis’ product like this

M = PROCESS[ M1 … Mn, N ]

where each Mi is a “process” state machine and N is an “operating system” machine to be defined. Say c “matches” in M if and only if in[c,v] is in the input alphabet of some Mi and out[c,v] is in the input alphabet of some Mj j != i. That is, a channel c matches if and only if it is shared between at least two component state machines. M is a process state machine, with inputs { {tau} union B union Z}  where B is the union of the local operations of the Mi, and Z is the set of unmatchable i/o operations: in[c,v] and out[c,v]. The idea is that in/out operations that match can be hidden with the constructed state machine.  The  tau represents an internal i/o operation.

The states of M are the state tuples (s1 … sn, q) where each si is an element of the state set of Mi and q is an element of the state set of N. The machine N is supposed to determine which Mi gets to advance next from the input to the composite process machine and the outputs of the components. There are a number of ways N could work, but let’s go for simplicity here and have N output a priority list, an ordering from 1 .. n with the intuition that an input a to M goes to the highest priority element of M1 .. Mn that can accept a. That is, if a=in_c or out_c, then the highest priority Mi as determined by the current output of N will get input as some local operation “a != tau”, then the highest priority Mi gets “a”. For input  “a=tau” then the highest priority Mi that has a possible i/o operation that is matched in some distinct  Mj  gets to advance by that i/o operation concurrently with the matching   The output of M is constructed by taking the union of the outputs of each Mi, removing tau and every in[c,v] or out[c,v] where c is matched in M and then adding tau back if and only if there is at least one pair {in(c,v), out(c,v)} in the output sets of some pair Mi, Mj.  The other machines get no input at all.  This could all be done more simply using the methods decscribed [RECURSION]

We got rid of all of the “non-determinism” by underspecifying N which decides on which process to run next based on some algorithm we don’t say anything about.  Process machines might “accept” identical languages, yet differ in what inputs they enable in which states. Perhaps there is something one can express in process algebra which cannot be accommodated by this model and maybe such things would be interesting in the context of  program verification. It would be interesting to find out. But what has been shown is that the basic idea of the process algebra is relatively easy to represent with automata products – using methods known for a half century or more. And without requiring anywhere near the volume of notation and axioms that seem to be essential parts of process algebras or losing our bearings as to what is state.

## References

[MILNER] A calculus of communicating systems, Robin Milner. Springer (LNCS 92), 1980. ISBN 3-540-10235-3

[MILNER2] The Polyadic pi-Calculus: a Tutorial (1991)

[RABINSCOTT] M. O. Rabin and D. Scott, “Finite Automata and their Decision Problems”, IBM Journal of Research and Development, 3:2 (1959) pp.115-125.

[HARTMANIS] Hartmanis, J.; and Stearns, R. E. (1966), Algebraic Structure Theory of Sequential Machines (Prentice Hall)

[LOOP] Juris Hartmanis: Loop-Free Structure of Sequential Machines Information and Control 5(1): 25-43 (1962)

[HALGEBRA] Hoare, C. A. 1993. Algebra and models. In Proceedings of the 1st ACM SIGSOFT Symposium on Foundations of Software Engineering (Los Angeles, California, United States, December 08 – 10, 1993). D. Notkin, Ed. SIGSOFT ’93. ACM, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/256428.167053

[HENNESSY] Hennessy, M. and Milner, R. 1985. Algebraic laws for nondeterminism and concurrency. J. ACM 32, 1 (Jan. 1985), 137-161. DOI= http://doi.acm.org/10.1145/2455.2460

[BAETEN] Baeten, J. C. 2005. A brief history of process algebra. Theor. Comput. Sci. 335, 2-3 (May. 2005), 131-146. DOI= http://dx.doi.org/10.1016/j.tcs.2004.07.036

[RECURSION] Yodaiken, 2009, Primitive Recursive Transducers. http://yodaiken.com/papers/recursivetransducer.pdf

[VENK] Dr. Peter Venkmann, Columbia University.  Personal communication.

Edited and updated 1/19/2014

A calculus of communicating systems, Robin Milner. Springer (LNCS 92), 1980. ISBN 3-540-10235-3