# Recursion and state

## Rescuing Automata Theory from the Attic [this is an active draft and under revision]

- Presentations of automata paper
- Circuits paper
- Process Algebra paper

My interest is in engineering programs and computational devices – where the enormous state sets and complex state transitions, not to mention complex interactions between components, are the source of much error and conceptual confusion. The kinds of problems I want to address are what properties a scheduler needs to be efficient and meet strict timing requirements in an OS, given a certain paging regime, or whether a bus signaling protocol works, or whether a fault recovery algorithm for a distributed database is reliable. I’m going to show in this note how to use plain old state machines on these types of questions. Despite some deep results, automata theory has fallen out of favor in computer science as a technique for such work [3]. The crippling drawback of state machines is that traditional presentations in terms of state sets and transition maps are awkward, don’t scale, and carry too much irrelevant information. But there is a beautiful and concise approach to state machines in terms of ordinary algebra, recursion, and strings of inputs. This approach also gives us a powerful technique for describing how components can be connected and what parallel state change does. And unlike “formal methods” and the various extensions of state machines, “classical automata” are mathematically coherent objects, strongly connected to semigroup and then group theory in mathematics [2].

The plan: start with basics, continue with composition/parallelism and some algebra, and then work through some larger examples.

Details are in this paper, with circuit examples here.

## Basics

The basic idea is to think of state machines as definitions of functions on strings, where the string describes the path from the initial state. So “f(w)=x” means that the output in the state reached by following string “w” from the initial state is the value “x”. This is a really simple and obvious idea, but it takes a little work to see why it’s useful. Strangely, the practice of using these functions to describe state machines is almost unknown. In fact, I don’t see any reference to it at all in the computer science literature – although I’d be happy to learn otherwise. The reason this is surprising is that it’s a kind of obvious method with a big practical payoff, which I’ll get to below [1]. One trick that makes the method more useful, and that nobody seems to have stumbled on, is a recursive composition that can be used to glue state machines together – more on that below too.

Let’s get some string notation out of the way – sorry, but just a paragraph, and no strange notation is needed. Given an alphabet of events A, the set A* contains all finite strings over A including the empty string which, for convenience, we can call NULL. As usual, if “w” and “u” are strings, “wu” is the string obtained by concatenating “u” onto the right end of “w”, so that if “w=<a1, …, an>” and “u=<b1, …, bm>” then “wu=<a1, …, an, b1, …, bm>”. Because I’m a lazy typist, I’ll generally write single-element sequences without the angle brackets, so “a” can either be a single element of A or an abbreviation for “<a>”, and then “wa” abbreviates “w<a>”. If there is a chance of confusion, I’ll state whether “a” the string or “a” the element is being referenced. Now we are ready to define functions on strings using primitive recursion.

You are probably more familiar with primitive recursion on numbers: the primitive recursive definition of factorial is a cliche of early programming courses – “f(0)=1” and “f(n+1)=f(n)×(n+1)” completely define “f”. With strings, appending an element of the alphabet to a string is similar to “+1”, and “NULL” is the zero. Consider the following definition of a function “f” that has a single string as a parameter:

f(NULL) = c

f(wa)= h(a,f(w))

Notice that “f” is now defined on any string. For example:

f(<a1,a2,a3>)=h(a3,f(<a1,a2>))= h(a3,h(a2,f(a1))) = h(a3,h(a2,h(a1,c)))

As a trivial example, “Length(NULL)=0” and “Length(wa)=1+Length(w)” define a useful function.
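For concreteness, the recursion scheme above can be sketched in Python. The helper name `prim_rec` and the encoding of strings as tuples are my own choices, not anything from the text:

```python
# A minimal sketch of primitive recursion on strings: f(NULL) = c,
# f(wa) = h(a, f(w)).  Strings are Python tuples; NULL is the empty tuple.
def prim_rec(c, h):
    """Build f from a base value c and a step function h."""
    def f(w):
        if not w:                  # f(NULL) = c
            return c
        *head, a = w               # split "wa" into w and a
        return h(a, f(tuple(head)))
    return f

# Length(NULL) = 0 and Length(wa) = 1 + Length(w)
length = prim_rec(0, lambda a, x: 1 + x)
```

Unfolding `prim_rec` on `<a1, a2, a3>` reproduces the expansion h(a3, h(a2, h(a1, c))) shown above.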

A state machine can be described as a tuple (A, S, start, D, G) where A is the set of inputs or events, sometimes called the “alphabet”; S is the set of states; “start” is an element of S that serves as the initial state; “D” is a function from states and events to states, so that “D(s,a)=s′” defines the transition rules; and “G(s)” is the output in state “s”. This is a Moore machine or transducer. I’m not necessarily requiring that A or S be finite, although, obviously, actual computing systems and programs are finite state, so that’s a very important subclass. Note that “Length” does not describe a finite state machine since it needs one state for every non-negative integer.

Just for clarity: some computer science researchers think of all state machines as language “acceptors”, but I’m not particularly interested in language recognition here. I’m using state machines as electrical engineers and programmers do, to model the operation of discrete state devices (software and hardware).

The relationship between state machines and primitive recursive string functions is embarrassingly obvious. If M=(A,S,start,D,G) then define  D*  (pronounced “D star”) like this:

D*(s,NULL)=s

D*(s,wa)=D(D*(s,w),a)
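As a sketch, D* and the representing function can be transcribed directly. The names `d_star` and `represent`, and the mod-3 example machine, are my own illustration:

```python
# D*(s, NULL) = s and D*(s, wa) = D(D*(s, w), a): the recursion simply
# applies D letter by letter from the left.
def d_star(D, s, w):
    for a in w:
        s = D(s, a)
    return s

def represent(start, D, G):
    """Return the f with f(w) = G(D*(start, w)) that represents M."""
    return lambda w: G(d_star(D, start, w))

# Example machine: a mod-3 counter whose inputs are numbers to add
f = represent(0, lambda s, a: (s + a) % 3, lambda s: s)
```

Here `f` never mentions state names at all – exactly the point made in the next paragraph.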

[Figure: a PDP-8 from 1965]

Say “f represents M” if and only if “f(w) = G(D*(start,w))” for every “w” in A*. So if we have M, we can construct a primitive recursive “f” that represents “M”. And it’s not difficult to show that every “f: A* -> X” represents some “M”, but I’ll postpone that sort of obvious construction to the algebra section. Even with this first step, we get rid of a lot of annoying limitations of the set-tuple description of state machines. First, we can ignore differences between “state machines” that consist only of names of states, unreachable states, or states that can be minimized away. We may have M and M′ that have totally different state sets, but the same representing function. Names of states are essentially artifacts of the way we are representing the state machine as a tuple – they are properties of the implementation of the state machine, which we can ignore if we are only interested in behavior. Early research into algebraic automata started before state set minimization became a reasonably solved problem, back in the days of discrete logic when saving a couple of extra states was a big deal. State machine minimization is an important problem – but it’s not the problem we care about when we want to know if a bus architecture works or if an operating system schedules correctly or if a distributed database query terminates in a timely manner. And the representing function doesn’t care about the names of states, about possible unreachable states, or about duplicative states that could be removed by minimization. Even better, the representing function can be parameterized:

f(NULL,k) = 0

f(wa,k) = (f(w,k)+1) mod k

Our function “f” represents a family of state machines indexed by “k”. Let’s now suppose the alphabet is {1, -1, reset, idle} and define “f2”:

f2(NULL,k) = 0

f2(wa,k) =
0 IF a=reset
ELSE f2(w,k) IF a=idle
ELSE (f2(w,k)+1) mod k IF a=1
ELSE f2(w,k)-1 IF a=-1 AND f2(w,k)>0
ELSE f2(w,k) OTHERWISE

Note that “f2” is starting to represent some state machines that would be individually painful to define in terms of state sets and transition functions. We can think of “f2” as a specification of a representing function; making “k” a constant reduces “f2” to a representing function.
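Here is a direct transcription of “f2” as a recursive Python function – a sketch that assumes the counter starts at 0 and encodes the reset and idle events as strings:

```python
# The resettable mod-k up/down counter "f2" over the alphabet
# {1, -1, "reset", "idle"}, written exactly as the case definition above.
def f2(w, k):
    if not w:                       # f2(NULL, k) = 0
        return 0
    *head, a = w
    prev = f2(tuple(head), k)
    if a == "reset":
        return 0
    if a == "idle":
        return prev
    if a == 1:
        return (prev + 1) % k
    if a == -1 and prev > 0:
        return prev - 1
    return prev                     # OTHERWISE
```

Every choice of the constant k picks one machine out of the family.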

To raise the bar, let’s start with one way of adding timing to the system. When we are working with circuits or close to the hardware, I have found it useful to use inputs that correspond to discrete samples of the signals being applied to the input pins (see the circuits paper). But sometimes it’s useful to introduce a “tick” input and treat the other inputs as instantaneous, which is what I’ll do here. Let’s define an “f3” with an alphabet that includes the alphabet of “f2” and an additional “tick” input. Now let’s say: “f3” is a “k counter” with timing delay “t” only if, for all “w”, “f3(w)” has a value in the set {0, …, k-1, BUSY}; when “f3(w)=BUSY” it means the counter is still working, and otherwise it outputs the correct number for counting mod k; and, finally, “f3” doesn’t take more than “t” ticks to finish a computation. To make that exact, we need some “helper” state machines. First, we want to be sure that “f3” is not asked to work too fast: that when it is busy, it only gets “tick” inputs. Define “InOk” by

InOk(NULL)=1

InOk(wa)=0  IF “a” is not equal to “tick” AND  “f3(w)=BUSY”

InOk(wa) = InOK(w) OTHERWISE.

Using “!=” to mean “NOT EQUAL”, our requirement is that “IF InOk(w)>0 AND f3(w) != BUSY THEN f3(w)=f2(w,k)”. Note that “f3” is being specified in terms of “f2” and “InOk”, which are themselves state machines.

But what keeps “f3” from never completing a computation? Aha!

Busytime(NULL) = 0

Busytime(wa) = Busytime(w) +1 IF f3(wa) = BUSY

Busytime(wa) = 0 OTHERWISE

Now we can say that “f3” is “t fast” if and only if “InOk(w)=1” implies that “Busytime(w) < t”.

So “f3” is a solution to a specification of a class – an infinite class – of state machines. If “f3” is a mod k counter with delay t, then it represents some machine in that class, but we almost certainly don’t care which one.
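To make the helper machines concrete, here is a toy sketch in Python: a hypothetical “f3” that goes BUSY for one tick after each non-tick input, together with iterative versions of “InOk” and “Busytime”. The particular delay behavior is my own invention for illustration; any machine meeting the spec would do:

```python
BUSY = "BUSY"

def f3(w, k, delay=1):
    """A toy k counter with timing: BUSY for `delay` ticks after each
    non-tick input (my illustration, not the author's definition)."""
    value, busy = 0, 0
    for a in w:
        if a == "tick":
            busy = max(0, busy - 1)
        else:
            if a == "reset":
                value = 0
            elif a == 1:
                value = (value + 1) % k
            elif a == -1 and value > 0:
                value -= 1
            busy = delay            # any non-tick input starts a computation
    return BUSY if busy else value

def in_ok(w, k):
    # InOk(wa) = 0 if a != tick while f3(w) = BUSY, else inherited
    ok = 1
    for i, a in enumerate(w):
        if a != "tick" and f3(w[:i], k) == BUSY:
            ok = 0
    return ok

def busytime(w, k):
    # Busytime counts consecutive trailing BUSY outputs
    t = 0
    for i, a in enumerate(w):
        t = t + 1 if f3(w[:i + 1], k) == BUSY else 0
    return t
```

With this toy machine, whenever InOk holds, Busytime never exceeds 1, so it is “2 fast”; and whenever it is not BUSY it agrees with “f2” (ticks fall into f2’s OTHERWISE clause).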

By adopting the primitive recursive string function representation, we can focus on the behavior and not on the naming of state set elements or anything similarly uninteresting. In general, we do not want to specify the details of a state machine; we just want to identify some properties that it must have and see if they are sufficient to assure other properties we would like to follow. Here we are stealing an idea from the engineers of physical systems, who want solutions to differential equations but discard uniqueness, constants, factors, and anything else not relevant to the empirical characteristics of the phenomenon they are studying.

## Composition and Parallel Computation

This is an area that the process algebra people get dead wrong (see the process algebra section below).

Suppose we say V is a store over set X if and only if the alphabet of V is X, “V(NULL)=c” for some constant “c”, and “V(wx)=x” for any “x” in X. Now suppose V1 …, Vn are stores over X. Suppose we take the “n” stores and encapsulate them inside a new device so that when input “x” is applied to this new device, “x” is stored in the first encapsulated store and the output of each encapsulated store is used as the input to the next one. Define

shift(a,1,(x1 …,xn)) = a

shift(a,i+1,(x1 …,xn)) = xi

Now define the functions “B” and “shift*” as follows:

B(w) = (V1(w1) …, Vn(wn))

shift*(w,i)=wi

shift*(NULL,i)=NULL

shift*(wa,i) = ub where u=shift*(w,i) and b=shift(a,i,B(w))

What’s going on here is that we are adding another layer of primitive recursion: the input sequence “wi” that goes to encapsulated component “Vi” is a primitive recursive function of “B” and “shift”. Let’s work out “B(w)” for some simple cases. When the input sequence is NULL, we have “shift*(NULL,i)=NULL”, so “B(NULL) = (V1(NULL) …, Vn(NULL)) = (c, …, c)”. Now consider “<a>”. Note that “shift(a,1,(x1 …, xn))=a” – it does not depend on the argument (x1, …, xn) at all. So “shift*(<a>,1) = <a>”, and “shift*(<a,b>,1) = <a,b>” – in fact, “shift*(w,1)=w”. What is “B(<a>)”? It is, by definition, “(V1(shift*(<a>,1)) …, Vn(shift*(<a>,n)))”, which we can simplify to “(a, …, Vn(shift*(<a>,n)))”. But for “j>1”, “shift*(ua,j)” is “shift*(u,j)” concatenated with “shift(a,j,B(u))”, and to evaluate “Vj(zb)” we only need to know “b”. Since we know “B(NULL)”, we know “shift(a,j,B(NULL)) = c”. So “B(<a>) = (a, c, …, c)” – which allows us to compute “B(<a,b>)”, and so on.

Ok, that was pretty simple, although please note that an input causes all “n” components to change state in parallel. Now let’s add a “rotate” to the alphabet, making sure that “rotate” is not already an element of “X”. Modify “shift” by changing “shift(a,1,(x1 …, xn)) = xn” if “a=rotate”, and now we have made a nice device. Let’s also add an input “reset[x]” for each element “x” of “X” and, for all “i” including “i=1”, make “shift(reset[x],i,(x1 …, xn)) = x”.
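The whole device – n stores wired as a shift register, plus the “rotate” and “reset[x]” inputs – can be simulated in a few lines. This is an iterative sketch; the function name `run` and the encoding of reset[x] as a tuple are my own:

```python
# Simulate B(w) for n one-cell stores connected as a shift register,
# with the "rotate" and "reset[x]" extensions from the text.
def run(w, n, c):
    """Return B(w) = (V1(w1) …, Vn(wn)), stepping all stores in parallel."""
    cells = [c] * n                        # B(NULL) = (c, …, c)
    for a in w:
        prev = list(cells)                 # outputs before this step
        if isinstance(a, tuple) and a[0] == "reset":
            cells = [a[1]] * n             # shift(reset[x], i, _) = x, all i
        elif a == "rotate":
            # store 1 gets store n's output; the rest shift down as usual
            cells = [prev[-1]] + prev[:-1]
        else:
            cells = [a] + prev[:-1]        # ordinary input shifts in at store 1
    return tuple(cells)
```

Running it reproduces the hand calculation: B(NULL) = (c, …, c) and B(&lt;a&gt;) = (a, c, …, c), and a “rotate” cycles the contents.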

This type of definition also has a corresponding interpretation in terms of tuples-of-sets-and-functions state machines. The recursive composition (this is actually simultaneous recursion, which can be shown not to extend the class of definable functions) corresponds to a general automata product. Take the sets corresponding to V1 …, Vn and form the set of n-tuples of the states of those machines, (s1 …, sn), where each si is an element of the state set of machine i. Then, given s=(s1 …, sn), define “G(s) = (G1(s1) …, Gn(sn))” and “D(a,s) = (s1′ …, sn′)” where “si′ = Di(si, shift(a,i,G(s)))”.

More formally, F and phi* are defined from f1 …, fn and phi as follows:

Given f1 …, fn, each fi: Ai* -> Xi, a set A, and a function phi so that phi(a,i,(x1 …, xn)) is in Ai*, define

F(w) = (f1(w1) …,fn(wn))

wi = phi*(w,i) where

phi*(NULL,i)= NULL

phi*(wa,i) = phi*(w,i) concatenated with phi(a,i,F(w))

## Monoids and groups

Some of this can be found here - more details to come

There’s a correspondence between the notions of “combinational” and “sequential” in digital circuit engineering and some structure in state machines (and therefore monoids)  that seems interesting.

In digital logic, a “combinational” circuit like a logic gate can be associated with a time T so that the output depends only on input signals applied over the last T time units. If the propagation delay is T, then signals applied T+n time units ago are irrelevant to current state. For example, an AND gate with propagation delay T has output 1 if all inputs have been HIGH over the last T time units. For a sequential circuit, like a latch, there is no magic time T. Suppose we set the stored value in the latch at time T1 and then signal it to hold the value. No matter how long the latch is held, we have to go back to absolute time T1 to find out what is in the latch. It’s worth noting, however, that it’s often very useful to give a sequential circuit a reset mode in which it behaves like a combinational circuit. That is, we want to be able to say: no matter what previous signals have done to a memory device, some sequence of signals will clear it and, for example, put the number 3 in it. More on this below.

Take a Moore machine M and write State(M,w) for the state reached by following input string “w” from the initial state. I’m limiting us to complete Moore machines where State(M,w) must be defined for every w. Say M is combinational if there is some T so that if w=uz and length(z)>T then State(M,w)=State(M,z). So the idea is that there is a T so that the last T inputs determine the state of M. Call T the “depth” of M. As an example, suppose M represents a memory location and the inputs are just data values to store. Then we only need the last input: T=1.

If M is an AND gate with two pins and a delay of 2, then we can think of inputs as numbers 0,1,2,3 representing their 2-bit encodings: 0 for 00, 1 for 01, 2 for 10 and 3 for 11. States can be pairs [x1,x2] representing the last two inputs, so that if the state is [a,b] and input “c” is received, the new state is [c,a]. The state is actually a shift register. Suppose that the states are all pairs [x1,x2] where x1 and x2 are in the set {-1,0,1,2,3}, with “-1” representing an unknown signal. Then [3,3] has output 1; [x1,x2] where x1 and x2 are both greater than -1 and at least one is less than 3 has output 0; and the others have undefined output. The initial state is [-1,-1]. Obviously this machine is combinational with depth T=2. The intuition here is that combinational state machines will always “swim” to a fixed state given a fixed sequence of inputs of sufficient length. Obviously, this is a useful property of a device, and even for sequential devices we are going to want some single “reset” operation or sequence of operations that, no matter what the current state, will force the machine to a known state. That is, for our latch, we’d want to force it to an assigned latched value, no matter what, if anything, it had previously latched. So properly designed sequential circuits are going to look like some combination of combinational and sequential circuits.
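The AND-gate example is small enough to write out as a sketch (my own transcription; I return `None` for the undefined outputs):

```python
# The combinational AND gate with delay 2: the state is the last two
# inputs, inputs 0..3 encode the two pins, and -1 marks "unknown".
def and_gate(w):
    s = (-1, -1)                   # initial state: both samples unknown
    for a in w:
        s = (a, s[0])              # shift register holding the last two inputs
    x1, x2 = s
    if x1 == 3 and x2 == 3:
        return 1                   # both pins HIGH for the last 2 steps
    if x1 > -1 and x2 > -1:
        return 0
    return None                    # output undefined while still "unknown"
```

Depth 2 shows up directly: any two strings with a common suffix of length 2 or more give the same output.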

Now consider products of state machines. The method I’ve been using for representing state machines as maps f: InputSequences -> Outputs is convenient for products. If f represents a state machine M, then f(w) = Output(State(M,w)). If f represents a combinational state machine of depth T, then f(wuz)=f(uz) whenever length(u)>T, because State(M,wu)=State(M,u), so State(M,wuz)=State(M,uz), so the outputs must be the same. Say F(w) = (g1(w1) …, gn(wn)) is a product under connection function Phi and map w |-> (w1 …, wn) if, when w=empty, wi=empty for all i, and if w maps to wi, then wa maps to wi.Phi(a,i,F(w)), with “.” representing string concatenation. Say that F is a cascade product if and only if Phi(a,i,(x1 …, xn)) depends only on the xj for j < i. That is, in a cascade product, information only flows from lower numbered factors to higher numbered ones: there is no feedback. If g1 …, gn are combinational AND Phi(a,i,(x1 …, xn)) != empty for any combination of parameters AND F is a cascade product, then F must be combinational. For a proof sketch, suppose Ti is the depth of each component gi and suppose w=uv where length(v)>T1. Then w1=u1.v1 where length(v1)>T1, so g1(v1)=g1(w1), and more than that, g1(w1.z)=g1(v1.z) for all z. That means Phi(a,i,F(wa))=Phi(a,i,F(va)) by the cascade property. And so on – eventually we see that the depth of F is at most the sum of the depths of the factors! For intuition: first we need enough inputs to steer factor 1 to a known state, then factor 2, and so on.

Given a state machine M with states S, we can associate a map from S to S with every string of inputs w, so that F(M,w)(s)=s′ if and only if w leads from s to s′. This set of maps forms a monoid under function composition: since F(M,z)(F(M,w)(s)) = F(M,wz)(s), we have F(M,w)+F(M,z) = F(M,wz), where “+” is the monoid operation (composition in the order the strings are applied) and not anything necessarily related to arithmetic. The empty string produces the monoid identity: F(M,empty)(s)=s. Look at the monoid structure of combinational state machines and note that unless the monoid is trivial, F(M,w)=F(M,empty) only if w=empty – no other string maps to the identity.

The proof is easy. Suppose M is combinational with depth T, the monoid is not trivial (so for some v we have F(M,v) != F(M,empty)), and suppose F(M,z)=F(M,empty) for some non-empty z. Let Z consist of enough concatenated copies of z to be longer than T: Z=zzz… stretched out to a length greater than T. Since F(M,Z) is a composition of identities, F(M,uZ)(s)=F(M,u)(s) for any u; but by the combinational property F(M,uZ)(s)=F(M,Z)(s)=s. So F(M,u)(s)=s for all u and s, which implies the monoid is trivial after all. It’s also worth pointing out that the combinational monoids seem to be a proper subset of the aperiodic monoids ( http://www.liafa.jussieu.fr/~jep/PDF/HandBook.pdf ). Consider the state machine with states {0,1,2} and inputs {a,b} so that, treating the inputs as maps from states to states, a(0)=1 and b(0)=2, and x(s)=s for any non-zero state. The monoid is clearly aperiodic since a^2=a^3: a^2(0)=1 and a^2(s)=s for s != 0. But the state machine is not combinational: an arbitrarily long string of a’s leads to state 1 from state 0 but to state 2 if a “b” came first, so no bound on suffix length determines the state.
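The closing example can be checked by brute force. In this sketch (my own), F(w) is tabulated as the image of each of the three states:

```python
# The example machine: states {0,1,2}, inputs "a" and "b" acting as maps
# with a(0)=1, b(0)=2, and x(s)=s for s != 0.
STATES = (0, 1, 2)

def act(x, s):
    return {"a": 1, "b": 2}[x] if s == 0 else s

def F(w):
    """The map F(M, w), tabulated as (image of 0, image of 1, image of 2)."""
    def run(s):
        for x in w:
            s = act(x, s)
        return s
    return tuple(run(s) for s in STATES)
```

The checks below confirm both halves of the claim: a^2 = a^3 (aperiodicity), yet two strings sharing an arbitrarily long suffix of a’s still reach different states (not combinational).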

On the way.

## Failure to communicate and Process Algebras

All is made clear in the brand new (Nov 1 2009) Reducing Process Algebra paper.

From the point of view of someone who wants to understand programs and digital circuits at the RTL level, the “process algebra” line of research typified by Milner’s CCS [MILNER] begins with three rather peculiar assumptions.

(1) Programs are non-deterministic, or at least should be thought of that way,

(2) Processes communicating by synchronous message exchange are good “primitive” building blocks.

(3) Classical automata theory is too limited to express the semantics of programs and can be usefully replaced with a string algebra.

Some process algebra research appears to be concerned with generic “systems”, not mere programs and digital devices, but I’m going to restrict myself to the smaller domain. These assumptions may be reasonable for business processes or astronomy, but for computer engineering they appear to me to be unnecessary complications that don’t have any countervailing gains. Let’s begin with non-determinism.

### Nondeterminism

Here we are not talking about “non-deterministic algorithms”, which are mathematical functions that “guess” at a solution. A program implementing such an algorithm executes deterministically. For example, the program “x = Get_Random(); SearchFrom(x);” may get “x” as an input or may compute it as a pseudo-random value, but starting from the same state and receiving the same input, it will do exactly the same computation. What the “process algebra” researchers mean by “non-deterministic” is that the next state of a system is not a function (or even a probabilistic function) of the current state and the current input, but may depend on something else.

Milner asks us to suppose we have two devices D1 and D2, so that when powered up D1 has a possible “a” action followed by a choice of a “b” or “c” action. The D2 device starts the same, but after the “a” action the state with “b” and “c” actions enabled can also non-deterministically jump to a state where only “c” is possible.

“We do not know what ‘determines’ the outcome for [D2], perhaps it is the weather”[MILNER]

One could, however, leave the initial conditions of the device unspecified to get the same result without resorting to non-determinism. Suppose we have, in addition to the inputs {a,b,c}, a set U of unknown values that encode environmental information. Define a class of state machines that in the initial state input something from U, then an “a”, and then either “b OR c” or just “c”. Machines in class D1 cannot reach the “just c” state; machines in class D2 can. Elements of U may be random numbers, or may encode the date and a weather prediction algorithm, or whatever. This relocates the indeterminism to the environment and, to me, seems to make the problem statement more faithful to how digital devices work. The same procedure can be used for the “delays” that Hoare argues indicate non-determinism.

The only solution is to allow the existence and duration of the delay to influence the course of the computation. For example the programmer can use the [ combinator of CSP to allow the earliest possible selection of the first possible event that can occur. Since delays are essentially unpredictable, we are almost forced to accept non-determinism as an inherent property of programs and algorithms, and even of computer hardware.[HOARE]

Those delays may be unpredictable, but how does that make the programs “non-deterministic”? A program does a “load x” operation and, depending on caches, memory management, disk operation, bus width, contention, error correcting codes and so on, many things can happen and they may take some time. But given the same response by the OS and hardware, the program will always do the same thing. Obviously, we have a choice between saying P is deterministic with input that can vary, e.g. by system timing, or saying P contains fundamental non-determinism. And this is true as well for multi-threaded programs. In one sense, there’s no argument: if we run a multithreaded program twice we may get different results. That non-determinism results from the OS selection of which process to run, the contents of cache, the level of network congestion, and so on. But is the program non-deterministic, or does its environment provide it with inputs that can vary in time and order? If we have a program P made up of multiple threads, and the contents of all registers and memory are given by S, then the event of the operating system choosing to run the next instruction in some thread T causes a completely determined transition to the next state that only depends on S. In fact, the kinds of things we want to prove about multi-threaded programs are often assertions about what must or must not happen, given various possibilities about which externally generated events will happen next.

Coming back to Milner’s D1 and D2: a program is not going to make a non-deterministic choice about whether to enable the “c” action; the program will read a sensor or examine a memory location or in some other way get an input that will then lead to a completely deterministic choice. When we use Kirchhoff’s current laws to analyze a circuit containing switches, we do not need to incorporate the “non-deterministic” operation of the switches in our circuit model. Similarly, when we evaluate the behavior of a multi-threaded web server, we do not need to incorporate the “non-deterministic” operation of networks in our web server model.

The remaining argument for including non-determinism in the mathematics we use to model programs is purely pragmatic and is based on supposed mathematical advantages and increased abstraction. And this brings us to the question of automata.

### Automata

I’m not going to argue that just because something can be expressed in terms of classical automata, it should be expressed that way, but the process algebra literature is replete with weak arguments about what cannot be expressed in terms of classical automata. For the D1/D2 example, Milner writes that while both machines accept the same regular language “a(b+c)”, they act differently in a way that can be observed by an experimenter. Milner writes: “in standard automata theory, an automaton is interpreted as a language”, and so he argues that the distinction he wants to draw between D1 and D2 devices cannot be expressed. But “standard automata” theory in no way requires that machines accepting the same language be treated as identical – an automaton is not “interpreted as a language” even though some automata may be used to recognize languages. Similarly, two resistors in series can be understood as a single resistance on the line, but they are not “interpreted” as a single resistor in a way that makes it impossible to distinguish two resistors from one. The original use of state machines was to model the behavior of switches in the telephone network and non-combinational logic elements in circuits. There is an entire field devoted to minimizing state machines, where it is important to distinguish between the original and the reduced machines even though they are intended to have identical behavior. So, if we want to, we are certainly entitled to distinguish between the two state machines. It is true that if we convert a non-deterministic D2 machine to a deterministic one, we produce a machine that recognizes the same language but behaves differently: the D2 machines can enable two events and then change state to disable one of them. The theorem [RABINSCOTT] just says the converted deterministic machine and the original non-deterministic machine recognize the same language. That is, L(M1) = L(M2) does not mean it is impossible to distinguish M1 and M2 some other way. For example, it’s easy enough to define Reject(M) for a state machine as the language of strings that can be rejected by M. If M is deterministic then L(M) = A* – Reject(M). But if M is non-deterministic, there may be strings in both sets. Automata theory allows us to preserve this difference should we find it significant.

Another common argument is that automata cannot express interaction. Here’s a relatively complete argument.

The simplest model of behaviour is to see behaviour as an input/output function. A value or input is given at the beginning of the process, and at some moment there is a(nother) value as outcome or output. This model was used to advantage as the simplest model of the behaviour of a computer program in computer science, from the start of the subject in the middle of the twentieth century. It was instrumental in the development of (finite state) automata theory. In automata theory, a process is modeled as an automaton. An automaton has a number of states and a number of transitions, going from one state to a(nother) state. A transition denotes the execution of an (elementary) action, the basic unit of behaviour. Besides, there is an initial state (sometimes, more than one) and a number of final states. A behaviour is a run, i.e. a path from initial state to final state. Important from the beginning is when to consider two automata equal, expressed by a notion of equivalence.

On automata, the basic notion of equivalence is language equivalence: a behaviour is characterized by the set of executions from the initial state to a final state. An algebra that allows equational reasoning about automata is the algebra of regular expressions (see e.g. [58]). Later on, this model was found to be lacking in several situations. Basically, what is missing is the notion of interaction: during the execution from initial state to final state, a system may interact with another system. This is needed in order to describe parallel or distributed systems, or so-called reactive systems. When dealing with interacting systems, we say we are doing concurrency theory, so concurrency theory is the theory of interacting, parallel and/or distributed systems.[BAETTEN]

We see the same argument about language equivalence discussed above, and a second argument about interaction that is actually more difficult to understand. Here is a product of automata, in which each automaton can communicate with (interact with) each other automaton, from a paper published in 1962.

Definition 1. Let {M1, M2, …, Mn} be a set of Moore type machines in which the outputs of any machine Mi, i = 1, 2, …, n, may be used as inputs to other machines. We shall say that this set of interconnected machines is concurrently operating if the next state (state at time t+1) of each machine Mi depends on the present state of Mi, the present outputs (which depend only on the present states) of the machines to which it is connected, and the present external input. The ordered n-tuple (or “configuration”) of the present states of the machines M1, M2, …, Mn will be referred to as the state of the interconnected machine. [LOOP]

This product is certainly flexible enough to describe any type of communication. In fact, it can easily accommodate the type of synchronous message passing considered fundamental in the process algebras. Suppose we have Moore type state machines that output requests to send and receive messages on channels. Put n of them together in a product of the type Hartmanis defines and add another state machine to act as a scheduler. The input to the scheduler is the set of requests of the other machines; the output of the scheduler is a prioritization of the other factor machines. The highest priority machine can then act by either using the external input or the output of a communicating factor machine – which will also make a transition. I’ll return to this below, but first let’s look at the communicating process idea a little more.
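A minimal sketch of the [LOOP]-style “concurrently operating” product, without the scheduler machinery: each factor’s next state depends on its own state, the present outputs of the other factors, and the external input. The two toy machines and their wiring below are my own illustration:

```python
# One synchronous step of a concurrently operating set of Moore machines:
# every factor moves at once, seeing the others' *present* outputs.
def step(states, a, deltas, outputs):
    outs = tuple(G(s) for G, s in zip(outputs, states))   # present outputs
    return tuple(D(s, a, outs) for D, s in zip(deltas, states))

# Toy wiring: machine 0 counts the external input mod 4; machine 1 copies
# machine 0's output, so it lags one step behind -- communication happens
# purely through outputs, as in the definition.
deltas = (lambda s, a, outs: (s + a) % 4,
          lambda s, a, outs: outs[0])
outputs = (lambda s: s, lambda s: s)

states = (0, 0)
for a in (1, 1, 1):
    states = step(states, a, deltas, outputs)
```

A scheduler would just be one more factor whose output prioritizes the others, as described above.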

### Message passing processes

The assumption that “processes communicating via synchronous message transfer” are good building blocks for programs seems false. Generations of programmers have repeatedly rediscovered that threads using only synchronous message passing are excellent building blocks for creating deadlock. Suppose process P does

`loop: Get x from channel 1; Send y to channel 2; send z to channel 3`

and process Q does

`loop: Get x from channel 2; Send y to channel 1; Send z to channel 4`

In isolation the two processes are deadlock free. But when you connect them they lock up immediately: P blocks waiting to get from channel 1, Q blocks waiting to get from channel 2, and each can send what the other is waiting for only after its own get succeeds. When processes are non-trivial, the failures are just more elusive. In order to preserve deadlock freedom under composition for such components you need a rule that components can be ordered in such a way that each component only “sends” to components of higher order or to the external network; otherwise cycles of waiting processes can appear. Beginning an investigation of composite communicating systems using components where this is not a rule, or even a guideline, leads to serious problems [VENK]. In fact, this model of programming is terrible – there is a reason why Occam never caught on, why real “thread” implementations provide timeouts for practical message passing, why message brokers are popular, and why “select” is a standard part of most network control loops. QNX, an interesting operating system designed around synchronous message passing, implements asynchronous signaling so that message-passing systems can be designed without cycles: threads should send synchronous messages only “up” and use asynchronous signals to communicate “down”.
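The immediate lock-up of P and Q can be checked mechanically. Here is a minimal sketch (the tuple encoding of operations is invented for this example; channels 3 and 4 go to the external network, so only channels 1 and 2 can pair up internally):

```python
# The processes P and Q as cyclic sequences of blocking channel operations.
P = [("get", 1), ("send", 2), ("send", 3)]
Q = [("get", 2), ("send", 1), ("send", 4)]

def enabled(op_a, op_b):
    # A synchronous rendezvous needs one send and one get on the same channel.
    return {op_a[0], op_b[0]} == {"get", "send"} and op_a[1] == op_b[1]

# At the very first step each process blocks on a "get" that the other
# could satisfy only after its own "get" succeeds: no rendezvous is
# enabled, so the pair is deadlocked from the start.
print(enabled(P[0], Q[0]))   # False
```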

But if synchronous message passing is primitive, timeouts and asynchronous signals are hard to model. The structure of synchronization and information flow in complex computer programs is a tough issue and one on which there has been a good bit of interesting research recently. From lock-free data structures to data-flow processing to NUMA memory, there are all sorts of interesting ways of sharing data that are, at best, awkward to discuss if synchronous message exchange is the only means of communication.

### Semantics

“What exactly is meant by the behavior of nondeterministic or concurrent programs is far from clear”[HENNESSY]

The discussion in [HENNESSY] is fascinating in that it reveals, among other things, that the topic of whether observers can see “intermediate” states is one that needs investigation 20 years into the research program. State machines, on the other hand, have very simple ideas of state that do not depend on the mentality of the observer or the weather. In fact, the question [HENNESSY] addresses was solved by Moore, who provided an output map from states to outputs to distinguish clearly between the visible state (output) and the internal state. Myhill and Nerode then showed that these distinctions are robust in the sense that if you treat a Moore machine as a map from input sequences to output values, a left congruence recovers the state set up to homomorphism and a full congruence similarly recovers the underlying monoid. So it’s worth looking at how “communicating processes” could be modelled with this more restrictive and more sharply defined structure.
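The Myhill–Nerode construction can be made concrete: treating a machine as a map f from input strings to outputs, strings w and v are equivalent when f(wu) = f(vu) for every continuation u, and the equivalence classes recover the state set. In this sketch the function f (parity of 1s in a binary string) and the bound on continuation length are chosen purely for illustration; for a finite-state f, continuations up to a bounded length suffice in practice.

```python
from itertools import product

def f(w):
    # A toy map from input strings to outputs: parity of the 1s in w.
    return sum(1 for a in w if a == "1") % 2

def strings(alphabet, max_len):
    # All strings over the alphabet up to the given length.
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def signature(w, alphabet=("0", "1"), max_len=3):
    # The behaviour of w on all short continuations u: f(wu) for each u.
    return tuple(f(w + u) for u in strings(alphabet, max_len))

# Group strings by signature: each group approximates one state.
classes = {}
for w in strings(("0", "1"), 4):
    classes.setdefault(signature(w), []).append(w)

print(len(classes))   # 2 -- parity is realized by a two-state machine
```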

A deterministic Moore machine can be given as

D = (A, S, st, δ: S × A → S, γ: S → X)

A is the input alphabet, S is a state set, st in S is the start state, δ is the transition function, and γ is the output function.
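As a concrete illustration, here is a Moore machine in exactly this form, run as a map from input strings to outputs in the spirit of “f(w) = x” from the introduction. The particular machine (parity of 1s in a binary string) is invented for the example:

```python
# A deterministic Moore machine D = (A, S, st, delta, gamma).
A = {"0", "1"}            # input alphabet
S = {"even", "odd"}       # state set
st = "even"               # start state

def delta(s, a):          # transition function S x A -> S
    if a == "1":
        return "odd" if s == "even" else "even"
    return s

def gamma(s):             # output function S -> X
    return 1 if s == "odd" else 0

def f(w):
    """The machine as a map from input strings to outputs:
    f(w) is the output in the state reached by following w from st."""
    s = st
    for a in w:
        s = delta(s, a)
    return gamma(s)

print(f(""), f("1"), f("101"))   # 0 1 0
```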

Let’s define a class of “process” state machines that have A = { τ, in(c), out(c) : c in C } for some set C of “channels”. The idea here is that τ represents an internal state change with no i/o. The outputs are subsets of A, where each output x ⊆ A defines the set of possible events for the state machine in the next step. Then let’s define a slightly more sophisticated version of Hartmanis’s product like this

M = PROCESS[ M1 ... Mn, N ]

where each Mi is a “process” state machine and N is an “operating system” machine to be defined. The states of M are the tuples (s1 … sn, q) where each si is an element of the state set of Mi and q is an element of the state set of N. The input alphabet of N is the set of (n+1)-tuples (x1, … xn, a) where each xi is in X (the output set of the process machines) and “a” is in A. The outputs of N are maps Φ1 … Φk where each Φj: X^n × A × {1 … n} → A ∪ {null}, and the idea is that Φj(x1, … xn, a, i) is the input to be generated for machine Mi when the outputs of the factor machines are as given and the input to the composite system is “a”. If Φj(x1, … xn, a, i) = null then Mi should not move. More formally, if the state is s = (s1 … sn, q) then δ(s, a) = (s1’ … sn’, q’) where q’ is the next state of N computed from “q” and from (x1 … xn, a) by the transition map of N, and each si’ is the next state of Mi: si’ = si when Φj(x1, … xn, a, i) = null, and otherwise si’ is computed from si with input Φj(x1, … xn, a, i). Our rule is that N must compute only inputs that the component machines will accept (as determined by xi), and N must produce input in(c) for component k only if either “a” is “in(c)” and no other “process” machine advances, or there is some i so that “out(c)” can be generated by “i”. This could all be done more simply using the methods described in [RECURSION].
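The shape of this product can be sketched in a few lines of code. Everything concrete below – the class name, the two toy machines, and the trivial scheduler phi – is invented for the illustration; only the structure of the transition (N reads the present outputs and the external input, then hands each Mi either an input it has enabled or null) follows the definition above.

```python
class Moore:
    """A Moore machine: a state, a transition map, and an output map."""
    def __init__(self, start, delta, gamma):
        self.state, self.delta, self.gamma = start, delta, gamma
    def out(self):
        return self.gamma(self.state)
    def move(self, a):
        self.state = self.delta(self.state, a)

def composite_step(machines, phi, a):
    """One transition of the composite machine."""
    xs = [m.out() for m in machines]          # present outputs of all factors
    for i, m in enumerate(machines):
        u = phi(xs, a, i)                     # the input N generates for Mi
        if u is not None:                     # None plays the role of "null"
            assert u in xs[i], "N may only issue inputs Mi has enabled"
            m.move(u)

# M1 offers to send on channel c; M2 offers to receive on c; after one
# synchronized step each falls back to an internal tau move.
m1 = Moore(0, lambda s, a: 1 - s, lambda s: {("out", "c")} if s == 0 else {"tau"})
m2 = Moore(0, lambda s, a: 1 - s, lambda s: {("in", "c")} if s == 0 else {"tau"})

def phi(xs, a, i):
    # A trivial scheduler: when a matching out(c)/in(c) pair is enabled,
    # drive both machines in the same step; otherwise allow tau moves.
    if ("out", "c") in xs[0] and ("in", "c") in xs[1]:
        return (("out", "c"), ("in", "c"))[i]
    return "tau" if "tau" in xs[i] else None

composite_step([m1, m2], phi, a=None)
print(m1.state, m2.state)   # 1 1 -- the send and the receive moved together
```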

We got rid of all of the “non-determinism” by underspecifying N, which decides which process to run next based on some algorithm we don’t say anything about. Process machines might “accept” identical languages, yet differ in what inputs they enable in which states. Perhaps there is something one can express in process algebra which cannot be accommodated by this model, and maybe such things would be interesting in the context of program verification. It would be interesting to find out. But what has been shown is that the basic idea of the process algebra is relatively easy to represent with automata products – using methods known for a half century or more. And without requiring anywhere near the volume of notation and axioms that seem to be essential parts of process algebras, or losing our bearings as to what state is.

# Footnotes

1. In some of the early literature functions from input strings to output strings are used. In fact, Arbib defines a state machine as a map from strings to strings “with finite support” – e.g. as described by a tuple-of-sets-and-functions automaton. There is not a big mathematical difference between maps “A* → X*” and maps “A* → X”, but it makes a big difference in usability.
2. I don’t mean to say that “process algebra” and the other exotic objects of “formal methods” are not mathematically defined, but formal definition of something is not the same as mathematical coherence. We can create any kind of complex scaffolding via axioms, but there’s no assurance that what we defined has an underlying mathematical semantics beyond the purely formal.
3. Not that anything else works well, hence the utility of this work.

## References

[MILNER] A calculus of communicating systems, Robin Milner. Springer (LNCS 92), 1980. ISBN 3-540-10235-3

[MILNER2] The Polyadic pi-Calculus: a Tutorial (1991)

[RABINSCOTT] M. O. Rabin and D. Scott, “Finite Automata and their Decision Problems”, IBM Journal of Research and Development, 3:2 (1959) pp.115-125.

[HARTMANIS] Hartmanis, J.; and Stearns, R. E. (1966), Algebraic Structure Theory of Sequential Machines (Prentice Hall)

[LOOP] Juris Hartmanis: Loop-Free Structure of Sequential Machines Information and Control 5(1): 25-43 (1962)

[HALGEBRA] Hoare, C. A. 1993. Algebra and models. In Proceedings of the 1st ACM SIGSOFT Symposium on Foundations of Software Engineering (Los Angeles, California, United States, December 08 – 10, 1993). D. Notkin, Ed. SIGSOFT ‘93. ACM, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/256428.167053

[HENNESSY] Hennessy, M. and Milner, R. 1985. Algebraic laws for nondeterminism and concurrency. J. ACM 32, 1 (Jan. 1985), 137-161. DOI= http://doi.acm.org/10.1145/2455.2460

[BAETEN] Baeten, J. C. 2005. A brief history of process algebra. Theor. Comput. Sci. 335, 2-3 (May. 2005), 131-146. DOI= http://dx.doi.org/10.1016/j.tcs.2004.07.036

[RECURSION] Yodaiken, 2009, Primitive Recursive Transducers. http://yodaiken.com/papers/recursivetransducer.pdf

[VENK] Dr. Peter Venkmann, Columbia University. Personal communication.
