Robin Milner’s influential book Communication and Concurrency involves a take on state machines that has always puzzled me. “*Now in standard automata theory, an automaton is interpreted as a language,*” Milner writes, “*i.e. as a set of strings over the alphabet.*” That’s not at all correct, but let’s accept the claim for now and follow the argument. Consider two state machines **A** and **B** over an alphabet of events {a,b,c,d}, where **A** has states {A,A1,A2,A3} and **B** has states {B,B’,B1,B2,B3}. The state machine transitions can be given by ordered triples (state1,input,state2), each recording the input label on a directed edge from state1 to state2. For Milner’s example:

state machine **A** has transitions { (A,a,A1), (A1,b,A2), (A1,c,A3), (A3,d,A) },

state machine **B** has transitions { (B,a,B1), (B,a,B’), (B1,b,B2), (B’,c,B3), (B3,d,B) }.

**B** is non-deterministic because there are 2 “a” transitions from state B. Milner points out that if we consider A2 and B2 to be accept states, both machines accept the same language (acd)*ab. So far so good. At this point Milner asks us to think of {a,b,c,d} as “ports” or maybe buttons that can be pushed. The button “*is unlocked if the agent can perform the associated action and can be depressed to make it do the action, otherwise the button is locked and cannot be depressed*.” Then: “*after the a-button is depressed a difference emerges between A and B. For A – which is deterministic – b and c will be unlocked, while for B – which is non-deterministic – sometimes only b will be unlocked and sometimes only c will be unlocked.*” If you don’t look carefully, you’ll miss a subtle change of conditions that has significant ramifications.
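The claim that both machines accept (acd)*ab is easy to check mechanically, at least up to a bounded word length. The following is a minimal sketch, not anything from Milner’s book: the names `A_TRANS`, `B_TRANS`, `accepts` and `language` are my own, and the bound of 8 is an arbitrary choice that happens to cover three words of the language.

```python
import re
from itertools import product

# Transition triples (state1, input, state2), copied from the text above.
A_TRANS = {("A", "a", "A1"), ("A1", "b", "A2"), ("A1", "c", "A3"), ("A3", "d", "A")}
B_TRANS = {("B", "a", "B1"), ("B", "a", "B'"), ("B1", "b", "B2"),
           ("B'", "c", "B3"), ("B3", "d", "B")}


def accepts(trans, start, accept, word):
    """Classical recognizer: track the set of states reachable on `word`."""
    current = {start}
    for symbol in word:
        current = {t for (s, x, t) in trans if s in current and x == symbol}
    return bool(current & accept)


def language(trans, start, accept, max_len):
    """All accepted words over {a,b,c,d} of length at most max_len."""
    words = ("".join(w) for n in range(max_len + 1) for w in product("abcd", repeat=n))
    return {w for w in words if accepts(trans, start, accept, w)}


lang_a = language(A_TRANS, "A", {"A2"}, 8)
lang_b = language(B_TRANS, "B", {"B2"}, 8)
print(sorted(lang_a, key=len))
print(lang_a == lang_b)
print(all(re.fullmatch(r"(acd)*ab", w) for w in lang_a))
```

With only the binary accept/don’t accept signal to go on, the two machines are indistinguishable on every word up to the bound.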

An experimenter or external agent trying to push these buttons will eventually discover a difference between the two machines, because after an initial “a” input on the second state machine a “b” is sometimes possible and sometimes not, while on the first state machine a “b” is always possible after an “a”. But how does the external agent determine that **B** will sometimes not perform a “b” action? The external agent “attempt[s] to depress b” and fails – the locked/unlocked state of each button is visible to the external agent. So Milner has changed definitions in the middle of the argument. At the start, finite state machines were language recognizers with, as the classical text on automata theory explains, “*output limited to a binary signal: ‘accept/don’t accept’*” [Hopcroft and Ullman]. Those automata will not tell us anything about a word other than that binary condition: is it in the language or not. But Milner’s button state machines also tell us which buttons are locked and which are unlocked in the terminal state reached by the word. So Milner’s state machines distinguish words that a recognizer state machine does not. In fact, these Milner state machines have 5 binary outputs in each state, indicating the locked/unlocked status of each button plus accept/don’t accept. State machines with more than a binary output alphabet are called Moore or Mealy machines in poor old standard automata theory.
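To make the “5 binary outputs” reading concrete, here is a small sketch that treats each state as emitting a 5-bit Moore-style output: the unlocked status of a, b, c and d, followed by the accept bit. The encoding and the helper names `moore_output` and `states_after` are assumptions of mine for illustration, not Milner’s notation.

```python
ALPHABET = "abcd"

# Transition triples (state1, input, state2), copied from the text above.
A_TRANS = {("A", "a", "A1"), ("A1", "b", "A2"), ("A1", "c", "A3"), ("A3", "d", "A")}
B_TRANS = {("B", "a", "B1"), ("B", "a", "B'"), ("B1", "b", "B2"),
           ("B'", "c", "B3"), ("B3", "d", "B")}


def moore_output(trans, accept, state):
    """Five bits for one state: unlocked(a), unlocked(b), unlocked(c), unlocked(d), accept."""
    unlocked = {x for (s, x, _) in trans if s == state}
    return tuple(int(x in unlocked) for x in ALPHABET) + (int(state in accept),)


def states_after(trans, start, word):
    """Every state the machine might be in after reading `word`."""
    current = {start}
    for symbol in word:
        current = {t for (s, x, t) in trans if s in current and x == symbol}
    return current


# Outputs an observer might see after pressing "a" once.
out_a = {moore_output(A_TRANS, {"A2"}, s) for s in states_after(A_TRANS, "A", "a")}
out_b = {moore_output(B_TRANS, {"B2"}, s) for s in states_after(B_TRANS, "B", "a")}
print(out_a)  # one possible output: b and c both unlocked
print(out_b)  # two possible outputs: only b unlocked, or only c unlocked
```

The accept bit is 0 in every output here, so the whole difference between the machines after “a” lives in the four button bits, which a pure recognizer never exposes.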

Standard automata theory does not “interpret” state machines “as a language”, but there is a theorem that the class of languages recognized by deterministic finite state machines with binary output is the same as the class of languages recognized by non-deterministic finite state machines. Two machines that recognize the same language may be distinct in many other ways. And state machines that have additional outputs (sometimes called “transducers”) are essentially descriptions of maps from input strings to output strings, or from input strings to the output value in the terminal state. Standard automata theory would say Milner’s two machines accept the same language of strings but produce different languages of output strings.

Standard automata theory, as far as I know, has never really considered the case of non-deterministic Moore machines, but the extension is trivial. Milner’s transition systems are just labeled directed graphs with a root vertex. Consider a labeled directed graph G with label set A, a distinguished root vertex (start state) s0, and the set of triples R = { (s1,a,s2) : there is an edge labeled a from s1 to s2 }. The set of vertices V is the set of states. We can define a relation R* ⊆ A* × V as the smallest set that contains (null,s0) and contains (wa,s’) whenever (w,s) is in R* and (s,a,s’) is in R, where wa is the string obtained by appending “a” to “w” on the right. For any vertex “s”, define Q(s) to be the subset of A containing every “a” such that (s,a,s’) is in R for some s’ in V. Then let G* be the set of pairs (w,Q(s)) for every (w,s) in R*. As far as I can tell on a quick look, Milner’s bisimulation between G1 and G2 is simply equality of G1* and G2*.
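The construction of G* can be sketched directly from the definitions, truncated to words of bounded length since R* is infinite for both example machines. The function names `enabled` and `g_star` and the `max_len` cutoff are my own assumptions. The sketch only computes and compares the truncated sets for machines **A** and **B**; it does not test whether equality of G* coincides with bisimulation in general.

```python
# Transition triples (state1, input, state2), copied from Milner's example.
A_TRANS = {("A", "a", "A1"), ("A1", "b", "A2"), ("A1", "c", "A3"), ("A3", "d", "A")}
B_TRANS = {("B", "a", "B1"), ("B", "a", "B'"), ("B1", "b", "B2"),
           ("B'", "c", "B3"), ("B3", "d", "B")}


def enabled(trans, state):
    """Q(s): the labels on edges leaving `state`."""
    return frozenset(x for (s, x, _) in trans if s == state)


def g_star(trans, start, max_len):
    """Pairs (w, Q(s)) for every (w, s) in R* with len(w) <= max_len."""
    frontier = {("", start)}          # (null, s0)
    reached = set(frontier)
    for _ in range(max_len):
        # Extend each (w, s) by one edge (s, x, t) to get (wx, t).
        frontier = {(w + x, t) for (w, s) in frontier
                    for (s2, x, t) in trans if s2 == s}
        reached |= frontier
    return {(w, enabled(trans, s)) for (w, s) in reached}


ga = g_star(A_TRANS, "A", 3)
gb = g_star(B_TRANS, "B", 3)
print(ga == gb)
print(sorted((w, "".join(sorted(q))) for (w, q) in ga ^ gb))
```

The symmetric difference shows where the two truncated sets part ways: **A** contributes the pair for “a” with both b and c enabled, while **B** contributes two separate pairs for “a”, one with only b and one with only c.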