Fischer Lynch Patterson and timeouts

There is a widely cited (over 1400 cites in CiteseerX ) result called the Fischer-Lynch-Patterson theorem about consensus – a key issue in distributed databases or any system where data is either distributed or replicated or both.

In this paper, we show the surprising result that no completely asynchronous
consensus protocol can tolerate even a single unannounced process death.

As far as I can tell the “surprising” result is that unbounded delays are not bounded. Without timeouts, we can not distinguish between a site B that will eventually send a message to site A and a site B that has failed (crashed) and will never send a message.

We do not consider Byzantine failures, and we assume that the message system is reliable it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement. Thus, this important problem has no robust solution without further assumptions about the computing environment or still greater restrictions on the kind of failures to be tolerated!
Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly.

The Wikipedia summary is similar:

In a fully asynchronous system there is no consensus solution that can tolerate one or more crash failures even when only requiring the non triviality property.^[11] This result is sometimes called the FLP impossibility proof. The authors Michael J. Fischer, Nancy Lynch, and Mike Paterson were awarded a Dijkstra Prize for this significant work. The FLP result does not state that consensus can never be reached: merely that under the model’s assumptions, no algorithm can always reach consensus in bounded time. In practice it is highly unlikely to occur.

I don’t get what is “significant” about this. It’s just a tautology wrapped in a complex formalism. If there is no upper bound on how long it can take for a node to send a message, and the only information passed between nodes is via message, then there is no way for one node to tell the difference between a delayed response and a response that will never come because the corresponding node has failed.

Fischer Lynch Patterson and timeouts

Tagged on: fischer-lynch-patterson