PAXOS – (c) Victor Yodaiken, 2022
These are the lecture notes, or see the paper.

1. Really clever distributed consensus algorithm by Leslie Lamport
	A. Infamously hard to understand - but not complicated
	B. Livelocks - can get stuck without reaching consensus
	C. Highly patented
	D. Widely used - supposedly.

2. Original paper: The Part Time Parliament, 1990 
	Most annoying CS paper ever published.

3. Paxos Made Simple, 2001. 

	The key text: but it really describes 3 different
	algorithms.



====================================================

DISTRIBUTED CONSENSUS PROBLEM

Network of computers (agents) that can lose messages but never corrupts them
		(semi-realistic)

Agents can fail completely but don't lie or break the rules

Proposer agents propose  values

Acceptor agents can accept proposals or reject them

Goals:	Majority of acceptors agree on value,
	Winning proposer sees agreement
	No more than one winning value (safety)




====================================================
SIMPLE SOLUTION - NOT PAXOS

1. One network "agent" is the only proposer
2. The proposer sends a proposed value to all acceptor agents until ..
	 Acceptors send back acknowledgment messages
3. When the proposer has acks from a majority it knows it has won

Use timeouts to make it more robust - 
	Oki and Liskov, Viewstamped Replication, 1988

Use cycles for more efficiency
	Chang and Maxemchuk, Reliable Broacast Protocols, 1984

Use with "state machine" model for fault tolerance
	Borg, Baumbach, Glazer,
	A Message System Supporting Fault Tolerance 1983
 

====================================================
PAXOS SOLVES THE PROBLEM OF TOO MUCH SIMPLICITY
	  
Key premise at start is the network is asynchronous (no clocks)
	(fashionable premise 1980s, not realistic)
	   
also that the algorithm needs multiple proposers
	(for fault tolerance, performance?)

Clever idea: Multiple winning proposers ok
	- if they all propose the same value.

	2 phase algorithm ensures that proposers converge on single
	winning value	






====================================================
PAXOS MADE SIMPLE

Pages 1-6 explain how the algorithm works

Page 6, Lamport explains that the algorithm "livelocks" although it is safe

Page 7: different algorithm also called Paxos that has a single proposer
selected via timeouts (clocks) or randomization.

Rest: the state machine model of fault tolerance using Paxos2

Question: What is advantage of Paxos2 over Simple Method?




====================================================
HOW DOES PAXOS1 SORT OF WORK?

id numbers are unique per proposer and per proposal.
(called "sequence numbers" in Paxos Made Simple)

Proposers start in phase1 and ask acceptors to agree to their proposal id.

When/if a majority of acceptors agree to the proposal id
	The proposer can pick a value, enter phase 2,
	and send its actual proposal

When/if a majority of acceptors accept the proposal, the proposer wins

Key: the messages agreeing to a proposal id carry the best
(highest numbered) proposal the acceptor has accepted so far
the proposer has to take (inherit) the value from the highest
numbered one of those 

====================================================
THERE ARE FOUR KINDS OF MESSAGES

	(phase1, id)   - sent by proposers in phase 1 to acceptors

	(ok1, id, bestproposal ) -sent by acceptors to agree to phase 1

	(phase2, id,value) - sent by proposers in phase 2

	(ok2, id) - sent by acceptors to agree to phase 2 proposal


Also messages reliably carry the identity of the original sender
of the message






====================================================
ACCEPTORS ENFORCE ID ORDER

Each acceptor A keeps track of 
	A.BestID which is the highest id it has agreed to (either phase)
		A.BestID is initially 0 
	A.BestProposal =(phase2,id,value) the highest numbered proposal
		the acceptor has accepted
		A.BestProposal is initially null

When an acceptor A receives m=(phase1,id) if m.id > A.BestID
	can send (ok1,m.id,A.BestProposal) back
		update A.BestId = m.id

When an acceptor A receives m=(phase2,id,value) if m.id >= A.BestId
	can send (ok2,m.id) back and update A.BestProposal=m and A.BestId=m.id


====================================================
PROPOSERS CAN ABANDON THEIR NON-WINNING PROPOSALS

They can pick a new id number that must be higher and still unique
and start over

at any time 

but it can still livelock forever

Proposer 1 and 2

2 starts and gets a majority in phase 1 and then rests

1 starts and gets 1 less than a majority, blocks, increases
number to 3, gets a majority for phase 1

... repeat








=====================================================
PROOF OF SAFETY:

If only one proposal has won, nothing to prove
If more than one proposal has won, let 
	m_0 =(id0,value0) be the winning proposal with the lowest id number

Let L = (m_0,m_1,m_2 ... m_N) add to m_0, in id order,
	the complete list of proposals
	with m_i.id > m_0.id that have been sent by some proposer,

If m_i is on the list, 
	the proposer of m_i reached phase2 with id m_i.id

	L includes all winning proposals plus more

Claim: All of these must have the same value v = m_0.value. 




====================================================
WHY?

P0 sent m_0 and received (ok2,m_0.id) from a majority of acceptors
because we chose it as the lowest numbered WINNING proposal

if P sent proposal m_i, i>0, then P got
	(ok1,m_i.id, best) from a majority of acceptors 

The two majority sets of acceptors must intersect at some A
A must have sent (ok2,m_0.id) BEFORE it sent
(ok1,m_i.id,A.BestProposal) since m_i.id > m_0.id

When A agrees to m_i.id it already accepted m_0, 
	so A.BestProposal was not null
	and m_0.id <= A.BestProposal.id < m_i.id 

P must have inherited at least 1 proposal m with
	m_0.id <= m.id < m_i.id


====================================================
BACK TO LIST FOR INDUCTION

L = (m_0,m_1,m_2 ... m_N)

CLAIM i >= 0 implies m_i.value = m_0.value.
Proof by induction.
True for i=0
Suppose true for 0,... j-1 and consider j

the proposer P of m_j inherited some best message m with
	m_0.id <= m.id < m_j.id
and since m was sent, the proposer of m must have reached phase2
	so m must be on the list.
Since m.id < m_j.id and it is on the list,
	m.value = m_0.value and so m_j.value = m_0.value

QED.




Lecture notes on Paxos