Consensus
Consensus is the problem of getting a set of distributed processes to agree on a single value despite failures, message delays, and out-of-order delivery. Consensus protocols are the foundation under any replicated state machine, leader election, distributed lock, or strongly consistent database.
The classical guarantees
- Agreement. All correct processes decide the same value.
- Validity. The decided value was proposed by some process.
- Termination. All correct processes eventually decide.
The FLP impossibility result proves that no deterministic protocol can guarantee all three under an asynchronous network with one faulty process. Practical protocols sidestep this with timeouts and randomization.
Common protocols
- Paxos. The original; correct, subtle, hard to implement.
- Multi-Paxos. Optimised Paxos for repeated decisions.
- Raft. Designed for understandability; the dominant choice in new systems (etcd, Consul, CockroachDB, TiKV).
- Viewstamped Replication. Predates and is similar to Raft in spirit.
- PBFT and Byzantine variants. Tolerate malicious as well as crash failures; used in blockchains and high-trust settings.
Where it shows up
- Coordination services (ZooKeeper, etcd, Consul)
- Strongly consistent databases (Spanner, CockroachDB, FoundationDB, TiDB)
- Distributed locks and leader election
🔗