Advanced · 15 terms

Distributed Systems Consensus

Raft, quorum, linearizability, split-brain, leader fencing, CAP and PACELC theorems, vector clocks, and distributed transaction vocabulary.

  • Quorum /ˈkwɔːrəm/

    The minimum number of nodes that must agree for a distributed operation to succeed. For a cluster of N nodes, a simple majority quorum is ⌊N/2⌋ + 1, ensuring that any two quorums share at least one node and preventing split decisions.

    "In our 5-node Kafka cluster, the minimum in-sync replicas is 3 — that's the quorum. A message is only acknowledged to the producer once 3 replicas have written it."
  • Leader Election /ˈliːdər ɪˈlekʃən/

    The process by which nodes in a distributed system agree on a single coordinator (the leader) responsible for sequencing and replicating writes. Triggered when the current leader fails or becomes unreachable.

    "When the primary went down, Raft triggered a leader election. Within 150ms, node-3 had won the election with votes from 3 of 5 nodes and started accepting writes."
  • Election Term /ɪˈlekʃən tɜːrm/

    A monotonically increasing counter used in Raft to identify which election cycle is in progress. Each new election increments the term. Nodes reject messages from leaders with a lower term number, preventing stale leaders from accepting writes.

    "The old leader was partitioned and was still in term 11. When it reconnected, nodes in term 14 rejected its write requests — it stepped down and followed the current leader."
  • Log Replication /lɒɡ ˌreplɪˈkeɪʃən/

    In consensus algorithms like Raft, the mechanism by which a leader appends entries to follower logs, then commits an entry once a quorum of nodes (the leader itself included) has acknowledged it.

    "Log replication ensures all followers have an identical, ordered log of state transitions. An entry is only committed — marked as safe to apply to the state machine — once the quorum has it."
  • Linearizability /ˌlɪniərˌaɪzəˈbɪlɪti/

    The strongest consistency guarantee for individual operations: once a write completes, all subsequent reads from any node return that value (or the result of a later write). Operations appear instantaneous and respect real-time ordering.

    "Our coordination service provides linearizable reads — once a config change is written, every node that reads after that moment gets the new value, regardless of which replica they hit."
  • Serializability /ˌsɪəriəˌlaɪzəˈbɪlɪti/

    A consistency model for transactions: the result of executing a set of concurrent transactions is equivalent to some serial (one-at-a-time) execution. The gold standard for database transaction isolation.

    "Our payment service uses serializable isolation — concurrent transactions on the same account produce results identical to running those transactions one at a time."
  • Eventual Consistency /ɪˈventʃuəl kənˈsɪstənsi/

    A weak consistency model where replicas are guaranteed to converge to the same state only if no new updates are made. Replicas may serve stale reads during the propagation window.

    "The user profile service uses eventual consistency — a profile update may take up to 500ms to propagate to all regional replicas. We accept stale reads for this use case to get the availability benefit."
  • CAP Theorem /kæp ˈθɪərəm/

    States that a distributed system under a network partition can provide either Consistency (all nodes see the same data) or Availability (every request receives a response), but not both simultaneously.

    "Under CAP, we chose AP — during a partition, the service continues serving (possibly stale) reads rather than refusing requests. Our use case tolerates slight staleness."
  • PACELC Theorem /pæsɛlk ˈθɪərəm/

    Extends CAP: (P)artition → choose (A)vailability or (C)onsistency; (E)lse (no partition) → choose (L)atency or (C)onsistency. Acknowledges that consistency–latency trade-offs are always present, not just during partitions.

    "PACELC exposes the trade-off we make even under normal operations: strong consistency requires synchronous replication across all replicas, which adds latency. We chose EL — lower latency with eventual consistency."
  • Split-Brain /splɪt breɪn/

    A scenario where a network partition causes two or more nodes to independently believe they are the leader and accept conflicting writes. The most dangerous failure mode in distributed consensus systems.

    "The split-brain scenario: network partition — old leader and new leader both accept writes for 30 seconds. When the partition healed, the data diverged and required manual reconciliation."
  • Leader Fencing /ˈliːdər ˈfensɪŋ/

    A mechanism that prevents a partitioned or demoted leader from completing writes after a new leader is elected. Common approaches: fencing tokens (monotonic counter) or STONITH (Shoot The Other Node In The Head).

    "Leader fencing via STONITH: when node-1 was partitioned and a new leader elected, the cluster immediately powered off node-1 via the IPMI interface to prevent it from writing to shared storage."
  • Two-Phase Commit (2PC) /tuː feɪz kəˈmɪt/

    A distributed transaction protocol with a Prepare phase (coordinator asks all participants if they can commit) and a Commit phase (all must commit or all abort). Provides atomicity at the cost of blocking on coordinator failure.

    "The cross-database transfer uses 2PC: both the payment DB and the ledger DB prepare and lock resources, then both commit only after the coordinator receives 'ready' from both. If either says 'abort', both roll back."
  • Saga Pattern /ˈsɑːɡə ˈpætərn/

    A sequence of local transactions in a distributed system, each publishing an event or message. If a step fails, compensating transactions undo the previous steps. Avoids the blocking of 2PC.

    "The booking saga: Reserve seat → Charge card → Issue ticket. If card charge fails, a compensating transaction releases the reserved seat. No distributed locks required."
  • Vector Clock /ˈvektər klɒk/

    A data structure (array of logical counters, one per node) that tracks causality in a distributed system. Given two events, vector clock comparison determines if one caused the other or if they are concurrent.

    "Vector clocks show that event A on node-1 happened before event B on node-2 because A's counter was already observed in B's vector. Concurrent events cannot be ordered and require conflict resolution."
  • Hybrid Logical Clock (HLC) /ˈhaɪbrɪd ˈlɒdʒɪkəl klɒk/

    A timestamp combining physical wall-clock time and a logical counter. Maintains causality like a logical clock while remaining close to real time, making events sortable across nodes without tight clock synchronisation.

    "CockroachDB uses hybrid logical clocks — each node's HLC stays within a bounded skew of wall-clock time. This lets us sort distributed transactions chronologically without requiring atomic clocks."