Replication

Database replication is the practice of keeping multiple copies of the same data on different nodes for availability, durability, and read scaling. Almost every production database runs with at least one replica; the design choices are around topology, synchronisation, and failover.

Common topologies

  • Primary-replica (master-slave). One node accepts writes, replicas follow. Standard in PostgreSQL, MySQL, MongoDB, Redis.
  • Multi-leader. Multiple nodes accept writes; replicate to each other; need conflict resolution. Used for multi-region active-active.
  • Leaderless. Any node accepts writes; replicas reconcile via quorum reads and writes. Used in DynamoDB, Cassandra, Riak.

Synchronisation modes

  • Synchronous. Writes wait for replicas to acknowledge before returning success. Stronger durability, higher latency.
  • Asynchronous. Writes return after the primary commits; replicas catch up later. Lower latency, risk of data loss on primary failure.
  • Semi-synchronous. Wait for at least one replica before acknowledging; balances durability and latency.

Replication lag and its consequences

Asynchronous replicas always lag behind the primary by some amount. This produces familiar bugs: a user writes a comment, the read after the write hits a replica, and the comment is not there yet. Mitigations include sticky reads (route a user's reads to a replica that has seen their writes), bounded staleness guarantees, or reading from the primary when freshness matters.

Subscribe to Sahil's Playbook

Clear thinking on product, engineering, and building at scale. No noise. One email when there's something worth sharing.
[email protected]
Subscribe
Mastodon