CRDTs: Conflict-Free Replicated Data Types
The Problem CRDTs Solve
Two datacenters, each with a full replica of your data. A user on the US East Coast adds item X to their cart via datacenter A. Simultaneously, a user on the European side removes item Y from the same cart via datacenter B. Both writes succeed locally. A few hundred milliseconds later, the replicas try to sync.
If you are using a single-leader database, both writes must go through the single leader, wherever it lives. The write from the remote datacenter either pays a transatlantic round trip or, during a partition, fails outright. That is fine for consistency, but the user whose write fails sees an error, and you have now coupled the availability of your European datacenter to the health of a transatlantic network link.
CRDTs take a fundamentally different approach. Both writes succeed immediately, on separate replicas, without any coordination. Later, when the replicas exchange state, a deterministic merge function combines them. The mathematical properties of this merge function guarantee that all replicas converge to the same state regardless of the order messages arrive, how many times they are delivered, or how long the replicas were partitioned.
So what do you give up? The ability to reject conflicting writes at write time. No coordination means no coordination. What you get back is availability during partitions and local write latency everywhere.
The Mathematical Foundation
A CRDT merge function must satisfy three properties.
Commutativity. merge(A, B) = merge(B, A). It does not matter which replica's state arrives first.
Associativity. merge(A, merge(B, C)) = merge(merge(A, B), C). You can merge multiple replicas in any grouping.
Idempotency. merge(A, A) = A. If the same state is delivered twice due to network retries, nothing changes.
These three properties together form a join-semilattice. The states of the CRDT, combined with the merge function, form a partially ordered set where the merge always produces the least upper bound of its inputs. What that means in practice: the state can only move "up" in the lattice. It never goes backward, and all replicas climb toward the same final state.
This is not just abstract algebra. If your merge function violates any of these properties, replicas will diverge and you will have corrupt data in production. Every CRDT you build or adopt should be verified against these three properties before it touches production traffic.
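A quick randomized check of the three properties, not a proof, can catch most broken merge functions before they ship. This sketch tests element-wise max, the join used by grow-only counters later in this article; the helper names are illustrative:

```python
import random

def merge(a, b):
    """Element-wise max over two state vectors (dicts of node -> count)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def random_state():
    """A random partial state vector, as a stand-in for a replica's state."""
    return {n: random.randint(0, 100) for n in random.sample("ABCDE", 3)}

# Randomized verification of the semilattice properties.
for _ in range(1000):
    a, b, c = random_state(), random_state(), random_state()
    assert merge(a, b) == merge(b, a)                      # commutativity
    assert merge(a, merge(b, c)) == merge(merge(a, b), c)  # associativity
    assert merge(a, a) == a                                # idempotency
```

A property-based testing library would generate the random states more systematically, but even this loop is enough to reject merges that sum instead of max.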
State-Based vs Operation-Based
State-based CRDTs (also called CvRDTs, convergent replicated data types) work by shipping the entire state from one replica to another. The receiving replica merges it with its local state using the join function. This is simple: you can use any unreliable transport, messages can be reordered, duplicated, or delayed, and convergence still holds. The downside is bandwidth. Shipping a full counter or set every sync cycle gets expensive as the data structure grows.
Operation-based CRDTs (also called CmRDTs, commutative replicated data types) ship individual operations instead of full state: "increment by 1" or "add element X" rather than "here is the entire counter." This is much more bandwidth-efficient, but it shifts the burden to the delivery layer: each operation must be delivered exactly once (or the operations themselves must be idempotent to tolerate duplicates), and if operations can arrive out of order, they must commute.
In practice, many systems use a hybrid. They ship operations during normal operation for efficiency, and periodically do full state synchronization to recover from any lost messages or to bring new replicas up to speed.
G-Counter: The Simplest CRDT
A grow-only counter is the "hello world" of CRDTs. Each of N nodes maintains its own local count. Node A has count_A, Node B has count_B, and so on. The total value of the counter is the sum of all node counts.
When node A wants to increment, it increments count_A. That is it. No coordination.
Merging two replicas: take the element-wise maximum of their count vectors. If replica 1 has [A:5, B:3, C:7] and replica 2 has [A:4, B:6, C:7], the merged state is [A:5, B:6, C:7].
Why max and not sum? Because messages might be duplicated or replayed. If you sum, a duplicated merge message would double-count. Max is idempotent: applying the same state twice does not change the result.
The total counter value after merge is 5+6+7=18. Both replicas arrive at 18 regardless of message ordering. The commutativity, associativity, and idempotency properties all hold trivially for element-wise max.
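The whole structure fits in a few lines. This is a minimal sketch of a state-based G-Counter; the class and attribute names are illustrative, not from any particular library:

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local count

    def increment(self, n=1):
        # Only ever touch our own slot: no coordination needed.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        # The counter's value is the sum over all nodes.
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: commutative, associative, idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Replaying the example above: merging [A:5, B:3, C:7] with [A:4, B:6, C:7] yields [A:5, B:6, C:7], and both replicas report 18.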
PN-Counter: Supporting Decrements
A grow-only counter is limiting. What if you need to decrement? You cannot simply subtract from a G-Counter because subtraction breaks the lattice property (the state would go "down").
The solution is two G-Counters: one for increments (P) and one for decrements (N). The value is P - N. To increment, increment the P counter at your node. To decrement, increment the N counter at your node. Each G-Counter merges independently using element-wise max.
This works because both P and N are G-Counters that only grow, preserving the lattice property. The derived value (P - N) can go up or down, but the underlying state always moves forward.
The metadata cost is 2N integers, where N is the number of nodes. For a 10-node cluster, that is 20 integers per counter. Manageable. For a 10,000-node cluster, that is 20,000 integers per counter, which starts to hurt if you have millions of counters.
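A PN-Counter is just the G-Counter trick applied twice. A sketch, with illustrative names, assuming the same dict-of-counts representation as above:

```python
class PNCounter:
    """Counter supporting decrements: two grow-only vectors, P and N."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}  # increments per node (grow-only)
        self.n = {}  # decrements per node (grow-only)

    def increment(self, amount=1):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        # A decrement is an *increment* of the N vector,
        # so the underlying state still only grows.
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self):
        # Derived value can go up or down; the state never does.
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Each G-Counter merges independently with element-wise max.
        for node, c in other.p.items():
            self.p[node] = max(self.p.get(node, 0), c)
        for node, c in other.n.items():
            self.n[node] = max(self.n.get(node, 0), c)
```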
OR-Set: The Workhorse
The OR-Set (observed-remove set) is the CRDT that gets the most practical use. It supports both add and remove operations on a set, with a clear policy for concurrent add and remove of the same element: add wins.
The mechanism: every add operation generates a unique tag (a UUID, or a node-ID plus a logical timestamp). When you add element X, you are really adding the pair (X, unique_tag_42). When you remove element X, you remove all tags currently associated with X at your replica.
Now consider a concurrent scenario. Replica A adds X with tag_1. Concurrently, replica B removes X (which removes whatever tags B knows about). When they merge, replica A's tag_1 was never seen by replica B, so B's remove did not remove it. The merged state contains (X, tag_1). Element X is present. Add wins.
This is the "observed-remove" part of the name. A remove only removes tags that the remover has observed. Concurrently added tags survive.
The cost is that each element carries a set of tags, and removed elements leave behind tombstones (the record that tag_1 was removed, needed to prevent a stale replica from re-adding it). Over time, these tombstones accumulate. Production systems need a garbage collection mechanism, which typically requires a brief coordination round to confirm all replicas have seen the tombstone. This coordination partially undermines the "no coordination" promise, but it is infrequent and bounded.
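The tag mechanics are easier to see in code. This is a sketch of a state-based OR-Set, without garbage collection; tags are random UUIDs and the names are illustrative:

```python
import uuid

class ORSet:
    """Observed-remove set: adds carry unique tags, removes tombstone
    only the tags the remover has seen."""

    def __init__(self):
        self.added = {}     # element -> set of add tags
        self.removed = set()  # tombstoned tags (never shrinks here)

    def add(self, element):
        tag = uuid.uuid4().hex  # unique tag per add operation
        self.added.setdefault(element, set()).add(tag)

    def remove(self, element):
        # Tombstone only the tags we have *observed* locally.
        self.removed |= self.added.get(element, set())

    def contains(self, element):
        # Present if any tag survives that has not been tombstoned.
        return bool(self.added.get(element, set()) - self.removed)

    def merge(self, other):
        for element, tags in other.added.items():
            self.added.setdefault(element, set()).update(tags)
        self.removed |= other.removed
```

In the concurrent scenario above, replica B's remove tombstones only the tags B observed; A's concurrent add introduced a fresh tag B never saw, so after merging, that tag survives and the element is present.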
LWW-Register: Simple but Dangerous
The last-writer-wins register is the simplest CRDT for storing arbitrary values. Each write carries a timestamp. When merging, the value with the highest timestamp wins.
This is what Cassandra uses for individual columns. It is what DynamoDB defaults to with its built-in conflict resolution. It is simple, bandwidth-efficient, and easy to reason about.
The danger is subtle. If two users concurrently update the same register, one update is silently discarded. For a "last login time" field, that is fine. For a "shopping cart contents" field, a user just lost their changes with no indication that it happened.
LWW also requires roughly synchronized clocks. If node A's clock is 5 seconds ahead of node B's clock, node A's writes will always win over concurrent writes from node B, regardless of which happened later in real time. The combination of clock skew and silent data loss makes LWW a poor default choice for anything where concurrent updates carry meaning.
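For completeness, an LWW register is only a few lines. This sketch breaks timestamp ties with the node ID so the winner is deterministic on every replica; the names and the caller-supplied timestamp are illustrative:

```python
class LWWRegister:
    """Last-writer-wins register: highest (timestamp, node_id) wins."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.value = None
        self.stamp = (0, node_id)  # (timestamp, node_id) for tie-breaking

    def set(self, value, timestamp):
        stamp = (timestamp, self.node_id)
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp

    def merge(self, other):
        # The losing write is silently discarded -- the core danger.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp
```

Note that the timestamp comes from the caller: in a real system that is a wall clock or hybrid logical clock, which is exactly where the clock-skew problem enters.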
MV-Register: Preserving Conflicts
The multi-value register takes the opposite approach from LWW. When concurrent writes happen, keep all of them. Instead of silently picking a winner, the register presents multiple values (siblings) to the application and lets the application decide how to merge them.
Riak uses this approach. When you read a key that has conflicting concurrent writes, you get back all the conflicting values along with their version vectors. Your application code must resolve the conflict (perhaps by merging shopping cart contents, or by presenting the user with a choice).
The advantage: no data is lost silently. The disadvantage: application code must handle siblings, and many developers forget to, leading to unbounded sibling growth. Riak had to add a "sibling cap" to prevent this.
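The sibling mechanics can be sketched with version vectors: a write supersedes every sibling it has seen, and a merge keeps every value whose vector is not dominated by another. This is a simplified illustration, not Riak's implementation; all names are illustrative:

```python
def dominates(vv1, vv2):
    """vv1 dominates vv2 if it is >= component-wise and not equal."""
    keys = vv1.keys() | vv2.keys()
    ge = all(vv1.get(k, 0) >= vv2.get(k, 0) for k in keys)
    return ge and vv1 != vv2

class MVRegister:
    """Multi-value register: concurrent writes are kept as siblings."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.siblings = []  # list of (value, version_vector)

    def set(self, value):
        # A local write supersedes all siblings currently visible here.
        vv = {}
        for _, sib_vv in self.siblings:
            for k, v in sib_vv.items():
                vv[k] = max(vv.get(k, 0), v)
        vv[self.node_id] = vv.get(self.node_id, 0) + 1
        self.siblings = [(value, vv)]

    def merge(self, other):
        combined = self.siblings + other.siblings
        kept = []
        for val, vv in combined:
            # Drop any sibling dominated by another sibling's vector.
            if not any(dominates(ovv, vv)
                       for _, ovv in combined if ovv is not vv):
                kept.append((val, vv))
        unique = []
        for s in kept:  # deduplicate identical siblings
            if s not in unique:
                unique.append(s)
        self.siblings = unique

    def values(self):
        return [v for v, _ in self.siblings]
```

Two concurrent writes survive as two siblings; a later write that has observed both siblings collapses them back to one value.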
Delta-State CRDTs
Standard state-based CRDTs ship the entire state on every sync. For a G-Counter with 100 nodes, that is 100 integers. Manageable. For an OR-Set with 50,000 elements, each carrying version metadata, that is potentially megabytes per sync.
Delta-state CRDTs optimize this by tracking what changed since the last sync with each peer. Instead of sending the full state, send only the delta (the mutations since the last sync). The delta itself is a valid state fragment that can be merged with the same join function.
The subtlety is that you need to track which deltas have been acknowledged by each peer. If a delta is lost, you resend it. If a peer falls too far behind, you fall back to sending the full state. This adds bookkeeping overhead but reduces bandwidth dramatically in steady state.
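The core idea can be shown on a G-Counter. This sketch keeps a single delta buffer; a real implementation tracks acknowledged deltas per peer and falls back to full state when a peer lags, as described above. Names are illustrative:

```python
class DeltaGCounter:
    """G-Counter that also tracks the entries changed since last sync.
    The delta is itself a valid state fragment, merged with the same join."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}
        self.delta = {}  # entries changed since the last take_delta()

    def increment(self, n=1):
        new = self.counts.get(self.node_id, 0) + n
        self.counts[self.node_id] = new
        self.delta[self.node_id] = new  # record the change for shipping

    def value(self):
        return sum(self.counts.values())

    def take_delta(self):
        # Ship this fragment to peers instead of the full state.
        d, self.delta = self.delta, {}
        return d

    def merge(self, fragment):
        # The same element-wise max works for full states and deltas,
        # so a duplicated or replayed delta is harmless.
        for node, c in fragment.items():
            self.counts[node] = max(self.counts.get(node, 0), c)
```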
Automerge uses a delta-state approach for collaborative document editing, sending only the individual character insertions and deletions rather than the entire document.
The Garbage Collection Problem
This is where the elegance of CRDTs meets the mess of production systems.
OR-Sets accumulate tombstones for removed elements. PN-Counters accumulate entries for every node that ever incremented or decremented, even nodes that left the cluster years ago. G-Sets, by definition, never shrink.
Without garbage collection, metadata grows monotonically. In a system running for months or years, the metadata can dwarf the actual data. Production teams have reported OR-Sets where 90% of the storage was tombstones.
Garbage collection requires knowing that all replicas have observed the tombstone. Only then is it safe to delete the tombstone, because no stale replica can accidentally resurrect the removed element. But determining "all replicas have seen this" requires coordination, which is exactly what CRDTs were designed to avoid.
Practical systems handle this with periodic "GC rounds" where replicas exchange watermarks (the latest version each replica has fully merged). Tombstones older than the minimum watermark across all replicas can be safely collected. This coordination is lightweight and infrequent (minutes or hours between rounds), but it means CRDTs in practice are not truly coordination-free. They are coordination-minimal.
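The watermark computation itself is trivial once the coordination round has gathered each replica's watermark. A sketch, assuming tombstones are stamped with the logical time of their remove; the function and parameter names are illustrative:

```python
def collect_garbage(tombstones, replica_watermarks):
    """Drop tombstones that every replica has fully merged.

    tombstones: dict of tag -> logical time of the remove.
    replica_watermarks: dict of replica_id -> highest logical time
        that replica has fully merged.
    Returns the tombstones that must be kept.
    """
    # Anything older than the slowest replica's watermark has been
    # seen everywhere, so no stale replica can resurrect it.
    safe_before = min(replica_watermarks.values())
    return {tag: t for tag, t in tombstones.items() if t >= safe_before}
```

One slow or unreachable replica pins the minimum watermark and blocks all collection, which is why production systems also need a policy for evicting dead replicas from the watermark set.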
CRDTs vs Operational Transformation
CRDTs are often mentioned alongside Operational Transformation (OT), and for good reason. Both solve the same core problem: how do you let multiple users edit shared data concurrently and guarantee everyone ends up with the same result?
The approaches could not be more different.
OT works by transforming operations. When two users make concurrent edits, a transform function adjusts one operation's positions to account for the other. This requires a central server to define the canonical operation order. Google Docs has run OT in production since 2006, and it works well when you have a server, low-latency connections, and text-like data.
CRDTs work by making the data structure itself convergent. No operation transformation, no central server, no ordering requirement. Replicas sync whenever they can, in any order, and the math guarantees convergence. This is why CRDTs shine in peer-to-peer systems, offline-first apps, and multi-datacenter replication where no single server should be a bottleneck or point of failure.
The correctness story is also different. OT's transform functions need to satisfy TP1 and TP2 properties, and multiple published algorithms were later found to have bugs. Google avoids TP2 entirely by routing everything through a server (the Jupiter protocol). CRDTs get correctness from the algebraic properties of the merge function. If your merge is commutative, associative, and idempotent, convergence is guaranteed by construction.
Where OT wins: bandwidth. An OT message is a small operation ("insert 'a' at position 5"). A state-based CRDT ships the entire state. Even operation-based CRDTs carry more metadata per operation than OT does. For a collaborative text editor with a reliable server, OT sends less data over the wire.
Where CRDTs win: topology flexibility and resilience. No server dependency, no single point of failure, works across datacenters with arbitrary latency. Figma runs CRDTs for multiplayer design. Redis Enterprise uses CRDTs for active-active geo-replication. These are use cases where OT's server dependency would be a liability.
The line between them is blurring. Yjs is technically a CRDT but borrows ideas from OT. Automerge started as pure CRDT but now includes OT-inspired optimizations. If you are choosing between them today, the decision usually comes down to one question: do you have a reliable central server? If yes, both work. If no, CRDTs are your only real option.
Choosing the Right CRDT
There is no universal CRDT. The choice depends on what semantics you need.
For counters (page views, likes, inventory levels), use G-Counter or PN-Counter. The metadata scales with the number of replicas, which is usually small.
For sets with add-only semantics (tags, bookmarks, items in a collection that are never removed), use G-Set. Zero metadata overhead beyond the elements themselves.
For sets with add and remove (shopping carts, user preferences), use OR-Set. Be prepared for tombstone management.
For single values with clear last-writer-wins semantics (status fields, configuration flags), LWW-Register is fine. But be honest about whether concurrent updates can really be silently dropped.
For single values where concurrent updates matter (collaborative editing, shared documents), use MV-Register and handle siblings in application code, or use a specialized CRDT like the RGA (Replicated Growable Array) for text editing.
For maps and nested structures, most frameworks provide CRDT maps where each key holds a CRDT value. Automerge and Yjs both support nested CRDT documents. The complexity grows quickly, and debugging convergence issues in deeply nested structures can be challenging.
The honest assessment: CRDTs are powerful for a specific class of problems where you need multi-master writes with automatic conflict resolution and can express your data model in terms of the available CRDT types. They are not a general-purpose replacement for consensus. When you need to enforce an invariant across replicas (like "total inventory must not go below zero"), CRDTs cannot help because enforcing cross-replica invariants requires coordination.
Key Points
- • CRDTs let multiple replicas accept writes concurrently without any coordination and merge them deterministically. The merge function must be commutative, associative, and idempotent, which guarantees convergence regardless of message ordering or duplication
- • State-based CRDTs ship the full state and merge with a join operation. Operation-based CRDTs ship individual operations that must be commutative. State-based is simpler to implement, op-based is more bandwidth-efficient at scale
- • The OR-Set (observed-remove set) is the most practically useful CRDT for application developers. It handles concurrent add and remove by tagging each add with a unique identifier, so add-wins semantics for concurrent operations are unambiguous
- • Redis Enterprise, Riak, Automerge, and Figma all use CRDTs in production for active-active geo-replication or real-time collaborative editing
Common Mistakes
- ✗ Choosing LWW (last-writer-wins) when concurrent updates carry semantic meaning. If two users update a shopping cart simultaneously, LWW silently drops one update. An OR-Set or counter would preserve both
- ✗ Ignoring the metadata overhead of OR-Sets. Each element carries a set of unique tags (one per concurrent add), and removed elements leave tombstones. Without garbage collection, metadata grows without bound
- ✗ Assuming CRDTs solve all conflict problems. CRDTs solve a specific mathematical class of conflicts where a commutative merge exists. Many business logic conflicts (like double-booking a hotel room) have no meaningful automatic resolution
- ✗ Not implementing garbage collection for tombstones. Deleted elements in OR-Sets leave metadata behind. Over weeks or months of operation, tombstone accumulation can consume more memory than the live data