Two-Phase Commit (2PC)

The Atomic Commit Problem

You are transferring $100 from account A (on server 1) to account B (on server 2). You debit A and credit B. Both operations must succeed, or neither should. If A is debited but B is not credited, $100 vanishes. If B is credited but A is not debited, $100 appears from nowhere.

On a single database, this is easy. Wrap it in a transaction, and the database guarantees atomicity. But when the data lives on different servers, different databases, or different shards, there is no single transaction manager. You need a protocol that coordinates the commit across multiple independent participants. That protocol is two-phase commit.

How 2PC Works

The protocol involves a coordinator (the node driving the transaction) and multiple participants (the nodes holding the data).

Phase 1: Prepare

The coordinator sends a PREPARE message to every participant. Each participant does everything needed to commit the transaction except the commit itself: it acquires locks, validates constraints, and writes a prepare record to its write-ahead log (WAL). If everything checks out, it responds YES. If anything goes wrong (constraint violation, disk full, timeout), it responds NO.

A YES vote is a serious promise. The participant is saying: "I can commit this transaction, I have all the locks and resources, and I will not unilaterally decide to abort. I will wait for your decision." This promise must survive crashes, which is why the prepare record goes to the WAL before responding.

A NO vote is unilateral. The participant has already decided to abort its local portion and does not need to hear back from the coordinator.
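To make the crash-survival requirement concrete, here is a minimal sketch of what a participant might do with its log after a restart. The record names ("prepare", "commit", "abort") and the recover_states function are illustrative assumptions, not any particular database's format.

```python
def recover_states(wal):
    """Replay a participant's log after a restart.

    `wal` is a list of (record_type, txid) tuples in write order.
    A transaction with a prepare record but no decision is "in doubt":
    the participant voted YES, so it must keep its locks and ask for
    the outcome instead of deciding on its own.
    """
    states = {}
    for record_type, txid in wal:
        if record_type == "prepare":
            states[txid] = "in_doubt"
        elif record_type == "commit":
            states[txid] = "committed"
        elif record_type == "abort":
            states[txid] = "aborted"
    return states


# t1 finished before the crash; t2 was prepared but never resolved.
wal = [("prepare", "t1"), ("commit", "t1"), ("prepare", "t2")]
states = recover_states(wal)
```

Transactions that never reached a prepare record do not appear at all: the participant made no promise about them and can simply abort them.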

Phase 2: Commit or Abort

The coordinator collects all votes. If every participant voted YES, the coordinator writes a commit record to its own WAL and sends COMMIT to all participants. If any participant voted NO (or did not respond within the timeout), the coordinator writes an abort record and sends ABORT to all participants.

Each participant receives the decision, applies or rolls back, releases its locks, and acknowledges. Once the coordinator receives all acknowledgments, the transaction is fully complete.
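The two phases can be sketched as a small in-memory simulation. Everything here (the Participant and Coordinator classes, the can_commit flag, Python lists standing in for logs) is a hypothetical model for illustration; a real implementation would add networking, timeouts, and durable storage.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """In-memory stand-in for a node holding part of the data."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: models constraint checks
        self.log = []                 # stands in for the write-ahead log
        self.state = "init"

    def prepare(self, txid):
        if not self.can_commit:
            self.state = "aborted"    # a NO vote is unilateral
            return Vote.NO
        self.log.append(("prepare", txid))  # log first, then answer YES
        self.state = "prepared"
        return Vote.YES

    def finish(self, txid, decision):
        if self.state == "aborted":
            return                    # already aborted locally
        self.log.append((decision, txid))
        self.state = "committed" if decision == "commit" else "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self, txid):
        # Phase 1: collect a vote from every participant.
        votes = [p.prepare(txid) for p in self.participants]
        decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
        # Decision point: record the outcome before telling anyone.
        self.log.append((decision, txid))
        # Phase 2: deliver the decision.
        for p in self.participants:
            p.finish(txid, decision)
        return decision


a, b = Participant("A"), Participant("B")
outcome = Coordinator([a, b]).run("transfer-1")
```

With both participants able to commit, outcome is "commit" and both end in the committed state. Constructing one of them with can_commit=False turns the outcome into "abort" for everyone.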

The Decision Point

There is a critical moment in 2PC: the instant when the coordinator writes its commit decision to durable storage. Before that point, the transaction can still be aborted (if any participant has not voted yes or the coordinator decides to abort). After that point, the decision is final. Participants who have voted yes will eventually be told to commit.

This decision point is the source of both the power and the weakness of 2PC.

The Blocking Problem

Here is the scenario that makes 2PC controversial.

Participant A votes YES. Participant B votes YES. The coordinator writes "commit" to its log and then crashes before sending the commit message to anyone.

Now both participants are stuck. They voted YES and promised not to abort unilaterally, so they keep holding their locks on the data. They cannot commit because they have not been told to. They cannot abort because they promised not to. They are blocked, waiting for a coordinator that might come back in minutes, in hours, or not at all unless someone intervenes manually.

Meanwhile, other transactions that need those locked rows are also blocked. The blast radius can be enormous.

This is the fundamental problem with 2PC. In a system with unreliable nodes, any participant can get stuck in an uncertain state between voting YES and receiving the final decision. During that uncertainty window, it holds locks and blocks other work.
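A toy lock table shows why the damage spreads beyond the stuck transaction. The LockTable class and the transaction names are made up for illustration; real lock managers also queue waiters and detect deadlocks.

```python
class LockTable:
    """Minimal exclusive-lock table: one owner per row key."""

    def __init__(self):
        self.owners = {}  # row key -> txid holding the lock

    def acquire(self, txid, key):
        # Returns True if txid now holds (or already held) the lock.
        holder = self.owners.setdefault(key, txid)
        return holder == txid

    def release_all(self, txid):
        self.owners = {k: t for k, t in self.owners.items() if t != txid}


locks = LockTable()
locks.acquire("transfer-1", "account:B")  # taken while preparing

# The coordinator crashes here. transfer-1 is in doubt, so its lock stays.
payroll_blocked = not locks.acquire("payroll-7", "account:B")

# Only a decision from the coordinator (or an operator) frees the row.
locks.release_all("transfer-1")
payroll_proceeds = locks.acquire("payroll-7", "account:B")
```

In this sketch payroll-7 has nothing to do with the failed transfer, yet it cannot make progress until transfer-1 is resolved.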

Dealing with Coordinator Failure

Several strategies address the blocking problem:

Timeout and abort. The simplest approach. If a participant votes YES and does not hear back within T seconds, it aborts on its own. This breaks the promise made by the YES vote, and with it the atomicity guarantee, but is sometimes accepted in practice. The risk: the coordinator had already logged a commit, recovers, and sends COMMIT to a participant that has already aborted. Now you have an inconsistency. Recovery protocols exist to detect and repair this, but they add complexity.

Cooperative recovery. When the coordinator is down, participants ask each other: "Did you get a commit or abort message?" If any participant received COMMIT, everyone commits. If any participant received ABORT, or has not voted YES (it voted NO or never got the PREPARE), everyone aborts. The only case this cannot resolve is when every reachable participant voted YES and none received the final decision. Those participants stay blocked until the coordinator returns.
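The cooperative rule reduces to a small decision function. This is a sketch under the assumption that each reachable peer reports one of four hypothetical state names: "committed", "aborted", "not_voted", or "prepared" (voted YES, no decision yet).

```python
def cooperative_decision(peer_states):
    """Decide what an in-doubt participant can do from its peers' states."""
    states = set(peer_states)
    if "committed" in states:
        return "commit"   # someone saw COMMIT, so the decision was commit
    if "aborted" in states or "not_voted" in states:
        return "abort"    # the coordinator cannot have decided commit
    return "blocked"      # everyone is prepared and nobody knows


one_peer_saw_commit = cooperative_decision(["prepared", "committed"])
one_peer_never_voted = cooperative_decision(["prepared", "not_voted"])
all_in_doubt = cooperative_decision(["prepared", "prepared"])
```

For the "not_voted" branch to be safe, the peer that has not voted must also record the abort so that it refuses a late PREPARE for the same transaction.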

Replicated coordinator. The approach used by Google Spanner and CockroachDB. The coordinator is not a single node but a Paxos or Raft group. If the coordinator leader fails, another node in the group takes over, reads the commit decision from the replicated log, and finishes the protocol. This eliminates the blocking problem at the cost of running a consensus protocol for the coordinator itself.
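The following toy model illustrates why a replicated decision record removes the single point of failure. It is deliberately not Paxos or Raft: QuorumDecisionStore is a hypothetical class that writes to every live replica and refuses to operate without a majority, which is only enough to show the recovery idea.

```python
class QuorumDecisionStore:
    """Toy majority-replicated map of txid -> decision (not real consensus)."""

    def __init__(self, size=3):
        self.replicas = [{} for _ in range(size)]
        self.up = [True] * size  # flip an entry to False to simulate a crash

    def _live(self):
        live = [r for r, ok in zip(self.replicas, self.up) if ok]
        if len(live) <= len(self.replicas) // 2:
            raise RuntimeError("no quorum available")
        return live

    def lookup(self, txid):
        # Any majority overlaps the majority that stored the decision.
        for replica in self._live():
            if txid in replica:
                return replica[txid]
        return None  # no decision was ever recorded

    def record(self, txid, decision):
        # An earlier decision always wins; a new leader cannot change it.
        final = self.lookup(txid) or decision
        for replica in self._live():
            replica[txid] = final
        return final


store = QuorumDecisionStore()
store.record("transfer-1", "commit")
store.up[0] = False                      # the old leader's node crashes
recovered = store.lookup("transfer-1")   # a new leader reads the decision
```

The new leader finds the commit decision on the surviving replicas and can finish phase 2, so participants are not left waiting for the crashed node.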

Three-Phase Commit (3PC): The Theoretical Fix

Skeen proposed 3PC in 1981 as a non-blocking alternative to 2PC. It adds a third phase: pre-commit.

Phase 1: Prepare (same as 2PC). Phase 2: Pre-commit. If all voted yes, the coordinator sends PRE-COMMIT. Participants acknowledge. Phase 3: Commit. The coordinator sends COMMIT.

The idea is that before the coordinator commits, everyone knows that everyone else voted yes (via the pre-commit acknowledgment). If the coordinator crashes, participants can coordinate among themselves because they all know the vote outcome.

In theory, 3PC is non-blocking. In practice, it has problems. It assumes a synchronous network with bounded message delays, so that a timeout reliably means the other node has crashed. In an asynchronous network (the real world), 3PC can actually violate safety: a network partition can cause one side to commit and the other to abort. It also adds a round trip to every transaction. This is why you rarely see 3PC in production systems; 2PC with a replicated coordinator is usually the better trade.
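The partition failure can be reproduced with a simplified version of the 3PC termination rule, applied independently on each side of a split. The function name and state labels are assumptions for illustration, and the rule is condensed from the usual description: a group that sees a pre-committed member finishes the commit, and a group that sees only prepared members aborts.

```python
def terminate_3pc(reachable_states):
    """Decision a group of participants reaches without the coordinator."""
    states = set(reachable_states)
    if "committed" in states:
        return "commit"
    if "aborted" in states:
        return "abort"
    if "pre_committed" in states:
        return "commit"  # everyone must have voted YES, so finish the commit
    return "abort"       # nobody got PRE-COMMIT, so aborting looks safe


# The coordinator crashed after sending PRE-COMMIT to one participant only,
# and a network partition separates that participant from the other two.
side_one = terminate_3pc(["pre_committed"])
side_two = terminate_3pc(["prepared", "prepared"])
```

Here side_one is "commit" and side_two is "abort": each side followed the rule correctly, and the transaction is no longer atomic.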

Presumed Abort Optimization

In vanilla 2PC, the coordinator must persist its decision (commit or abort) before sending it to participants. This means a disk write on the critical path for every transaction, even aborted ones.

The presumed-abort optimization says: do not bother logging abort decisions. If a participant asks the coordinator about a transaction and the coordinator has no record of it, the answer is "abort." This is safe because the absence of a commit record means the transaction was never committed.

For commit decisions, the coordinator still writes to its log. But for the common case of aborted transactions (which are frequent in busy systems due to lock conflicts), you save a disk write.
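A minimal sketch of the presumed-abort rule, with a Python set standing in for the coordinator's durable log. The class and method names are hypothetical.

```python
class PresumedAbortCoordinator:
    """Logs commit decisions only; anything unknown is presumed aborted."""

    def __init__(self):
        self.commit_log = set()  # stands in for forced commit records

    def decide(self, txid, votes):
        # `votes` is a list of booleans, True meaning YES.
        if votes and all(votes):
            self.commit_log.add(txid)  # must be durable before sending COMMIT
            return "commit"
        return "abort"                 # nothing is written for an abort

    def inquire(self, txid):
        # Called by a participant that is in doubt about txid.
        return "commit" if txid in self.commit_log else "abort"


coordinator = PresumedAbortCoordinator()
coordinator.decide("t1", [True, True])
coordinator.decide("t2", [True, False])
answer_t1 = coordinator.inquire("t1")
answer_t2 = coordinator.inquire("t2")
answer_unknown = coordinator.inquire("never-seen")
```

The aborted transaction t2 and a transaction the coordinator has never heard of get the same answer, which is exactly why the abort record is unnecessary.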

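The presumed-commit optimization flips the default: if the coordinator has no record of a transaction, the answer is "commit." To keep that safe, the coordinator must durably log that the transaction has started before it sends PREPARE, and it must log abort decisions; the savings come from participants not having to acknowledge commits. This is useful when most transactions commit. In practice, presumed-abort is more common because it has simpler recovery semantics.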

2PC in the Wild

MySQL XA Transactions. MySQL supports the XA transaction standard, which is essentially 2PC. You can run XA PREPARE to enter the prepared state and XA COMMIT or XA ROLLBACK to complete. The practical problem: MySQL's XA implementation has historically been buggy. Prepared transactions can survive a server restart but sometimes get lost during crash recovery. This has improved in recent versions, but many teams avoid MySQL XA and use application-level compensation instead.

PostgreSQL PREPARE TRANSACTION. PostgreSQL's implementation is more robust. PREPARE TRANSACTION 'txid' persists the prepared state. COMMIT PREPARED 'txid' or ROLLBACK PREPARED 'txid' finishes it. Prepared transactions survive crashes and restarts. The feature is off by default: max_prepared_transactions defaults to 0 and must be raised before PREPARE TRANSACTION will succeed. The catch: prepared transactions hold locks until resolved. If your application crashes without committing or rolling back prepared transactions, those locks persist until someone manually cleans them up. PostgreSQL lists every outstanding prepared transaction in the pg_prepared_xacts view for exactly this reason.
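One way to keep orphaned prepared transactions from lingering is a periodic check against pg_prepared_xacts. The sketch below assumes the rows have already been fetched by some database driver as (gid, prepared) tuples; the stale_prepared helper and the five-minute threshold are illustrative choices, not PostgreSQL defaults.

```python
from datetime import datetime, timedelta, timezone

# gid and prepared are real columns of the pg_prepared_xacts view.
STALE_QUERY = "SELECT gid, prepared FROM pg_prepared_xacts ORDER BY prepared"


def stale_prepared(rows, now, max_age=timedelta(minutes=5)):
    """Return the gids of prepared transactions older than max_age.

    `rows` is an iterable of (gid, prepared_at) tuples as returned by
    STALE_QUERY, with timezone-aware timestamps.
    """
    return [gid for gid, prepared_at in rows if now - prepared_at > max_age]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ("transfer-1", now - timedelta(hours=1)),
    ("transfer-2", now - timedelta(seconds=10)),
]
needs_attention = stale_prepared(rows, now)
```

The function only reports candidates. Whether each one should get COMMIT PREPARED or ROLLBACK PREPARED depends on what the coordinator decided, so the result is best fed to an alert or to the coordinator's own recovery logic instead of an automatic rollback.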

Google Spanner. This is the gold standard for 2PC done right. Every participant in a Spanner transaction is a Paxos group, and the coordinator is also a Paxos group. When the coordinator leader crashes, another member takes over and completes the protocol. When a participant leader crashes, its Paxos group elects a new leader that has the prepare record in its replicated log. The window of uncertainty shrinks to the time a leader election takes, because every decision is replicated before it is acted on. The price: two consensus rounds per transaction (one for the prepare, one for the commit), which means higher latency.

CockroachDB follows a similar design. Its parallel commits optimization overlaps the prepare and commit phases: the coordinator writes the transaction record in a STAGING state in parallel with the transaction's final writes, and the transaction counts as committed as soon as all of those writes are durable. The client is acknowledged at that point, and the record is marked COMMITTED asynchronously. This cuts one round trip from the happy path.

Kafka Transactions. Kafka's exactly-once semantics use a variant of 2PC. The transaction coordinator is a component that runs on a Kafka broker. Producers send messages to multiple partitions within a transaction. The coordinator records the commit or abort decision in its transaction log and then writes a commit or abort marker to each partition. Consumers configured with isolation.level=read_committed do not see messages from aborted or still-open transactions. This is not classic 2PC (partitions do not vote in a prepare phase), but the coordination pattern is the same.
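The consumer-side behavior can be modeled in a few lines. This is a simplified model of one partition, not the Kafka client API: the log is a list of hypothetical (kind, producer, payload) tuples, where a "marker" entry closes that producer's open transaction with "commit" or "abort".

```python
def read_committed(log):
    """Return the payloads a read-committed consumer would see."""
    outcome = [None] * len(log)
    open_offsets = {}  # producer -> offsets of data in its open transaction
    for offset, (kind, producer, payload) in enumerate(log):
        if kind == "data":
            open_offsets.setdefault(producer, []).append(offset)
        else:  # a marker resolves everything that producer has open
            for data_offset in open_offsets.pop(producer, []):
                outcome[data_offset] = payload
    # The consumer may not read past the first record of an open transaction.
    unresolved = [o for offsets in open_offsets.values() for o in offsets]
    last_stable = min(unresolved) if unresolved else len(log)
    return [
        log[o][2]
        for o in range(last_stable)
        if log[o][0] == "data" and outcome[o] == "commit"
    ]


log = [
    ("data", "p1", "a"),
    ("data", "p2", "x"),
    ("marker", "p1", "commit"),
    ("data", "p1", "b"),
    ("marker", "p2", "abort"),
]
visible = read_committed(log)
```

In this log, "a" is visible, "x" is filtered out because its transaction aborted, and "b" stays hidden until p1 writes a marker for its second transaction.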

When to Use 2PC (and When Not To)

Use 2PC when you need atomic commits across a small number of tightly coupled data stores. Cross-shard transactions within a single database system (Spanner, CockroachDB, Vitess) are the sweet spot. The participants trust each other, the coordinator can be replicated, and the latency budget accommodates two round trips.

Do not use 2PC across loosely coupled microservices. If your checkout flow needs to update inventory, charge payment, and send a confirmation email, wrapping all three in a 2PC transaction means that an email service outage blocks checkout. Instead, use sagas (a sequence of local transactions with compensating actions) or outbox patterns (write to a local outbox table and process asynchronously).
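A saga can be sketched as a list of steps, each paired with a compensating action. The run_saga function and the checkout steps below are hypothetical; a production saga would also persist its progress so that compensation survives a crash of the orchestrator.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    succeeded, in reverse order, and report failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


events = []


def charge_payment():
    raise RuntimeError("payment provider unavailable")


checkout = [
    (lambda: events.append("reserve inventory"),
     lambda: events.append("release inventory")),
    (charge_payment,
     lambda: events.append("refund payment")),
]
succeeded = run_saga(checkout)
```

When the payment step fails, the inventory reservation is released and nothing is refunded, because the charge never went through. Unlike 2PC, no service holds locks while waiting on another; the cost is that intermediate states are briefly visible.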

The rule of thumb: 2PC works within a trust boundary where all participants are managed by the same team and have similar availability targets. Across trust boundaries, use eventual consistency patterns instead. The world learned this lesson the hard way with CORBA and XA transactions in the early 2000s, and it remains true today.
