Distributed Transaction Patterns

The Four Patterns, Honestly

Not all distributed transaction patterns are equal, and the industry has spent a decade learning which ones belong where. Here is what actually works in production, not what looks clean on a whiteboard.

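Two-Phase Commit (2PC) gets a bad reputation, but it has a legitimate niche. Inside a single vendor's database engine that implements it natively (Google Spanner, or Oracle distributed transactions across database links), 2PC works reliably because the coordinator and participants belong to the same system and share its failure detection and recovery. In Spanner, for example, participants are replicated groups, so one machine crashing does not strand a transaction mid-protocol. The problems start when you run 2PC across heterogeneous systems. A coordinator crash between the prepare and commit phases leaves participants holding locks until the coordinator recovers or an operator resolves the transaction by hand, because no participant can safely decide on its own. In a microservices world where "participant" means "another team's service," that lock becomes a production incident.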
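To make the blocking failure concrete, here is a toy in-memory model of 2PC. It is a sketch under simplifying assumptions (no durable log, no timeouts, participants as plain Python objects), and the names `Participant`, `two_phase_commit`, and `crash_after_prepare` are hypothetical, not any database's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # voted YES, holding locks, waiting for the decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class CoordinatorCrash(Exception):
    """Simulated coordinator failure between phase 1 and phase 2."""


@dataclass
class Participant:
    name: str
    can_commit: bool = True
    state: State = State.IDLE
    locks: set = field(default_factory=set)

    def prepare(self, key: str) -> bool:
        # Phase 1: lock the row and vote. A YES vote is a promise to commit
        # if told to, so the lock must be held until the decision arrives.
        if not self.can_commit:
            self.state = State.ABORTED
            return False
        self.locks.add(key)
        self.state = State.PREPARED
        return True

    def commit(self) -> None:
        self.state = State.COMMITTED
        self.locks.clear()

    def abort(self) -> None:
        self.state = State.ABORTED
        self.locks.clear()


def two_phase_commit(participants: list[Participant], key: str,
                     crash_after_prepare: bool = False) -> bool:
    votes = [p.prepare(key) for p in participants]   # phase 1: collect votes
    if crash_after_prepare:
        raise CoordinatorCrash("decision never sent")
    if all(votes):                                    # phase 2: unanimous YES
        for p in participants:
            p.commit()
        return True
    for p in participants:                            # any NO aborts everyone
        p.abort()
    return False
```

After a simulated crash, every participant sits in `PREPARED` with its lock still held. Nothing in the protocol lets a participant commit or abort alone, which is exactly the stuck state described above.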

Saga with Choreography uses events. Each service completes its local transaction and publishes an event. The next service picks it up and continues. No central coordinator. This works well for simple, linear flows: Order Service publishes OrderCreated, Payment Service charges the card and publishes PaymentCharged, Inventory Service reserves stock. Clean. But when the flow branches, when services need to react to multiple events, or when you need to answer "what state is this order in right now?", you end up assembling the answer from event logs across five services.
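A minimal choreography sketch, assuming an in-process `EventBus` as a stand-in for a real broker such as Kafka (all class and function names here are hypothetical). Each service only subscribes and publishes; nobody owns the end-to-end flow. For brevity the bus keeps one shared log, whereas in production these events would be spread across each service's own topics and stores.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-process pub/sub standing in for a broker such as Kafka."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[tuple[str, dict]] = []  # every event ever published

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append((event_type, payload))
        for handler in self.handlers[event_type]:
            handler(payload)


def wire_services(bus: EventBus, stock: dict[str, int]) -> None:
    # Payment Service: on OrderCreated, charge (call omitted) and announce it.
    def on_order_created(evt: dict) -> None:
        bus.publish("PaymentCharged", evt)

    # Inventory Service: on PaymentCharged, try to reserve one unit of stock.
    def on_payment_charged(evt: dict) -> None:
        if stock.get(evt["sku"], 0) > 0:
            stock[evt["sku"]] -= 1
            bus.publish("StockReserved", evt)
        else:
            bus.publish("StockUnavailable", evt)

    bus.subscribe("OrderCreated", on_order_created)
    bus.subscribe("PaymentCharged", on_payment_charged)


def order_status(log: list[tuple[str, dict]], order_id: str) -> str:
    # No service owns the overall state: scan the events, keep the latest one.
    status = "unknown"
    for event_type, payload in log:
        if payload.get("order_id") == order_id:
            status = event_type
    return status
```

Notice what is missing: when inventory publishes `StockUnavailable`, nothing refunds the payment. Adding that means another subscriber in the Payment Service, and each new branch is another implicit edge in a flow that no single file describes.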

Saga with Orchestration uses a central coordinator that tells each service what to do and tracks the overall state. Temporal (which grew out of Uber's Cadence) is a leading open-source option here. The orchestrator holds the workflow definition, manages retries, handles timeouts, and persists state durably. The single-point-of-failure concern is real but manageable: Temporal itself runs as a highly available cluster. Netflix, Stripe, and Coinbase all use orchestrated sagas in production.
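The orchestrator's core duties (retries, durable progress tracking, resuming after a restart) can be sketched in a few dozen lines. This is not Temporal's API: `Orchestrator`, `run_step_with_retries`, and `TransientError` are hypothetical names, a JSON file stands in for the durable, replicated store a real engine would use, and timeouts are omitted. Because a crash can land between a step finishing and its progress being saved, steps must be idempotent.

```python
import json
import time
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[], None]]


class TransientError(Exception):
    """Raised by a step for failures worth retrying (timeouts, 503s)."""


def run_step_with_retries(step: Callable[[], None], attempts: int = 3,
                          base_delay: float = 0.01) -> None:
    for attempt in range(1, attempts + 1):
        try:
            step()
            return
        except TransientError:
            if attempt == attempts:
                raise                                   # out of retries
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


class Orchestrator:
    def __init__(self, state_file: Path, steps: list[Step]) -> None:
        self.state_file = state_file  # hypothetical durable store
        self.steps = steps

    def _load(self) -> dict:
        if self.state_file.exists():
            return json.loads(self.state_file.read_text())
        return {"completed": [], "status": "running"}

    def _save(self, state: dict) -> None:
        self.state_file.write_text(json.dumps(state))

    def run(self) -> dict:
        state = self._load()                 # resume from last saved progress
        for name, step in self.steps:
            if name in state["completed"]:
                continue                     # finished before a restart
            run_step_with_retries(step)
            state["completed"].append(name)
            self._save(state)                # persist after every step
        state["status"] = "completed"
        self._save(state)
        return state
```

Running the same workflow again after a crash reloads the file and skips steps already recorded as completed, which is the essence of what a durable workflow engine provides at much larger scale.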

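Transactional Outbox solves a narrower problem: how to atomically update your database and publish an event. Write the event to an outbox table in the same database transaction as your business data. A separate process reads the outbox and publishes to a broker such as Kafka: either a change-data-capture tool like Debezium tailing the database log, or a simple polling worker. Delivery is at-least-once (the relay can crash after publishing but before marking the row as sent), so consumers must tolerate duplicates. This is not a full saga pattern, but it is the building block most sagas need, and it is where most teams should start.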

When Step 4 of 5 Fails

Consider an e-commerce checkout: (1) create order, (2) charge payment, (3) reserve inventory, (4) schedule shipping, (5) send confirmation. Shipping fails because the warehouse is at capacity.

With an orchestrated saga, the orchestrator detects the failure and runs compensating transactions in reverse order. Inventory gets unreserved. Payment gets refunded. Order status moves to "cancelled." Each compensation is a new forward transaction, not a database rollback.
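The reverse-order compensation described above reduces to a small loop. This is a sketch with a hypothetical `run_saga` helper and `StepFailed` exception: each step is a (name, action, compensation) triple, and compensations are ordinary forward calls. Real orchestrators also retry compensations until they succeed, which this sketch omits.

```python
from typing import Callable

Action = Callable[[], None]


class StepFailed(Exception):
    """A step failed permanently (e.g. warehouse at capacity)."""


def run_saga(steps: list[tuple[str, Action, Action]], log: list[str]) -> bool:
    completed: list[tuple[str, Action]] = []
    for name, action, compensate in steps:
        try:
            action()
        except StepFailed:
            log.append(f"{name} failed")
            # Undo finished steps newest-first; each undo is a new
            # forward transaction, not a database rollback.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return False
        completed.append((name, compensate))
        log.append(f"{name} ok")
    return True
```

For the checkout above, a failing `schedule_shipping` action triggers the compensations for `reserve_inventory`, `charge_payment`, and `create_order`, in that order, and `send_confirmation` never runs.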

The subtle problem: the customer's card was charged and then refunded, which is a different experience from never being charged. The charge reduces their available credit until the refund posts, and a refund can take 3-5 business days to appear. This is why many payment-heavy systems place an authorization hold instead of charging immediately. The hold reserves funds without capturing them: the payment is captured only after the later steps succeed, and if the saga fails, you release (void) the hold instead of issuing a refund.
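A sketch of the authorize-then-capture variant, assuming a processor that supports separate authorize, capture, and void operations. `FakeGateway`, its method names, and `checkout` are hypothetical, not a real payment SDK, and amounts are in integer cents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeGateway:
    """In-memory stand-in for a card processor (hypothetical API)."""
    holds: dict[str, int] = field(default_factory=dict)     # auth_id -> cents held
    captured: dict[str, int] = field(default_factory=dict)  # auth_id -> cents settled
    _next: int = 0

    def authorize(self, amount: int) -> str:
        self._next += 1
        auth_id = f"auth_{self._next}"
        self.holds[auth_id] = amount   # funds reserved, nothing moved yet
        return auth_id

    def capture(self, auth_id: str) -> None:
        self.captured[auth_id] = self.holds.pop(auth_id)

    def void(self, auth_id: str) -> None:
        # Release the hold; no money moved, so there is nothing to refund.
        self.holds.pop(auth_id, None)


def checkout(gateway: FakeGateway, amount: int,
             fulfil: Callable[[], None]) -> str:
    auth_id = gateway.authorize(amount)  # payment step: hold, don't capture
    try:
        fulfil()                         # later steps: inventory, shipping
    except Exception:
        gateway.void(auth_id)            # compensation is a void, not a refund
        return "cancelled"
    gateway.capture(auth_id)             # capture only once the risky steps pass
    return "confirmed"
```

Authorizations expire after a processor- and card-specific window, so this shape suits sagas that finish well within that window; a workflow that runs for weeks still needs charge-and-refund or a re-authorization step.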

Choosing the Right Pattern

Start with the Transactional Outbox for reliable event publishing. If your workflows are simple linear chains across 2-3 services, choreographed sagas work fine. Once you have branching logic, human approval steps, long-running processes (hours or days), or more than 3 participating services, invest in an orchestration framework like Temporal. Reserve 2PC for single-vendor database clusters where the coordinator is part of the database engine itself.
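The guidance above can be encoded as a small decision function. This is one reading of it, not a rule: `Workflow` and `choose_pattern` are hypothetical, and the two-service case follows the advice that a direct call with a compensating endpoint is usually simpler than a saga.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    services: int                    # number of participating services
    branching: bool = False
    human_approval: bool = False
    long_running: bool = False       # hours or days
    single_db_engine: bool = False   # all data inside one engine with native 2PC


def choose_pattern(w: Workflow) -> str:
    if w.single_db_engine:
        return "2PC (engine-native)"
    if w.branching or w.human_approval or w.long_running or w.services > 3:
        return "orchestrated saga + outbox"
    if w.services == 3:
        return "choreographed saga + outbox"   # simple linear chain
    if w.services == 2:
        return "direct call with retry + compensating endpoint"
    return "local transaction"
```

The ordering of the checks matters: any single complexity signal (branching, approvals, long duration, many services) is enough to justify an orchestrator, while the cheaper options apply only when none of them is present.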

The most common mistake is reaching for a saga framework before you need one. If two services need to coordinate, a direct API call with a retry and a compensating endpoint is simpler, easier to debug, and easier to operate than wiring up Temporal or building a choreography layer. Complexity should be proportional to the coordination problem you actually have.
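A sketch of that two-service alternative: a remote reservation call with retries, followed by a local write, with a compensating `release` call if the local write fails. The callables `reserve`, `release`, and `save_order_locally` are hypothetical stand-ins for HTTP clients and a database write, and `reserve` is assumed to be idempotent on `order_id` so that retries cannot double-reserve.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Transient(Exception):
    """A failure worth retrying: timeout, connection reset, HTTP 503."""


def with_retry(call: Callable[[], T], attempts: int = 3, delay: float = 0.01) -> T:
    for attempt in range(attempts):
        try:
            return call()
        except Transient:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)  # exponential backoff
    raise AssertionError("unreachable")


def place_order(order_id: str,
                reserve: Callable[[str], str],
                release: Callable[[str], None],
                save_order_locally: Callable[[str, str], None]) -> bool:
    # Remote call to the other service; safe to retry only if idempotent.
    reservation_id = with_retry(lambda: reserve(order_id))
    try:
        save_order_locally(order_id, reservation_id)   # local DB transaction
    except Exception:
        # Compensating endpoint: hand the stock back, then report failure.
        with_retry(lambda: release(reservation_id))
        return False
    return True
```

The remaining gap is a crash between `reserve` and `release`, which leaks the reservation; a reservation that expires on its own on the inventory side is a common backstop, and the whole arrangement is still far simpler to debug than a saga framework.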
