Outbox Pattern: Reliable Event Publishing
Write the event to an 'outbox' table in the SAME transaction as the business write. A separate process polls the table and publishes events to Kafka/queue/webhook. Solves the dual-write problem: a database AND a message broker can't be written to atomically, so events would otherwise be lost or duplicated when one of the writes fails.
What it is
The outbox pattern solves the dual-write problem: how can an event be reliably published after writing to the database, given that the database and the message broker are independent systems?
The idea: don't try to write to two systems atomically. Write the event to a table in the same database, in the same transaction as the business write. A separate process reads the table and publishes events to the broker. Now there is one atomic write (the transaction), and the publishing is decoupled from the request path.
This pattern shows up in most production systems that need to publish events reliably. It goes by several names (outbox, transactional outbox, transactional messaging), but the idea is the same.
Why dual-write fails
An order needs to be created: insert into the orders table AND publish an OrderCreated event to Kafka. Two writes, two systems, no atomicity.
Failure modes:
- DB succeeds, broker fails. Order exists; nobody knows about it. Customer sees confirmation; no email is sent, no fulfilment kicks off.
- Broker succeeds, DB fails. Phantom event. Consumers process an order that doesn't exist.
- DB succeeds, broker partially fails. Some downstream subscribers get the event; others don't.
There's no order of operations that fixes this. The outbox pattern recognises that two systems can't be made atomic, so it makes the operation atomic in one system (the DB), then handles publishing as a separate, retryable step.
The relay
The relay is a separate process (or thread, or scheduled job) that polls the outbox table and publishes pending rows. The standard implementation:
    loop:
        in transaction:
            select pending rows for update skip locked, limit 100
            for each row: publish to broker
            mark rows as published
            commit
        sleep briefly if no rows
SKIP LOCKED lets multiple relay instances run safely; each grabs a different batch.
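The loop above can be sketched with plain JDBC. This is a minimal sketch, not a definitive implementation: it assumes Postgres, an outbox table with columns id, topic, payload, created_at and published_at, and a hypothetical EventPublisher interface standing in for the real broker client.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Hypothetical broker abstraction; a real one would wrap a Kafka producer or HTTP client. */
interface EventPublisher {
    /** Must return only after the broker has acknowledged the event. */
    void publish(String topic, String key, String payload) throws Exception;
}

final class JdbcOutboxRelay {
    // Table and column names are assumptions for this sketch.
    static final String SELECT_PENDING =
        "SELECT id, topic, payload FROM outbox "
      + "WHERE published_at IS NULL ORDER BY created_at "
      + "LIMIT ? FOR UPDATE SKIP LOCKED";
    static final String MARK_PUBLISHED =
        "UPDATE outbox SET published_at = now() WHERE id = ?";

    /** Publishes one batch inside one transaction; returns the number of rows published. */
    static int relayOnce(Connection conn, EventPublisher publisher, int batchSize) throws Exception {
        conn.setAutoCommit(false); // row locks are held until commit or rollback
        try (PreparedStatement select = conn.prepareStatement(SELECT_PENDING);
             PreparedStatement mark = conn.prepareStatement(MARK_PUBLISHED)) {
            select.setInt(1, batchSize);
            int published = 0;
            try (ResultSet rows = select.executeQuery()) {
                while (rows.next()) {
                    Object id = rows.getObject("id");
                    // Publish first, mark second: a crash in between causes a
                    // republish on restart (at-least-once), never a lost event.
                    publisher.publish(rows.getString("topic"), String.valueOf(id),
                                      rows.getString("payload"));
                    mark.setObject(1, id);
                    mark.addBatch();
                    published++;
                }
            }
            mark.executeBatch();
            conn.commit();
            return published;
        } catch (Exception e) {
            conn.rollback(); // rows stay pending and are retried on the next poll
            throw e;
        }
    }
}
```

A caller would invoke relayOnce in a loop and sleep briefly whenever it returns 0. Because each instance only sees rows that no other transaction has locked, several relays can run this loop concurrently.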
The relay is at-least-once by design. If it crashes between publishing and marking-sent, those events get republished on restart. Consumers must be idempotent. Combine with idempotency keys to make this safe.
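One way to make a consumer idempotent is to remember the ids of events it has already handled and treat a redelivery as a no-op. The sketch below keeps that set in memory purely for illustration; the class and method names are hypothetical, and a real consumer would more likely store processed ids in its own database, in the same transaction as the side effect.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Consumer that deduplicates on event id, so a republished event has no extra effect. */
final class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> sideEffects = new ArrayList<>();

    /** Returns true if the event was applied, false if it was a duplicate delivery. */
    boolean handle(String eventId, String payload) {
        if (!processedIds.add(eventId)) {
            return false; // id already seen: skip the side effect
        }
        sideEffects.add(payload); // stand-in for the real work (send email, start fulfilment)
        return true;
    }

    /** Side effects applied so far, in order. */
    List<String> sideEffects() {
        return sideEffects;
    }
}
```

If the relay publishes an event, crashes before marking it, and publishes it again after restart, the second handle call with the same id returns false and the side effect runs only once.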
Outbox vs CDC
Two ways to get changes from the database into the event stream.
Outbox: explicit table. The event payload is written at the time of the business write. Clean event design (the schema is under application control). Extra DB table, extra relay process.
CDC (Change Data Capture): tools like Debezium read the database's WAL and publish each change as an event. No outbox table needed. Every change becomes an event automatically. Trade-offs: less control over the event shape (it's the row), publishes changes that may not be intended as events, requires platform-specific setup (Postgres logical replication, MySQL binlog).
For greenfield: pick based on whether event-design control (outbox) or zero application code overhead (CDC) matters more. Many teams use both: CDC for cross-team event streaming, outbox for application-level workflow events.
Performance considerations
For high-volume systems, the outbox can become hot:
- Index (published_at IS NULL, created_at) for the relay query.
- Partition by date if the table grows.
- Delete published rows (or move to an archive) to keep the active set small.
- Run multiple relay instances with SKIP LOCKED to scale throughput.
In the worst case, relay latency equals the poll interval (usually 100ms to a few seconds). For lower latency, consider Postgres LISTEN/NOTIFY to wake the relay on insert.
The outbox INSERT must commit in the same transaction as the business write. Without that, the whole point is gone. Consumers must be idempotent because outbox delivery is at-least-once. And the relay becomes an operational responsibility: monitor lag (how many pending rows, how old) and alert on a growing backlog.
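The lag check can be sketched as a small function over the pending rows: count them, find the oldest, and compare both against thresholds. The types and threshold parameters below are hypothetical; in practice the two numbers would usually come from a single aggregate query over the outbox table rather than from rows loaded into memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Timestamps of one outbox row; publishedAt is null while the row is pending. */
final class OutboxRowTimes {
    final Instant createdAt;
    final Instant publishedAt;

    OutboxRowTimes(Instant createdAt, Instant publishedAt) {
        this.createdAt = createdAt;
        this.publishedAt = publishedAt;
    }
}

/** Relay lag: how many rows are pending and how old the oldest one is. */
final class OutboxLag {
    final long pendingCount;
    final Duration oldestPendingAge; // Duration.ZERO when nothing is pending

    OutboxLag(long pendingCount, Duration oldestPendingAge) {
        this.pendingCount = pendingCount;
        this.oldestPendingAge = oldestPendingAge;
    }

    static OutboxLag measure(List<OutboxRowTimes> rows, Instant now) {
        long pending = 0;
        Instant oldest = null;
        for (OutboxRowTimes row : rows) {
            if (row.publishedAt != null) {
                continue; // already published, not part of the backlog
            }
            pending++;
            if (oldest == null || row.createdAt.isBefore(oldest)) {
                oldest = row.createdAt;
            }
        }
        Duration age = (oldest == null) ? Duration.ZERO : Duration.between(oldest, now);
        return new OutboxLag(pending, age);
    }

    /** True when either threshold is exceeded and an alert should fire. */
    boolean breaches(long maxPending, Duration maxAge) {
        return pendingCount > maxPending || oldestPendingAge.compareTo(maxAge) > 0;
    }
}
```

Alerting on the age of the oldest pending row tends to catch a stalled relay even when traffic is low and the pending count stays small.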
For Java, libraries like Spring's transactional event listener with the outbox extension simplify this. For Go, libraries like Watermill or a hand-rolled relay are common. For Python, custom code or a workflow engine is typical (Temporal handles this implicitly).
Implementations
Java/Spring version. The @Transactional method does both DB writes; the relay is a separate scheduled job. Idempotency key on the consumer side dedupes any double-publish.
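    import java.util.List;
    import java.util.UUID;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class OrderService {
        // orderRepo, outboxRepo and json are injected collaborators (declarations omitted).
        @Transactional
        public Order create(CreateOrder cmd) {
            Order o = new Order(cmd);
            orderRepo.save(o);

            // Same transaction as the order insert: both rows commit or neither does.
            outboxRepo.save(new OutboxRecord(
                UUID.randomUUID(),
                "OrderCreated",
                json.write(new OrderCreatedEvent(o.getId(), o.getCustomerId()))
            ));
            return o; // both committed atomically
        }
    }

    @Component
    public class OutboxRelay {
        // outboxRepo and kafka are injected; kafka is assumed to be a KafkaTemplate.
        @Scheduled(fixedDelay = 500)
        @Transactional // holds row locks, assuming findPending selects FOR UPDATE SKIP LOCKED
        public void publish() throws Exception {
            List<OutboxRecord> pending = outboxRepo.findPending(100);
            for (OutboxRecord rec : pending) {
                // Wait for the broker acknowledgement before marking the row, so a
                // failed send leaves the row pending instead of losing the event.
                kafka.send(rec.topic(), rec.id().toString(), rec.payload()).get();
                outboxRepo.markPublished(rec.id());
            }
        }
    }
Key points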
- Dual-write problem: 'INSERT order' + 'publish OrderCreated event' is two writes, no atomic guarantee. Either can fail.
- Outbox: write the event row in the SAME DB transaction as the business write. Either both happen or neither.
- A relay process polls the outbox, publishes each event to the broker, marks it published (or deletes it).
- At-least-once: the relay can publish, crash, restart, publish again. Subscribers must be idempotent.
- CDC alternative: tools like Debezium read the database WAL and publish change events without an outbox table.
Follow-up questions
- Why not just use a distributed transaction (XA)?
- What about Debezium / CDC?
- Does the outbox pattern guarantee exactly-once delivery?
- How big should the outbox table grow?
Gotchas
- Forgetting to put the outbox INSERT in the same transaction defeats the pattern
- Relay marks events 'sent' before actually publishing: events lost on crash
- Single relay instance: bottleneck and SPOF; run multiple with SKIP LOCKED
- Letting the outbox table grow unbounded: relay queries slow down, table bloats
- Consumers not idempotent: at-least-once delivery causes duplicate side effects