Idempotency & Exactly-Once Processing
Exactly-Once Is a Consumer-Side Problem
Every distributed messaging system delivers messages at-least-once under failure conditions. A Kafka broker acknowledges a produce request, but the ack is lost in transit. The producer retries. Now two copies of the message exist in the partition. RabbitMQ, SQS, and Pub/Sub all have their own version of this story. The message broker cannot solve this for you because it cannot know whether your application successfully processed the message. That knowledge lives on the consumer side.
Exactly-once processing means: no matter how many times a message arrives, the observable side effects happen once. Building this requires explicit deduplication, idempotent operations, or both.
The Stripe Idempotency Pattern
Stripe's approach is worth studying because it handles real money and cannot afford to get this wrong. The pattern has three parts.
Client-generated idempotency key. The client creates a UUID (or deterministic hash) and sends it with every API request. Retries reuse the same key. This is critical: the server cannot generate the key because it does not know whether a request is new or a retry.
Server-side deduplication. On receiving a request, the server checks a store (Redis, DynamoDB, a database table) for the idempotency key. If the key exists and the original request completed, return the stored response. If the key exists and the original request is still in progress, return 409 Conflict. If the key does not exist, proceed with processing and store the result keyed by the idempotency key.
TTL on stored results. Stripe expires idempotency records after 24 hours. This bounds storage growth while covering the realistic retry window. For payment systems, consider longer TTLs. The storage cost of keeping a few million small dedup records is negligible compared to the cost of a double charge.
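A minimal sketch of the server-side check, assuming a Redis-backed store; the key prefix, the process_payment handler, and the response shape are illustrative, and recovery for a crashed in-progress request is omitted.

```python
import json
import redis

r = redis.Redis()
IDEMPOTENCY_TTL = 24 * 60 * 60  # 24 hours, matching Stripe's retry window

def handle_request(idempotency_key: str, payload: dict):
    record_key = f"idem:{idempotency_key}"

    # Atomically claim the key: nx=True sets it only if it does not exist yet.
    claimed = r.set(record_key, json.dumps({"status": "in_progress"}),
                    nx=True, ex=IDEMPOTENCY_TTL)
    if not claimed:
        record = json.loads(r.get(record_key))
        if record["status"] == "in_progress":
            return 409, {"error": "request already in progress"}
        return 200, record["response"]              # replay the stored response

    response = process_payment(payload)             # the actual business logic
    r.set(record_key, json.dumps({"status": "done", "response": response}),
          ex=IDEMPOTENCY_TTL)
    return 200, response

def process_payment(payload: dict) -> dict:         # stand-in for real work
    return {"charged": payload["amount"]}
```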
Outbox + CDC for Reliable Event Publishing
The dual-write problem: you need to update a database row AND publish a Kafka event. If you do them sequentially, a crash between the two operations leaves your system inconsistent. If you try to do them in a distributed transaction, you are back to 2PC territory.
The Transactional Outbox pattern sidesteps this. Write the event to an outbox table in the same database transaction as your business data. A CDC tool (Debezium is the standard choice) tails the database's write-ahead log and publishes outbox rows to Kafka. Because the business data and outbox record are in the same transaction, they are atomically consistent. Debezium handles the publishing asynchronously.
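A sketch of the producer side, assuming PostgreSQL with psycopg2 and illustrative orders and outbox tables; Debezium would pick up the outbox row from the WAL after commit.

```python
import json
import uuid
import psycopg2

def place_order(conn, customer_id: str, amount_cents: int) -> str:
    order_id = str(uuid.uuid4())
    event_id = str(uuid.uuid4())  # consumers deduplicate on this
    with conn:  # one transaction: both inserts commit or roll back together
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, customer_id, amount_cents) "
                "VALUES (%s, %s, %s)",
                (order_id, customer_id, amount_cents),
            )
            cur.execute(
                "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
                "VALUES (%s, %s, %s, %s)",
                (event_id, order_id, "OrderPlaced",
                 json.dumps({"order_id": order_id, "amount_cents": amount_cents})),
            )
    return order_id
```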
The consumer side still needs idempotency. CDC can produce duplicates during connector restarts or rebalances. Each event in the outbox should carry a unique event ID, and consumers should track processed event IDs to skip duplicates.
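One way to sketch that consumer-side check is to record the event ID in the same database transaction as the state change, assuming a processed_events table whose primary key is the event ID (all names illustrative).

```python
import psycopg2

def handle_event(conn, event_id: str, order_id: str) -> None:
    try:
        with conn:  # rolls back both statements if either fails
            with conn.cursor() as cur:
                # Raises a duplicate-key error if this event was already handled.
                cur.execute(
                    "INSERT INTO processed_events (event_id) VALUES (%s)",
                    (event_id,),
                )
                cur.execute(
                    "UPDATE orders SET status = 'confirmed' WHERE id = %s",
                    (order_id,),
                )
    except psycopg2.errors.UniqueViolation:
        pass  # duplicate delivery: the work already happened, skip it
```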
Kafka's Exactly-Once Semantics: What It Actually Guarantees
Kafka 0.11 introduced idempotent producers and transactional APIs. An idempotent producer uses sequence numbers so the broker deduplicates retried produce requests. Transactional APIs let you atomically write to multiple partitions and commit consumer offsets in the same transaction.
What this gives you: within a Kafka Streams application doing read-process-write entirely inside Kafka, you get exactly-once processing. Messages are consumed, transformed, and produced to output topics with no duplicates and no data loss.
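A rough sketch of that loop with the confluent-kafka Python client; the broker address, topic names, and the uppercase transform are placeholders, and abort/retry handling is omitted.

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-enricher",
    "enable.auto.commit": False,          # offsets are committed via the transaction
    "isolation.level": "read_committed",  # do not read aborted transactional writes
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "order-enricher-1",  # enables idempotence + transactions
})

consumer.subscribe(["orders"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("orders-enriched", value=msg.value().upper())  # "process" step
    # Commit the consumed offsets and the produced record atomically.
    producer.send_offsets_to_transaction(
        consumer.position(consumer.assignment()),
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```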
What this does not give you: exactly-once semantics for anything outside Kafka. If your consumer reads from Kafka and writes to PostgreSQL, Kafka's EOS does not help. You need application-level idempotency for the database write. If your consumer calls an external payment API, Kafka cannot un-call that API on a rebalance.
Database-Level Idempotency Tricks
Unique constraints are the simplest tool. Insert a row with a unique idempotency_key column. If the insert fails with a duplicate key error, the operation already happened. This works for pure inserts but not for operations that combine writes with external side effects.
Conditional updates prevent double-application of state changes. UPDATE accounts SET balance = balance - 100, last_txn_id = 'abc123' WHERE balance >= 100 AND last_txn_id != 'abc123' ensures the deduction happens at most once for a given transaction ID, because the same statement that applies the change also records which transaction applied it.
Optimistic locking with version columns catches concurrent modifications. Read the row with its version, do your processing, then update with WHERE version = <read_version>. If another process modified the row in between, your update affects zero rows, and you know to retry with fresh data.
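A sketch of that read-modify-write with a version column, assuming a psycopg2 connection and an accounts table (names illustrative); the caller re-reads and retries whenever this returns False.

```python
def apply_fee(conn, account_id: str, fee_cents: int) -> bool:
    # conn is an open psycopg2 connection
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT balance_cents, version FROM accounts WHERE id = %s",
                (account_id,),
            )
            balance, version = cur.fetchone()
            new_balance = balance - fee_cents  # the processing step
            cur.execute(
                "UPDATE accounts SET balance_cents = %s, version = version + 1 "
                "WHERE id = %s AND version = %s",
                (new_balance, account_id, version),
            )
            # 0 rows affected means another writer modified the row first.
            return cur.rowcount == 1
```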
When Idempotency Is Harder Than You Think
Deterministic operations are straightforward to make idempotent. "Set user email to alice@example.com" produces the same result regardless of how many times you execute it. Non-deterministic operations are where things get tricky.
Timestamps vary between retries. If your operation records processed_at = NOW(), retrying it produces a different timestamp. Store the timestamp from the first execution and replay it on retries.
Random IDs generated during processing (confirmation codes, reference numbers) differ on each attempt. Generate them once, persist them with the idempotency record, and return the stored value on retries.
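A sketch of capturing both values on the first execution and replaying them afterward, assuming a Redis-backed idempotency record (key names and TTL illustrative; the claim/in-progress handling from the earlier sketch is elided).

```python
import json
import secrets
import time
import redis

r = redis.Redis()

def confirm_order(idempotency_key: str) -> dict:
    record_key = f"confirm:{idempotency_key}"
    stored = r.get(record_key)
    if stored:
        return json.loads(stored)  # retry: replay the original result

    result = {
        "confirmation_code": secrets.token_hex(4),  # random: generate exactly once
        "processed_at": time.time(),                # timestamp of first execution
    }
    r.set(record_key, json.dumps(result), ex=72 * 3600)
    return result
```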
External API calls are the hardest case. You call a shipping provider's API to create a label. The call succeeds but the response is lost. You retry. Now two labels exist. The fix: check whether the external operation already completed before calling again (use the external provider's own idempotency support if available), or accept the duplicate and reconcile later. Stripe, Adyen, and most payment processors support idempotency keys on their APIs precisely because their customers face this problem.
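With Stripe's Python library, for example, an idempotency key can be passed alongside the request so a retried call replays the original charge rather than creating a second one; the key format below is just one convention.

```python
import stripe

stripe.api_key = "sk_test_..."  # placeholder

def charge(order_id: str, amount_cents: int):
    # Derive the key from a stable identifier you control: every retry of this
    # order sends the same key, so the provider deduplicates on its side.
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        idempotency_key=f"order-{order_id}-charge",
    )
```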
Key Points
- There is no exactly-once delivery in a distributed system. Networks lose packets, brokers crash, consumers restart. What you can build is exactly-once processing, and the burden falls entirely on the consumer
- Stripe's idempotency key pattern (client-generated UUID, server-side dedup with 24h TTL) is the gold standard for API idempotency. Copy it. Seriously. Their engineering blog post from 2017 remains the best practical reference
- Kafka's exactly-once semantics (EOS) guarantees atomic writes across partitions within a single Kafka cluster. It does not guarantee exactly-once processing in your application. Your consumer still needs idempotency logic
- The Transactional Outbox pattern with CDC (Debezium reading the WAL) solves the dual-write problem: updating your database and publishing an event atomically without distributed transactions
- Non-deterministic operations (timestamps, UUIDs, external API calls) are idempotency's hardest edge case. If retrying an operation produces a different result each time, deduplication alone is not enough. You need to capture and replay the original result
Common Mistakes
- ✗ Relying on Kafka consumer group offsets for exactly-once guarantees. Consumer commits offset, processes message, crashes before completing side effects. On restart, the message is skipped. You now have data loss, not duplication
- ✗ Setting idempotency key TTLs too short. A 1-hour TTL means a client retrying after a network timeout 2 hours later will create a duplicate. Stripe uses 24 hours. Payment systems should consider 48-72 hours
- ✗ Implementing idempotency at the API gateway level but not in downstream services. The gateway deduplicates HTTP requests, but internal retries from service mesh, queue consumers, or scheduled jobs bypass the gateway entirely
- ✗ Treating database unique constraints as a complete idempotency solution. A unique constraint prevents duplicate inserts, but it does not prevent duplicate side effects like sending emails, calling third-party APIs, or publishing events