Idempotency & Exactly-Once Processing
Exactly-Once Is a Consumer-Side Problem
Every distributed messaging system delivers messages at-least-once under failure conditions. A Kafka broker acknowledges a produce request, but the ack is lost in transit. The producer retries. Now two copies of the message exist in the partition. RabbitMQ, SQS, and Pub/Sub all have their own version of this story. The message broker cannot solve this for you because it cannot know whether your application successfully processed the message. That knowledge lives on the consumer side.
Exactly-once processing means: no matter how many times a message arrives, the observable side effects happen once. Building this requires explicit deduplication, idempotent operations, or both.
The Stripe Idempotency Pattern
Stripe's approach is worth studying because it handles real money and cannot afford to get this wrong. The pattern has three parts.
Client-generated idempotency key. The client creates a UUID (or deterministic hash) and sends it with every API request. Retries reuse the same key. This is critical: the server cannot generate the key because it does not know whether a request is new or a retry.
Server-side deduplication. On receiving a request, the server checks a store (Redis, DynamoDB, a database table) for the idempotency key. If the key exists and the original request completed, return the stored response. If the key exists and the original request is still in progress, return 409 Conflict. If the key does not exist, proceed with processing and store the result keyed by the idempotency key.
TTL on stored results. Stripe expires idempotency records after 24 hours. This bounds storage growth while covering the realistic retry window. For payment systems, consider longer TTLs. The storage cost of keeping a few million small dedup records is negligible compared to the cost of a double charge.
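A minimal sketch of the server-side check, assuming a Redis-backed store; the key prefix, the process_payment handler, and the response shape are illustrative, and recovery for a crashed in-progress request is omitted.

```python
import json
import redis

r = redis.Redis()
IDEMPOTENCY_TTL = 24 * 60 * 60  # 24 hours, matching Stripe's retry window

def handle_request(idempotency_key: str, payload: dict):
    record_key = f"idem:{idempotency_key}"

    # Atomically claim the key: nx=True sets it only if it does not exist yet.
    claimed = r.set(record_key, json.dumps({"status": "in_progress"}),
                    nx=True, ex=IDEMPOTENCY_TTL)
    if not claimed:
        record = json.loads(r.get(record_key))
        if record["status"] == "in_progress":
            return 409, {"error": "request already in progress"}
        return 200, record["response"]              # replay the stored response

    response = process_payment(payload)             # the actual business logic
    r.set(record_key, json.dumps({"status": "done", "response": response}),
          ex=IDEMPOTENCY_TTL)
    return 200, response

def process_payment(payload: dict) -> dict:         # stand-in for real work
    return {"charged": payload["amount"]}
```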
Outbox + CDC for Reliable Event Publishing
The dual-write problem: you need to update a database row AND publish a Kafka event. If you do them sequentially, a crash between the two operations leaves your system inconsistent. If you try to do them in a distributed transaction, you are back to 2PC territory.
The Transactional Outbox pattern sidesteps this. Write the event to an outbox table in the same database transaction as your business data. A CDC tool (Debezium is the standard choice) tails the database's write-ahead log and publishes outbox rows to Kafka. Because the business data and outbox record are in the same transaction, they are atomically consistent. Debezium handles the publishing asynchronously.
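A sketch of the producer side, assuming PostgreSQL with psycopg2 and illustrative orders and outbox tables; Debezium would pick up the outbox row from the WAL after commit.

```python
import json
import uuid
import psycopg2

def place_order(conn, customer_id: str, amount_cents: int) -> str:
    order_id = str(uuid.uuid4())
    event_id = str(uuid.uuid4())  # consumers deduplicate on this
    with conn:  # one transaction: both inserts commit or roll back together
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, customer_id, amount_cents) "
                "VALUES (%s, %s, %s)",
                (order_id, customer_id, amount_cents),
            )
            cur.execute(
                "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
                "VALUES (%s, %s, %s, %s)",
                (event_id, order_id, "OrderPlaced",
                 json.dumps({"order_id": order_id, "amount_cents": amount_cents})),
            )
    return order_id
```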
The consumer side still needs idempotency. CDC can produce duplicates during connector restarts or rebalances. Each event in the outbox should carry a unique event ID, and consumers should track processed event IDs to skip duplicates.
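One way to sketch that consumer-side check is to record the event ID in the same database transaction as the state change, assuming a processed_events table whose primary key is the event ID (all names illustrative).

```python
import psycopg2

def handle_event(conn, event_id: str, order_id: str) -> None:
    try:
        with conn:  # rolls back both statements if either fails
            with conn.cursor() as cur:
                # Raises a duplicate-key error if this event was already handled.
                cur.execute(
                    "INSERT INTO processed_events (event_id) VALUES (%s)",
                    (event_id,),
                )
                cur.execute(
                    "UPDATE orders SET status = 'confirmed' WHERE id = %s",
                    (order_id,),
                )
    except psycopg2.errors.UniqueViolation:
        pass  # duplicate delivery: the work already happened, skip it
```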
Kafka's Exactly-Once Semantics: What It Actually Guarantees
Kafka 0.11 introduced idempotent producers and transactional APIs. An idempotent producer uses sequence numbers so the broker deduplicates retried produce requests. Transactional APIs let you atomically write to multiple partitions and commit consumer offsets in the same transaction.
What this gives you: within a Kafka Streams application doing read-process-write entirely inside Kafka, you get exactly-once processing. Messages are consumed, transformed, and produced to output topics with no duplicates and no data loss.
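A rough sketch of that loop with the confluent-kafka Python client; the broker address, topic names, and the uppercase transform are placeholders, and abort/retry handling is omitted.

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-enricher",
    "enable.auto.commit": False,          # offsets are committed via the transaction
    "isolation.level": "read_committed",  # do not read aborted transactional writes
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "order-enricher-1",  # enables idempotence + transactions
})

consumer.subscribe(["orders"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("orders-enriched", value=msg.value().upper())  # "process" step
    # Commit the consumed offsets and the produced record atomically.
    producer.send_offsets_to_transaction(
        consumer.position(consumer.assignment()),
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```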
What this does not give you: exactly-once semantics for anything outside Kafka. If your consumer reads from Kafka and writes to PostgreSQL, Kafka's EOS does not help. You need application-level idempotency for the database write. If your consumer calls an external payment API, Kafka cannot un-call that API on a rebalance.
Database-Level Idempotency Tricks
Unique constraints are the simplest tool. Insert a row with a unique idempotency_key column. If the insert fails with a duplicate key error, the operation already happened. This works for pure inserts but not for operations that combine writes with external side effects.
Conditional updates prevent double-application of state changes. UPDATE accounts SET balance = balance - 100, last_txn_id = 'abc123' WHERE balance >= 100 AND last_txn_id != 'abc123' ensures the deduction happens at most once for a given transaction ID, because the same statement that applies the change also records which transaction applied it.
Optimistic locking with version columns catches concurrent modifications. Read the row with its version, do your processing, then update with WHERE version = <read_version>. If another process modified the row in between, your update affects zero rows, and you know to retry with fresh data.
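A sketch of that read-modify-write with a version column, assuming a psycopg2 connection and an accounts table (names illustrative); the caller re-reads and retries whenever this returns False.

```python
def apply_fee(conn, account_id: str, fee_cents: int) -> bool:
    # conn is an open psycopg2 connection
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT balance_cents, version FROM accounts WHERE id = %s",
                (account_id,),
            )
            balance, version = cur.fetchone()
            new_balance = balance - fee_cents  # the processing step
            cur.execute(
                "UPDATE accounts SET balance_cents = %s, version = version + 1 "
                "WHERE id = %s AND version = %s",
                (new_balance, account_id, version),
            )
            # 0 rows affected means another writer modified the row first.
            return cur.rowcount == 1
```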
When Idempotency Is Harder Than You Think
Deterministic operations are straightforward to make idempotent. "Set user email to alice@example.com" produces the same result regardless of how many times you execute it. Non-deterministic operations are where things get tricky.
Timestamps vary between retries. If your operation records processed_at = NOW(), retrying it produces a different timestamp. Store the timestamp from the first execution and replay it on retries.
Random IDs generated during processing (confirmation codes, reference numbers) differ on each attempt. Generate them once, persist them with the idempotency record, and return the stored value on retries.
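A sketch of capturing both values on the first execution and replaying them afterward, assuming a Redis-backed idempotency record (key names and TTL illustrative; the claim/in-progress handling from the earlier sketch is elided).

```python
import json
import secrets
import time
import redis

r = redis.Redis()

def confirm_order(idempotency_key: str) -> dict:
    record_key = f"confirm:{idempotency_key}"
    stored = r.get(record_key)
    if stored:
        return json.loads(stored)  # retry: replay the original result

    result = {
        "confirmation_code": secrets.token_hex(4),  # random: generate exactly once
        "processed_at": time.time(),                # timestamp of first execution
    }
    r.set(record_key, json.dumps(result), ex=72 * 3600)
    return result
```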
External API calls are the hardest case. You call a shipping provider's API to create a label. The call succeeds but the response is lost. You retry. Now two labels exist. The fix: check whether the external operation already completed before calling again (use the external provider's own idempotency support if available), or accept the duplicate and reconcile later. Stripe, Adyen, and most payment processors support idempotency keys on their APIs precisely because their customers face this problem.
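With Stripe's Python library, for example, an idempotency key can be passed alongside the request so a retried call replays the original charge rather than creating a second one; the key format below is just one convention.

```python
import stripe

stripe.api_key = "sk_test_..."  # placeholder

def charge(order_id: str, amount_cents: int):
    # Derive the key from a stable identifier you control: every retry of this
    # order sends the same key, so the provider deduplicates on its side.
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        idempotency_key=f"order-{order_id}-charge",
    )
```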
Key Points
- There is no exactly-once delivery in a distributed system. Networks lose packets, brokers crash, consumers restart. What you can build is exactly-once processing, and the burden falls entirely on the consumer
- Stripe's idempotency key pattern (client-generated UUID, server-side dedup with 24h TTL) is the gold standard for API idempotency. Copy it. Seriously. Their engineering blog post from 2017 remains the best practical reference
- Kafka's exactly-once semantics (EOS) guarantees atomic writes across partitions within a single Kafka cluster. It does not guarantee exactly-once processing in your application. Your consumer still needs idempotency logic
- The Transactional Outbox pattern with CDC (Debezium reading the WAL) solves the dual-write problem: updating your database and publishing an event atomically without distributed transactions
- Non-deterministic operations (timestamps, UUIDs, external API calls) are idempotency's hardest edge case. If retrying an operation produces a different result each time, deduplication alone is not enough. You need to capture and replay the original result
Common Mistakes
- ✗ Relying on Kafka consumer group offsets for exactly-once guarantees. Consumer commits offset, processes message, crashes before completing side effects. On restart, the message is skipped. You now have data loss, not duplication
- ✗ Setting idempotency key TTLs too short. A 1-hour TTL means a client retrying after a network timeout 2 hours later will create a duplicate. Stripe uses 24 hours. Payment systems should consider 48-72 hours
- ✗ Implementing idempotency at the API gateway level but not in downstream services. The gateway deduplicates HTTP requests, but internal retries from service mesh, queue consumers, or scheduled jobs bypass the gateway entirely
- ✗ Treating database unique constraints as a complete idempotency solution. A unique constraint prevents duplicate inserts, but it does not prevent duplicate side effects like sending emails, calling third-party APIs, or publishing events