Idempotency Key: Stripe-Style Implementation
Server-side dedup of retries on non-idempotent operations. Client sends Idempotency-Key header (unique per logical operation). Server: check cache → if cached, replay response → else lock on key → process and cache → release lock. TTL on cache (24h typical). Detect key collisions via request fingerprint.
What it is
An idempotency-key implementation makes non-idempotent operations safe to retry. The contract: the client sends an Idempotency-Key header (a UUID per logical operation). The server, on first receipt, processes the request and caches the response keyed by the idempotency key. On any subsequent retry with the same key, the server replays the cached response without re-doing the work.
This is the pattern Stripe uses (and has documented in detail), and it is now standard across most modern payment, messaging, and provisioning APIs.
Why this matters
Without idempotency, every network blip during a non-idempotent call risks a duplicate side effect. Customer's network drops; client retries; customer is charged twice. Multiply by the millions of payments per day across the industry, and the failure mode is real money.
With idempotency, a retry of an already-processed request is a no-op (returns the cached response). The client can retry aggressively without fear of duplicate side effects.
The server-side flow
For each request with an Idempotency-Key:
1. Compute a fingerprint of the request body (a hash). Used to detect key collisions.
2. Check the cache. If a cached response exists for this key:
   - If the fingerprint matches: replay the cached response. Return.
   - If the fingerprint differs: 422 Unprocessable Entity ("idempotency key reused with different body").
3. Acquire a lock on the key. If locked by another request: 409 Conflict ("request in flight").
4. Process the request through the normal handler.
5. Store the response (body + status + relevant headers + fingerprint) keyed by the idempotency key, with TTL.
6. Release the lock.
7. Return the response.
Steps 2 and 3 are why concurrent retries don't both execute. Step 1 protects against accidental key reuse for different operations.
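The flow above can be sketched in-memory before worrying about Redis. This is a minimal illustration, not a library API: `ConcurrentHashMap` stands in for the shared cache and lock store, and the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory sketch of the server-side flow. A real deployment needs a
// shared store (Redis, a database) so all instances see the same cache.
public class IdempotencyStore {
    public record Cached(int status, String body, String fingerprint) {}

    private final Map<String, Cached> cache = new ConcurrentHashMap<>();
    private final Map<String, Boolean> locks = new ConcurrentHashMap<>();

    /** Returns the response to send: cached replay, conflict, or fresh result. */
    public Cached handle(String key, String fingerprint, Supplier<Cached> handler) {
        Cached cached = cache.get(key);                      // step 2: check cache
        if (cached != null) {
            if (!cached.fingerprint().equals(fingerprint))   // step 1's fingerprint check
                return new Cached(422, "key reused with different body", fingerprint);
            return cached;                                   // replay, no re-processing
        }
        if (locks.putIfAbsent(key, Boolean.TRUE) != null)    // step 3: acquire lock
            return new Cached(409, "request in flight", fingerprint);
        try {
            Cached result = handler.get();                   // step 4: process
            cache.put(key, result);                          // step 5: store before unlock
            return result;
        } finally {
            locks.remove(key);                               // step 6: release
        }
    }
}
```

Calling `handle` twice with the same key and fingerprint runs the handler once; the second call replays the stored response.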
The client side
Three rules:
- One key per logical operation, not per HTTP attempt. Generate the key once when the user clicks "submit", reuse it on every retry.
- Persist the key for the duration of retries. If the client process crashes mid-retry, the next attempt should use the same key. Local storage, sessionStorage, server-side draft state, whatever fits the platform.
- Send the key on every state-changing request that might retry. POST, PATCH, sometimes PUT. GET is naturally idempotent and doesn't need it.
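The three client rules can be sketched with the JDK's built-in `HttpClient` (Java 11+). The endpoint URL is a placeholder and the class and method names are illustrative; the point is that the key is generated once, outside the retry loop, and every attempt carries it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class IdempotentClient {
    // One HTTP attempt. Every attempt for the same logical operation
    // passes the SAME key; only the caller generates keys.
    static HttpRequest buildAttempt(String url, String key, String body) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Idempotency-Key", key)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Generate the key once, before the first attempt. Persist it if the
    // retry loop can outlive the process (crash mid-retry).
    public static HttpResponse<String> postWithRetries(HttpClient http, String url,
                                                       String body, int maxAttempts)
            throws Exception {
        String key = UUID.randomUUID().toString();   // ONE key per logical operation
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return http.send(buildAttempt(url, key, body),
                                 HttpResponse.BodyHandlers.ofString());
            } catch (IOException e) {
                last = e;                            // network blip: retry with same key
            }
        }
        throw last;
    }
}
```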
Cache TTL
How long to keep cached responses?
The lower bound: longer than the maximum retry window. If clients can retry over 24 hours (long-poll, async workflow), TTL must be 24h+.
The upper bound: not too long, or storage grows. 24 hours to 7 days is typical. After that, the same key would be treated as a new request; this is acceptable because nobody is realistically retrying after a week.
Storage and scale
Per-key storage: a few hundred bytes (response body, status, fingerprint, metadata). For a service doing 1000 req/sec with a 24h TTL, that's 1000 × 86,400 ≈ 86.4M live keys; at a few hundred bytes each plus Redis per-key overhead, roughly 50GB of Redis. Manageable. For higher scale, shard or compress; or use a database with TTL semantics.
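The back-of-envelope arithmetic is easy to check. The ~500 bytes per entry below is an assumed average; Redis per-key overhead pushes the raw total toward the ~50GB figure in the text.

```java
// Sizing check for the idempotency cache: keys alive at any moment is
// (request rate) x (TTL in seconds), since entries expire after the TTL.
public class IdemSizing {
    public static long keysAlive(long reqPerSec, long ttlSeconds) {
        return reqPerSec * ttlSeconds;
    }

    public static long bytesNeeded(long keys, long bytesPerKey) {
        return keys * bytesPerKey;
    }

    public static void main(String[] args) {
        long keys = keysAlive(1000, 86_400);                 // 86,400,000 keys
        long gb = bytesNeeded(keys, 500) / 1_000_000_000L;   // ~43 GB before overhead
        System.out.println(keys + " keys, ~" + gb + " GB");
    }
}
```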
Edge cases
The first request fails after the side effect. Worker charged the card; about to write the response to cache; process killed. Lock has a TTL, so it expires. Client retries; a new processor takes the lock, sees no cached response, runs the work again, charges the card again. Writing the cache before unlocking narrows the window but can't close it (the crash happened before the cache write). The real fix: make the work itself idempotent, e.g. record the outcome in the same transaction as the side effect (the handler checks "did I already process this key?" via a processed-records table before acting).
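A sketch of that processed-records guard, with a map standing in for a database table keyed by idempotency key. The class and method names are invented for the example; in production the insert into the table and the charge would share one database transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Handler-level idempotency: even if the filter's cache write was lost,
// re-running the handler with the same key refuses to repeat the charge.
public class ChargeHandler {
    private final Map<String, String> processed = new ConcurrentHashMap<>(); // key -> chargeId
    private int chargeCounter = 0;

    public synchronized String charge(String idempotencyKey, long amountCents) {
        String existing = processed.get(idempotencyKey);
        if (existing != null) {
            return existing;                  // already charged: return the original id
        }
        // The side effect (amountCents would go to the payment gateway here)...
        String chargeId = "ch_" + (++chargeCounter);
        // ...recorded together with the outcome. In a real system these two
        // steps are one transaction, so a crash can't separate them.
        processed.put(idempotencyKey, chargeId);
        return chargeId;
    }
}
```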
The cached response has gone stale. Possible if the cache contains a body referencing data that has since changed. Usually fine for create-style operations (the cached response is the resource just created). Tricky for query-style operations (use shorter TTL).
Different requests with the same key (collision). Detect via fingerprint. Reject with 422. Better than silently returning the wrong response.
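One way to compute the fingerprint, sketched here as SHA-256 of the raw body. Hashing method + path + body as well is stricter (it also catches the same body sent to a different endpoint); this minimal version hashes the body only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Request fingerprint for collision detection: same body -> same hex digest.
public class Fingerprint {
    public static String of(String body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(body.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available on the JVM
        }
    }
}
```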
Implementation (Java servlet filter)
Wrap the controller in a filter. Same logic: check cache, lock, process, store. The lock ensures concurrent retries with the same key serialise; the cache ensures completed work isn't re-done.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.util.ContentCachingResponseWrapper;

import com.fasterxml.jackson.databind.ObjectMapper;

@Component
public class IdempotencyFilter implements Filter {
    private final RedisTemplate<String, String> redis;
    private final ObjectMapper json;

    public IdempotencyFilter(RedisTemplate<String, String> redis, ObjectMapper json) {
        this.redis = redis;
        this.json = json;
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest hreq = (HttpServletRequest) req;
        String key = hreq.getHeader("Idempotency-Key");
        if (key == null || hreq.getMethod().equals("GET")) {
            chain.doFilter(req, res);
            return;
        }

        // hashBody must buffer the body (e.g. via a request wrapper) so the
        // downstream handler can still read it.
        String fp = hashBody(hreq);
        String cacheKey = "idem:" + key;

        String cached = redis.opsForValue().get(cacheKey);
        if (cached != null) {
            CachedResponse cr = json.readValue(cached, CachedResponse.class);
            if (!cr.fingerprint().equals(fp)) {
                respond(res, 422, "key reused with different body");
                return;
            }
            respond(res, cr.status(), cr.body());   // replay without re-processing
            return;
        }

        // Lock with a TTL so a crashed processor can't block retries forever.
        Boolean locked = redis.opsForValue()
                .setIfAbsent("idem:lock:" + key, "1", Duration.ofSeconds(60));
        if (!Boolean.TRUE.equals(locked)) {
            respond(res, 409, "request in flight");
            return;
        }
        try {
            ContentCachingResponseWrapper wrapper =
                    new ContentCachingResponseWrapper((HttpServletResponse) res);
            chain.doFilter(req, wrapper);
            String body = new String(wrapper.getContentAsByteArray(), StandardCharsets.UTF_8);
            CachedResponse cr = new CachedResponse(wrapper.getStatus(), body, fp);
            // Store BEFORE releasing the lock, so no retry sees "no cache, no lock".
            redis.opsForValue().set(cacheKey, json.writeValueAsString(cr), Duration.ofHours(24));
            wrapper.copyBodyToResponse();           // flush the buffered body to the client
        } finally {
            redis.delete("idem:lock:" + key);
        }
    }

    // CachedResponse is a record(int status, String body, String fingerprint);
    // hashBody and respond are small helpers, omitted here.
}
Key points
- Client generates ONE key per logical operation, reuses it on retry. NOT one per HTTP attempt.
- Server-side: lock on key (so concurrent retries don't both run), check cache, process, store response, release.
- Cache stores: response body + status code + relevant headers. Replay must be byte-identical.
- TTL: typically 24h. Long enough for client retries, short enough that storage doesn't grow forever.
- Detect key collisions: store a fingerprint (hash of request body) and compare. If different, return 422 instead of the cached response.
Follow-up questions
- What happens with concurrent requests using the same idempotency key?
- Why detect key collisions?
- Should the cache be local or shared?
- What about idempotency for fire-and-forget background jobs?
Gotchas
- Generating a new key per HTTP attempt: server sees each retry as a fresh request
- Caching only the response body, not the status code: replay returns the wrong status
- No collision detection: same key for different operations replays the wrong response
- No TTL on cached responses: storage grows without bound
- Lock without TTL: crashed processor blocks subsequent retries forever