Design an Online Auction
Design an online auction platform like eBay. Users list items, place bids in real-time, and the highest bidder wins when the timer expires. Handle concurrent bids through optimistic concurrency, anti-sniping extensions in the final seconds, proxy (auto) bidding, and effectively-once settlement that converges on a single committed winner and a single captured payment across crashes and retries.
Key Topics
Interview Cheat Sheet
60s skim · 3min careful readFour-flow auction platform (Submit, Process, Broadcast, Settle) with two correctness anchors: a Valkey Lua CAS that decides who wins each bid, and a fencing token plus stable idempotency key that guarantees effectively-once settlement. Kafka partitioned by auction_id gives per-auction serialization without locking the database. Anti-sniping and proxy bidding live inside the same CAS script so they're atomic with bid acceptance.
- Bid · 50K/sec global · 100-500/sec hot auction · sub-ms decision
A bid carries the price the user thinks is current. A Lua script in Valkey checks that price still matches, decides if this bid wins, runs anti-snipe and proxy logic atomically, and assigns a sequence number. The whole decision happens in one round trip with no locks.
ClientWebSocket gatewayKafka bids topic (partitioned by auction_id so one auction = one consumer = serial order)Bid ProcessorValkey Lua CAS (check expected_price matches current_price, validate amount ≥ current + min_increment, INCR seq_num, set current_price + high_bidder, anti-snipe re-arm if <30s left, proxy jump-to-price computation)write accepted bid to Postgrespublish to SPUBLISH auction:{id} channel - Broadcast · 1M watchers · sharded Pub/Sub · sub-200ms p99
Once a bid is accepted, the auction's update gets pushed to everyone watching. Sharded Pub/Sub keeps the broadcast on the same Valkey shard that owns the auction key, so a hot auction with 5K watchers doesn't spam the whole cluster.
Bid acceptedSPUBLISH on Valkey shard that owns auction:{id} (sharded variant, not PUBLISH which broadcasts cluster-wide)only WebSocket gateway pods SSUBSCRIBE'd to that shard receive the messagepush WebSocket frame to watchers with seq_numclients on reconnect resume by last-seen seq, gateway backfills missed bids from Postgres - Settle · effectively-once · stable idempotency key
When the timer fires, the auction is marked sold and the winner is charged. A monotonic fence token guards the database write; the payment call uses one idempotency key tied to the auction (not the retry), so a crash mid-settle never double-charges.
Flink keyed timer fires at auction end_timeSettle WorkflowINCR fence:auction_id on Valkeyconditional UPDATE auctions SET status='SOLD' WHERE fence IS NULL OR fence < $tokenStripe Charge with Idempotency-Key: settle-{auction_id} (stable across retries, not per-attempt)on success: write Payment row + Kafka order.createdon failure: workflow retries with same key, Stripe returns idempotent response
- •10M active auctions, 1M concurrent WebSocket watchers
- •50K bids/sec global peak, 1.7K average
- •Hot auction final-minute: 100-500 bids/sec, the per-partition ceiling
- •Anti-snipe: bid in last 30s extends by 2 min, capped at 30 min total
- •Broadcast SLO: sub-200ms p99 from acceptance to watcher frame
- •Per-user rate limit: 10 bids/sec, 200 bids/min
- •Flink keyed timers: 10M live timers checkpointed every 10s to S3
- •Proxy jump-to-price: 1 write ends a cascade that naive ratcheting takes 100s of writes for
- •Optimistic CAS in Valkey Lua, not SELECT FOR UPDATE on Postgres
- •Kafka partition by auction_id, not bid_id, for per-auction serial order
- •Stable idempotency key (settle-{auction_id}), not per-attempt key
- •Sharded Pub/Sub SPUBLISH, not PUBLISH, so messages stay on the auction's shard
- •Flink keyed timers for 10M deadlines, not cron polling or setTimeout
- •Regional pinning of auctions, not global active-active (sub-200ms wins)
- •Idempotency key tied to auction_id, not the retry attempt (the subtle double-charge trap)
- •current_price in Postgres is a hint; MAX(amount) on bids is authoritative (MVCC bloat reason)
- •Valkey hydrate from Postgres is a rehearsed runbook, not a hope
- Valkey Lua CAS over Postgres SELECT FOR UPDATE
Row locking on Postgres serializes through the lock manager. At 500 bids/sec on one hot auction, the lock queue grows, connection pool saturates, p99 spikes to seconds. Valkey CAS is sub-ms because there's no lock manager; serialization is implicit in the single-threaded event loop. Plus updating one Postgres row 500 times/sec creates MVCC bloat and a CDC event flood. Postgres stays as source of truth (MAX(amount) on bids is authoritative; current_price column is a hint), but the hot path lives in Valkey.
- Stable idempotency key (the subtle double-charge trap)
Putting the fence token inside the Stripe Idempotency-Key looks correct and quietly destroys the guarantee. Scenario: settle attempt A grabs fence 42, calls Stripe with key 'settle-abc-42', captures, then crashes before recording success. Kafka redelivers. Attempt B grabs fence 43, calls Stripe with key 'settle-abc-43' (different key!), Stripe sees a new request and captures again. Fix: key is 'settle-{auction_id}' alone, stable across retries. The fence orders writes to our DB; the idempotency key orders side effects at the provider. Same effectively-once pattern as the job scheduler.
- Anti-snipe + proxy jump-to-price (atomic inside the CAS)
Anti-snipe extends the auction by 2 minutes if a bid lands in the last 30 seconds, capped at +30 min total to stop a bot bidding every 29 seconds from extending forever. Proxy jump-to-price: two proxies with maxes $200 and $500 ratcheting by $1 increments produces hundreds of one-cent bids. Instead, on each accepted bid, compute jump_price = min(winner.max, runner_up.max + min_increment), submit one bid at that price. One write ends the cascade. Both run inside the same Lua script so they're atomic with bid acceptance; can't drift relative to current_price.
- Flink keyed timers for 10M deadlines
10M live auction deadlines is the timer-scale problem. setTimeout dies on pod restart. SELECT WHERE end_time < NOW() against 10M rows doesn't scale. pg_cron is single-process. Flink keyed timers checkpoint to S3 every 10s; on JobManager failure the new leader recovers from the checkpoint and replays timers whose deadlines passed during downtime. Anti-snipe extensions re-arm in-place via keyed state, not a new database row. Same primitive that supports the URL shortener's range-allocator pattern: time-keyed state with durability via checkpoint.
40-Minute Interview Playbook
Each phase is what the interviewer expects you to do and say. Concrete steps, not topic hints. The diagrams are what you sketch on the board.
- 15 min
Clarify Requirements and Scale
GoalPin the bid concurrency model, the broadcast latency target, and the settlement guarantee. The whole design pivots on per-auction serialization plus effectively-once settlement, so name both up front.
Do & Say- ASK·1ASK: what happens when two users bid the same amount at the same instant? Accept both? Reject both? Pick one deterministically? Walk to one wins, the other gets OUTBID with the new price. That commits you to per-auction serialization.
- SAY·2Lock the broadcast SLO: watchers see new price within 200ms of acceptance, p99 regional. That rules out polling and forces WebSocket + sharded Pub/Sub. Write 200ms p99 in the corner.
- SAY·3Pin the settlement guarantee: effectively-once. After retries: exactly one SOLD row per auction, exactly one captured charge at the payment provider. Not exactly-once across the payment boundary (Stripe is a separate authority), but effectively-once on our side.
- SAY·4Get the numbers on the board: 10M active auctions, 50K bids/sec peak, 1.7K average, 1M concurrent WebSocket watchers. Hot auctions take 100-500 bids/sec in the final minute. That hot-auction rate is the per-partition ceiling. Write 50K total / 500 hot in the corner.
- SAY·5Define scope. In scope: English auction, anti-sniping, proxy bidding, settlement. Stretch: Dutch, sealed-bid. Out of scope: seller payouts, KYC, fraud ML, search ranking.
Interviewer is grading: You name per-auction serialization (not global) as the concurrency model. You separate the per-auction hot rate from the aggregate rate. Effectively-once vs exactly-once distinction lands early.
- 25 min
API and Data Model
GoalOne bid endpoint with the load-bearing field (expected_price), the auction state hash in Valkey, and the immutable bids table in Postgres. Get the optimistic-concurrency precondition on the board now.
Do & Say- WRITE·1Write the bid endpoint: POST /auctions/{id}/bids {amount, expected_price, idempotency_key}. Returns 202 with bid_id + QUEUED. Terminal result rides the WebSocket as bid_result {ACCEPTED, REJECTED with reason, current_price}.
- SAY·2Call out the load-bearing field: expected_price is what the client thinks current_price is. CAS accepts only if expected_price matches server current_price. Without it, a client at $100 bidding $105 against a server at $108 gets a misleading TOO_LOW. With it, STALE_EXPECTED_PRICE fires and the UI refreshes.
- DRAW·3Sketch the Valkey auction state hash: HSET auction:{id} with fields status, current_price, high_bidder, current_end_time, sequence_num, min_increment. That hash is the CAS target. Lua script reads and updates atomically under Valkey's single-threaded execution.
- DRAW·4Sketch the Postgres bids table: auction_id (partition key), bidder_id, amount, sequence_num, status, idempotency_key, created_at. Partial unique index on (auction_id, sequence_num) WHERE status = ACCEPTED dedupes Kafka redelivery and keeps sequence_num gap-free among accepted bids.
- SAY·5Mention the proxy_bids table: auction_id, bidder_id, max_amount, is_active. The proxy resolver reads this on every bids.accepted to decide if an auto-bid fires. One-line mention; deep dive later.
Interviewer is grading: expected_price appears in the request body, not as an afterthought. The partial unique index on accepted bids is named with its dedup purpose. Auction state lives in Valkey for CAS, Postgres for truth.
- 310 min
High-Level Design (draw the four flows)
GoalOne diagram with four flows visible: Submit, Process, Broadcast, Settle. Correctness lives in two specific places (the Valkey CAS and the settlement fencing token); make those two points stand out.
Draw on the boardDo & Say- DRAW·1Draw four lanes left to right: Submit, Process, Broadcast, Settle. Correctness lives in two places only: Valkey CAS decides who wins each bid and fencing token on the auction row decides whose settlement commits. Everything else is delivery and fan-out.
- SAY·2Walk the Submit lane: API does auth, per-user rate limits (10/sec, 200/min), fast reject if status != ACTIVE or now > current_end_time, produce to Kafka partitioned by auction_id. The API does NOT validate amount. That check belongs inside the CAS, under serialization.
- SAY·3Walk the Process lane: one consumer per Kafka partition, auction_id as partition key keeps all bids for an auction in arrival order. Lua CAS does: read state, check expected_price, check amount >= current + min_increment, INCR sequence_num, update state, return. Atomic under Valkey's event loop.
- SAY·4Walk the Broadcast lane: bids.accepted → fan-out service, SPUBLISH on Valkey sharded Pub/Sub, each WS gateway SSUBSCRIBEs to channels for its watched auctions, writes frames to local watchers. Plain PUBLISH broadcasts to every node; SPUBLISH stays on the shard owning auction:{id}.
- SAY·5Walk the Settle lane: Flink keyed timer per auction, on firing → INCR fence:auction:{id}, read winning bid, conditional UPDATE on auctions guarded by fencing token, capture with Idempotency-Key = settle-{auction_id} (stable across retries).
Interviewer is grading: Four flows clearly separated. The two correctness anchors (CAS and fencing) are named with their roles, not generic 'we use Redis'. SPUBLISH vs PUBLISH distinction lands. auction_id is the partition key, not bid_id.
- 415 min
Deep Dive: CAS, Fencing, and Proxy Bidding
GoalThree sub-dives. The Lua CAS step by step. The fencing-token settlement trap (idempotency key must be stable). Proxy bidding's jump-to-price algorithm so two proxies don't ratchet up by $1 forever.
Draw on the boardDo & Say- SAY·1Pivot to the CAS script. Walk the seven steps: SET bid_result:{bid_id} NX (dedup Kafka redelivery), check status/end_time (else AUCTION_CLOSED), check expected_price==current_price (else STALE_EXPECTED_PRICE), check amount>=current+min_increment (else BID_TOO_LOW), INCR sequence_num + update current_price + high_bidder, extend end_time if in anti-snipe window, cache result.
- SAY·2Defend optimistic concurrency: SELECT FOR UPDATE on the auction row serializes via DB row lock, at 500 bids/sec on one hot auction the lock queue grows, pool saturates, p99 spikes to seconds. Valkey CAS is sub-ms with no lock manager. Serialization is implicit in the single-threaded event loop.
- SAY·3Address auctions.current_price not updating per bid: current_price in Postgres is a hint, not authoritative. Authoritative is MAX(amount) FROM bids WHERE auction_id=? AND status=ACCEPTED. Updating one row 500 times/sec creates MVCC bloat and a CDC event flood. The bids table is append-only, which Postgres handles cleanly.
- SAY·4Pivot to settlement. Walk the duplicate scenario: Pod A grabs fence 42, writes SOLD, calls Stripe with key settle-abc-42, captures $240, pod evicted before final UPDATE. Then: Kafka redelivers, Pod B grabs fence 43, UPDATE succeeds since 42<43, calls Stripe with key settle-abc-43 (new key), Stripe captures again. Double charge.
- SAY·5Name the fix: idempotency key must be tied to the auction, not the attempt. Use Idempotency-Key = settle-{auction_id} stable across retries. Stripe returns the original capture on the second call. The fencing token orders writes, the idempotency key orders side effects. Independent dimensions.
- SAY·6Pivot to anti-sniping: bid arrives within anti_snipe_seconds (e.g., 30s) of end_time, extend by anti_snipe_extend (e.g., 120s), atomic with bid acceptance in the same Lua script, extension relative to current end_time, not original. We emit auctions.end_time_changed so Flink re-arms the keyed timer in place.
- SAY·7Handle the infinite-extension attack: bot bidding every 29 seconds keeps the auction open forever. Two caps: maximum total extension (e.g., 30 min from original end) and per-bidder extension quota. Once hit, the script extends silently or not at all, and the auction closes on schedule.
- SAY·8Pivot to proxy bidding: two proxies with maxes $200 and $500. Naive ratchets by min_increment, hundreds of penny bids until one caps. Fix: jump_price = min(winner.max, runner_up.max + min_increment) and submit one bid. For $200/$500, jump_price = $201. One bid ends the cascade.
- WATCH·9Be ready for the timer-service question: 10M timers in Flink keyed state by auction_id, checkpointed to S3 every 10s, on JobManager failure new leader recovers and replays timers whose deadlines passed during downtime. setTimeout dies on pod restart, pg_cron doesn't scale to 10M moving deadlines.
Interviewer is grading: You can walk the CAS in seven discrete steps. You name the idempotency-key-in-the-key bug as the subtle trap of settlement design, not as a footnote. The proxy-bid jump-to-price math is on the board with a worked example.
- 55 min
Trade-offs, Risks, and Wrap-up
GoalTwo deliberate trade-offs (regional pinning, effectively-once at the payment boundary), name the recoverable failure modes, and close with one sentence that summarizes the four-flow design.
Do & Say- SAY·1Trade-off one: auctions pinned to one region for their lifetime, bid writes not active-active across regions, cross-region failover is a 5-10 min manual operation. We accept that for sub-200ms regional latency. Going active-active would require global Paxos on every bid, which doesn't pencil at 50K/sec.
- SAY·2Trade-off two: effectively-once over exactly-once. Exactly-once across the payment boundary is impossible because Stripe is an independent system with its own retry semantics. We guarantee one SOLD row, one captured-from-our-side, and a stable idempotency key so Stripe returns the original capture on retry. That's the contract.
- SAY·3Name recoverable failure modes: crash after fence INCR pre-PG-write → next attempt gets higher token, same outcome, crash after PG write pre-payment → next sees INITIATED, sends payment with same idempotency key, updates, crash after payment pre-Kafka → next re-reads, sees PAYMENT_CAPTURED, emits. Every stage idempotent.
- SAY·4Mention Valkey loss: Postgres is source of truth. If Valkey dies, hydrate a new instance from Postgres (current_price, high_bidder, sequence_num all derivable via MAX on bids). Hydrate takes minutes for 10M auctions. During hydrate, new bids return 503, watchers stay on last cached state.
- SAY·5Mention seller payouts in one line: settlement is only half the lifecycle, other half is seller_payouts, held during chargeback window (7-30 days), disbursed via Stripe Connect transfers.create, idempotency key payout-{auction_id}. Out of scope today but I can come back to it.
- SAY·6Close in one breath: four flows, Submit produces partitioned by auction_id, Process serializes via per-partition consumer + Valkey CAS, Broadcast uses sharded Pub/Sub, Settle uses fencing token + stable idempotency key for effectively-once. Correctness lives in CAS and the fence. Everything else is delivery.
- SAY·7If there's time, offer: Dutch auction price-ticks, sealed-bid encryption, retraction mid-settlement, or bidder risk tiers + payment-hold sizing.
Interviewer is grading: Regional pinning named as a deliberate choice. Effectively-once defined precisely (not 'kinda exactly-once'). The closing sentence ties four flows and the two correctness anchors together in under 30 seconds.
Interview Grading by Level
What an interviewer at each level expects to see in your answer. Use this to calibrate, not to perform.
Mid-Level Engineer (L4 / SDE-II)
Reaches a queue + worker pattern but treats concurrency as 'use a transaction' without naming optimistic vs pessimistic or the consequences.
- Adds a message queue to serialize bids per auction.
- Knows WebSocket is needed for real-time broadcast and adds it for watcher updates.
- Picks a single bid processor per auction and can explain why double-bid is otherwise possible.
- Adds an idempotency key on the bid endpoint when prompted.
- Mentions an anti-sniping extension as 'bump the end time by 2 minutes if a bid lands late'.
- Concurrency is 'use a database transaction' or 'SELECT FOR UPDATE' without recognizing the lock-contention failure mode on hot auctions.
- Settlement is 'mark as won and charge'. Doesn't address the crash-between-mark-and-charge double-charge risk.
- Proxy bidding is hand-waved as 'auto-bid the minimum'. Doesn't reach the jump-to-price algorithm even when shown the cascade.
- Timer service is 'a cron job that polls for ended auctions'. Doesn't recognize that 10M live timers needs keyed state.
- Broadcast is 'publish to a topic each watcher subscribes to' without addressing the celebrity-auction fan-out scale.
Senior Engineer (L5 / SDE-III)
Drives optimistic concurrency end-to-end, names the fencing token settlement pattern, and quantifies the hot-auction serialization ceiling.
- Names optimistic concurrency via a Lua CAS script and gives the expected_price + amount checks in the right order.
- Partitions Kafka by auction_id and explains why that's what makes per-auction serialization work.
- Volunteers the fencing-token + stable-idempotency-key pattern for settlement, with the duplicate-scenario walkthrough.
- Brings up Flink keyed timers for the 10M-timer scale and rejects setTimeout / cron as not fitting.
- Names anti-sniping extension as atomic with bid acceptance (same Lua script), not a separate step.
- Sizes bid processors from per-partition throughput (~180 bids/sec sequential, batched to 500/sec).
- Addresses the celebrity-auction fan-out with sharded Pub/Sub (SPUBLISH on cluster mode).
- Volunteers the idempotency-key-in-the-key bug only after the interviewer prods on duplicate charges.
- Doesn't bring up the infinite-extension anti-sniping attack until pushed; then proposes 'just cap it' without per-bidder vs total-extension distinction.
- Proxy bidding gets the cascade problem but stops at 'compute jump price'; doesn't name the tied-max tiebreak (earliest created_at).
Staff+ Engineer (L6+)
Treats this as an effectively-once distributed system: argues the correctness anchors explicitly, names the recoverable crash points, and surfaces the operational consequences of regional pinning.
- Names the two correctness anchors precisely: Valkey CAS for who-wins-a-bid, fencing token for whose-settlement-commits. Everything else is delivery.
- Explains effectively-once vs exactly-once across the payment boundary clearly: 'one SOLD row, one captured-from-our-side, Stripe contract makes the capture stable on retry'.
- Walks the recoverable crash points in settlement (crash after INCR, after Postgres write, after payment, after Kafka emit) and shows each next-attempt outcome.
- Pushes back on the requirements: 'do we really need 50K bids/sec globally, or 50K/sec aggregated across 10M auctions where the per-auction ceiling is 500/sec?'
- Names auctions.current_price as a hint, not authoritative, and explains the MVCC-bloat reason. Authoritative is MAX(amount) over the bids partition.
- Volunteers the proxy-bid tied-max tiebreak (earliest proxy_bids.created_at) and the race against auction close (CAS rejects after end_time).
- Closes with the operational angle: 'Valkey hydrate from Postgres is the disaster-recovery rehearsal we own. Cross-region failover is a 5-10 minute manual runbook, not automatic. Regional pinning is a product constraint, not a hack.'
Common Follow-up Questions
click to expandQuestions an interviewer is likely to ask after your walkthrough. Rehearse the short answer.