System Design: Online Auction (50K Bids/sec, Effectively-Once Settlement, Anti-Sniping)
Goal
A real-time online auction platform.
Scale:
- 10M active listings
- 50K bids/sec at peak, 1.7K bids/sec average
- 1M concurrent WebSocket watchers
- Sub-200ms regional p99 bid confirmation and broadcast, 99.99% availability (cross-region readers see +100 ms)
Features:
- English, Dutch, and sealed-bid auction types
- Proxy (auto) bidding
- Anti-sniping extension
- Effectively-once settlement converging on a single committed winner
TL;DR
- The API validates cheaply and writes bids to Kafka partitioned by `auction_id`, returning 202.
- A per-partition bid processor runs an atomic Valkey Lua CAS that accepts or rejects the bid, dedups by `bid_id`, and assigns a per-auction sequence number.
- Accepted bids fan out over Valkey sharded Pub/Sub to WebSocket gateways in under 200 ms.
- Flink keyed-timers fire at auction end into a settlement consumer that uses a fencing token plus a stable payment idempotency key to make settlement effectively-once.
- Postgres is the source of truth; Valkey is the hot-path coordinator; Kafka is the delivery bus.
Pick a path
| Time | Read | Covers |
|---|---|---|
| ~10 min | TL;DR + §4 | End-to-end flow, who enforces what, where the race conditions live |
| ~30 min | TL;DR, §4, §5, §9, §10 | Stack tradeoffs, effectively-once settlement, bid processing model |
| ~60 min | Full post | Every decision plus anti-sniping, proxy resolution, multi-region, ops |
Architecture at a glance
Four flows share one state plane. The write side serializes per auction in Valkey. The read side fans out over Pub/Sub. The timer side settles at auction end.
Correctness lives in two places only: the Valkey CAS (who wins a bid) and the fencing token on the auction row (whose settlement commits). Everything else is delivery and fan-out.
1. Problem Statement
An auction platform sounds simple until the last ten seconds of a popular listing.
A rare sneaker auction is ending. 500 users are watching. In the final 30 seconds, 50 bids arrive within a 2-second window. Each must be validated against a price that is changing bid by bid, processed in strict arrival order, and broadcast to all 500 watchers within 200 ms. If two users bid $105 when the current price is $100, only one commits and the other is told "outbid, the price is now $105." Not both. Not neither.
That is the central challenge: strongly serialized writes per auction combined with low-latency fan-out to thousands of readers, all while settlement at auction end is idempotent in the face of crashes.
Four problems drive the design.
Concurrent bids on the same auction. Two users click "Bid $105" at the same instant when the current price is $100. Without concurrency control, both bids pass the "$105 > $100" check and both get accepted. The fix is optimistic concurrency: every bid carries the price it expected to see, and acceptance is conditional on that value still being current. Under Valkey's single-threaded execution, an atomic Lua script gives per-auction serialization for free.
Bid sniping. A user places a bid in the final second, leaving no one time to respond. Some platforms accept this as legitimate strategy. Fairer-outcome platforms (eBay Live, Catawiki) extend the auction by a short window when a bid lands near the end. Anti-sniping is configurable per auction.
Settlement must be effectively-once. When time runs out, the system must converge on a single committed winner after retries settle, and the payment provider must end up with a single captured charge once it acks. A settlement job that crashes between "write SOLD" and "call Stripe" must restart without double-charging. The solution is a fencing token plus target-side idempotency. Standard pattern for any job that terminates with an external side effect.
Real-time broadcast at scale. 1M concurrent WebSocket connections across a fleet of stateless gateway pods. Every accepted bid must reach every watcher of that auction within 200 ms. Polling is not an option. The fan-out path is Valkey Pub/Sub, with each gateway pod only subscribing to channels for auctions its users care about.
Scale targets.
- 10M active listings at any time
- 50K bids/sec at peak, 1.7K bids/sec average (30× ratio driven by evening prime-time)
- 1M concurrent WebSocket watchers
- Average auction duration 7 days; minimum 1 hour; maximum 30 days
- Bid confirmation and broadcast latency: <200 ms p99
- 99.99% availability for bid processing
2. Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-01 | Create auction listings: title, description, images, starting price, reserve price, bid increment, start and end times, auction type | P0 |
| FR-02 | Place bids on active auctions with real-time validation against current highest | P0 |
| FR-03 | Real-time bid updates pushed to watchers over WebSocket within 200 ms | P0 |
| FR-04 | Anti-sniping: extend auction end time by a configurable amount when a bid arrives within the final window | P0 |
| FR-05 | Effectively-once settlement: winner determination, reserve check, payment capture | P0 |
| FR-06 | English auction: ascending bids, highest wins | P0 |
| FR-07 | Dutch auction: price drops on a schedule, first to accept wins | P1 |
| FR-08 | Sealed-bid auction: blind bids, revealed at close, highest wins | P1 |
| FR-09 | Proxy bidding: user sets a max, system auto-bids the minimum increment on their behalf | P1 |
| FR-10 | Watchlist: users subscribe to auctions and receive notifications on key events | P1 |
| FR-11 | Bid history: full audit trail of bids per auction | P0 |
| FR-12 | Reserve price: sale only completes if final bid meets the seller's hidden minimum | P0 |
| FR-13 | Search and browse by category, price range, ending soon, newly listed | P1 |
| FR-14 | Bid retraction within policy window | P2 |
3. Non-Functional Requirements
| ID | Requirement | Target |
|---|---|---|
| NFR-01 | Bid processing throughput | 50K bids/sec peak, 1.7K average |
| NFR-02 | Bid confirmation latency (regional p50 / p99) | 60 ms / 200 ms (cross-region readers see +100 ms) |
| NFR-03 | Bid broadcast latency (regional p99, acceptance to watcher frame) | <200 ms |
| NFR-04 | Active concurrent auctions | 10M |
| NFR-05 | Concurrent WebSocket connections | 1M |
| NFR-06 | Bid processing availability | 99.99% (52 min/year) |
| NFR-07 | Settlement guarantee | Effectively-once (one SOLD row, one captured charge) |
| NFR-08 | Bid data durability | Zero loss once the API returns 202 |
| NFR-09 | Anti-sniping timer precision | <1 s drift |
| NFR-10 | Recovery Time Objective | <30 s for bid processor partition rebalance |
| NFR-11 | Recovery Point Objective | 0 for accepted bids |
| NFR-12 | Retention | Bids: hot 90 days in Postgres, archive to S3, drop after 2 years |
| NFR-13 | Geography | Multi-region active reads; bid writes pinned per-auction to a single region |
| NFR-14 | Search latency | <500 ms p99 |
[3.1] Traffic and workload assumptions
- Median bids per auction ~15; mean ~105. The distribution is long-tailed: most listings end quiet, a small fraction of hot listings pull the mean up sharply. Downstream math (§6.1) uses the mean.
- 3% of auctions end in any given hour during evening prime time.
- Hot auctions (top 0.01%) can take 100-500 bids/sec in the final minute.
- Payment provider (Stripe-equivalent) supports idempotency keys and 2xx/4xx responses within 1 s p99.
- Watchers per auction: average 30, hot auction up to 5K.
- Clients resolve a regional endpoint via DNS; the chosen region processes the bid (auction is pinned to its region).
4. End-to-End Architecture
Shape: a per-key-serialized transactional write path with an event-driven fan-out sidecar for reads. Four flows:
- Submit (bid write path)
- Process (per-auction serial consumer)
- Broadcast (WebSocket fan-out)
- Settle (auction end to payment)
Each part does one thing. Correctness lives in two places only: the Valkey CAS (who wins a bid) and the fencing token on the auction row (whose settlement commits).
Each flow gets its own diagram under the subsection that describes it. Start with the write path.
[4.1] Submit (write path)
Client → API → Kafka → Bid Processor → Valkey + Postgres → Kafka bids.accepted
When a bid request arrives, the API does a small set of cheap checks and gets out of the way:
- Auth, per-user rate limits (10/sec, 200/min), and risk-tier check with payment hold sized to item value (§9.11).
- Load the auction summary from Valkey: `HMGET auction:{id} status current_end_time auction_type`. If missing, fall back to Postgres.
- Reject early if `status != ACTIVE` or `now > current_end_time`. These are fast rejects that do not enter Kafka.
- Produce to Kafka topic `bids.incoming`, partition key = `auction_id`. Message body: `{bid_id, auction_id, bidder_id, amount, expected_price, idempotency_key, client_ts, server_ts}`.
- Return `202 Accepted` with `{bid_id, status: "QUEUED"}`.
Important: the API does not validate the bid amount. It does not read the current price. That check happens inside the bid processor, under the Valkey CAS. Validating at the API would introduce a race window: by the time the bid reaches the processor, the price may already have moved.
Kafka is not the source of truth. Bid acceptance is decided by Valkey; durability lives in Postgres.
[4.2] Process (bid processor fleet)
Per-partition consumers. Under steady state, one active processor instance owns each Kafka partition. During a rebalance the assignment can briefly overlap; the Valkey CAS makes the overlap safe (the second attempt sees a moved price and rejects). With auction_id as the partition key, every bid for a given auction lands on the same partition and is processed in arrival order.
For each Kafka message:
- Read the message. Do not commit the offset yet.
- Run an atomic Lua script against Valkey with two keys: `auction:{id}` (state hash) and `bid_result:{bid_id}` (dedup cache). Full script in Appendix A. In outline:
  - `SET bid_result:{bid_id} <placeholder> NX EX <ttl>`. If the key already exists, return the cached result. That is the redelivery dedup.
  - Check auction status and end time; reject `AUCTION_CLOSED` if closed.
  - Check `expected_price` matches current; reject `STALE_EXPECTED_PRICE` if not.
  - Check bid ≥ current + min_increment; reject `BID_TOO_LOW` if not.
  - On acceptance: `HINCRBY` the `sequence_num`, update `current_price` and `high_bidder`, and if inside the anti-snipe window extend `current_end_time`.
  - Cache the final result into `bid_result:{bid_id}` before returning.

  The script is atomic under Valkey's single-threaded execution. No two scripts race on the same key.
- Accepted path.
  a. Write the bid row to Postgres with `status = 'ACCEPTED'` and the assigned `sequence_num`. The partial unique index on accepted bids (§7.2) keeps `sequence_num` gap-free.
  b. Publish to `bids.accepted` on Kafka. Downstream consumers are the broadcast gateway, proxy-bid resolver, search indexer, analytics pipe, and notification service.
  c. If the script returned `EXTENDED`, also publish to `auctions.end_time_changed` so the Flink timer service re-arms.
- Rejected path. Write the bid row with `status = 'REJECTED'`, `sequence_num = NULL`, and the rejection reason. Emit a `bid_result` event over the client's WebSocket carrying `{bid_id, status: "REJECTED", reason, current_price, end_time}`. The HTTP POST already returned 202 at ingress (§8.1); the terminal outcome always rides the WebSocket.
- Commit the Kafka offset. Only after the Postgres write succeeds.
Redelivery dedup. The `bid_id` and `idempotency_key` come from the API. If Kafka redelivers the same message after the CAS already ran, the `bid_result:{bid_id}` NX check short-circuits the script and returns the cached outcome. Without it, the second attempt would see a moved `current_price` and reject a bid that was actually accepted. TTL is auction_end + 48 h so the cache outlives settlement retries (§17.1).
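The accept/reject decision is small enough to model as a pure function. Below is a minimal Python sketch of the same logic for illustration only; the authoritative version is the atomic Lua script in Appendix A, and the `AuctionState` type and function names here are invented:

```python
from dataclasses import dataclass

@dataclass
class AuctionState:
    status: str
    current_price: float
    min_increment: float
    end_time: float          # epoch seconds
    sequence_num: int
    anti_snipe_window: float  # seconds before end that triggers extension
    anti_snipe_extend: float  # seconds added on a late bid

def process_bid(state, amount, expected_price, now, result_cache, bid_id):
    """Mirror of the CAS script: dedup, validate, accept, anti-snipe extend."""
    if bid_id in result_cache:            # redelivery dedup (SET NX in Valkey)
        return result_cache[bid_id]
    if state.status != "ACTIVE" or now > state.end_time:
        outcome = ("REJECTED", "AUCTION_CLOSED")
    elif expected_price != state.current_price:
        outcome = ("REJECTED", "STALE_EXPECTED_PRICE")
    elif amount < state.current_price + state.min_increment:
        outcome = ("REJECTED", "BID_TOO_LOW")
    else:
        state.sequence_num += 1
        state.current_price = amount
        extended = (state.end_time - now) <= state.anti_snipe_window
        if extended:
            state.end_time += state.anti_snipe_extend
        outcome = ("ACCEPTED", state.sequence_num, extended)
    result_cache[bid_id] = outcome        # cache the result before returning
    return outcome
```

Two $105 bids at a $100 price illustrate the race from §1: the first is accepted, the second rejects with `STALE_EXPECTED_PRICE`, and a redelivery of the first returns the cached acceptance instead of re-running validation.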
[4.3] Broadcast (WebSocket fan-out)
Accepted bids have to reach watchers in under 200 ms. Pipeline:
- Bid processor publishes `bids.accepted` to Kafka.
- A small fan-out service consumes `bids.accepted` and does a `PUBLISH auction:{id}:updates <payload>` on Valkey Pub/Sub. Payload is the bid summary: `{sequence_num, current_price, high_bidder_masked, end_time, time_remaining}`.
- WebSocket gateway pods subscribe to `auction:{id}:updates` only for auctions their connected users are watching. Each pod keeps a `SUBSCRIBE` per active auction in its connection pool.
- On `PUBLISH`, each subscribed pod pushes a frame to every local connection watching that auction.
Why Valkey Pub/Sub and not Kafka consumers per pod? Pub/Sub is sub-millisecond per hop, and each gateway pod only subscribes to the ~500-5000 auctions its users actually care about. With Kafka, every pod would consume the full bids.accepted stream and filter client-side, burning CPU and bandwidth.
Cluster-mode note. On Valkey Cluster, plain PUBLISH broadcasts to every node in the cluster, which defeats the point. Use sharded pub/sub (SPUBLISH/SSUBSCRIBE, Valkey 7+) so the message stays on the shard that owns auction:{id}. In a multi-cluster deployment, the pub/sub bus can also run on a separate single-shard Valkey instance to decouple broadcast load from the CAS cluster.
Reconnection story: every push carries sequence_num. On reconnect, the client sends last_seen_seq and the gateway fetches any missing bids from Postgres (SELECT ... WHERE auction_id = ? AND sequence_num > ?) before resuming the live stream. No bids skipped, no duplicates at the client.
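The reconnect merge is just a sequence-number watermark. A sketch of the gateway-side logic under the assumption that `backfill` holds the Postgres rows with `sequence_num > last_seen_seq` and `live` is the buffered Pub/Sub feed (function and field names are illustrative):

```python
def resume_stream(last_seen_seq, backfill, live):
    """Merge DB backfill with the live feed after a reconnect.

    Drops frames the client already saw and any overlap between the
    backfill query and the live subscription, so the client receives
    each sequence number exactly once, in order.
    """
    out, seen = [], last_seen_seq
    for frame in sorted(backfill, key=lambda f: f["seq"]):
        if frame["seq"] > seen:
            out.append(frame)
            seen = frame["seq"]
    for frame in live:                 # live frames arrive already ordered
        if frame["seq"] > seen:        # skip frames the backfill covered
            out.append(frame)
            seen = frame["seq"]
    return out
```

For example, a client that last saw seq 4 while the backfill returns 5-6 and the live buffer holds 6-7 receives exactly 5, 6, 7.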
Presence and stale subscriptions. Each WebSocket gateway pod tracks its active subscriptions in ws:{pod_id}:subscriptions (§7.5) with a 60 s TTL refreshed on every heartbeat. A reaper runs every 30 s per pod and issues SUNSUBSCRIBE for any channel with no live local connection. On pod crash, the TTL on the pod's subscription set expires within a minute and the broadcast gateway stops publishing to channels no pod serves. Without this, idle SPUBLISH traffic grows unbounded as users navigate away without clean disconnects.
[4.4] Settle (auction end)
Settlement is where duplicates hurt: a double settlement charges the winner twice or picks two winners. The full guarantee chain is covered in §9.
Flink runs a keyed timer service. The key is auction_id. When an auction is accepted or its end time changes, a corresponding timer is re-armed in Flink state. At firing time, Flink emits an auctions.ending event. A settlement consumer picks it up and:
- Atomically increment the fencing token in Valkey: `token = INCR fence:auction:{id}`.
- Read the winning bid from Postgres: the highest-amount `ACCEPTED` bid with the lowest `sequence_num` as tiebreaker.
- Validate reserve price. If not met, mark the auction `UNSOLD` and stop.
- Conditional Postgres write, guarded by the fencing token:

  ```sql
  UPDATE auctions
  SET status = 'SOLD',
      winner_id = $winner,
      final_price = $price,
      settlement_fence = $token
  WHERE id = $auction_id
    AND (settlement_fence IS NULL OR settlement_fence < $token)
    AND status = 'CLOSED';
  ```

  If zero rows update, a later attempt has already won. Stop.
- Call the payment provider with `Idempotency-Key: settle-{auction_id}`. The key is deliberately tied to the auction, not the attempt: a stable key is what lets Stripe / Adyen return the original response on retry. See §9.2 for why including the fencing token in the key breaks the guarantee.
- Update `settlement_status = 'PAYMENT_CAPTURED'`. Emit `auctions.sold`.
A crashed settlement re-fires. The fencing token blocks stale writes. The idempotency key blocks duplicate charges. Both together give effectively-once settlement.
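A toy model of the two guards, reduced to in-memory dictionaries (no Postgres, no real payment API; names are invented for illustration), shows why the combination works: the fence rejects the stale writer, and the stable key makes a retried capture return the original result instead of charging again.

```python
def try_commit(row, token):
    """Conditional write: only the newest fencing token may flip CLOSED -> SOLD."""
    if row["status"] == "CLOSED" and (row["fence"] is None or row["fence"] < token):
        row["status"] = "SOLD"
        row["fence"] = token
        return True            # this attempt owns the settlement
    return False               # zero rows updated: a later attempt already won

def capture(charges, auction_id):
    """Payment call keyed by the stable idempotency key settle-{auction_id}.

    The provider replays the stored response for a known key, so a
    retried capture never produces a second charge.
    """
    key = f"settle-{auction_id}"
    if key not in charges:
        charges[key] = "CAPTURED"   # first call actually charges
    return charges[key]             # every later call replays the result
```

With two racing settlement attempts holding tokens 1 and 2, whichever commits first wins the row; the other sees zero rows updated; and no matter how many times `capture` retries, exactly one charge exists.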
[4.5] Trace a bid
To anchor the abstract flow, here is one real bid in wall-clock time. A rare watch auction, current price $12,400. User jan clicks "Bid $12,450" at t=0.
| Time | Layer | Event |
|---|---|---|
| 0 ms | Browser | Client sends POST /auctions/a1b2/bids with {amount: 12450, expected_price: 12400, Idempotency-Key: bid-...}. |
| 4 ms | API gateway | Auth cache hit, rate-limit OK, risk-tier A, preauth hold fires asynchronously (bid is $12,450, above the $1K threshold). Gateway does not block on it for Tier A. |
| 5 ms | API gateway | Produces to bids.incoming, partition 137 (hash("a1b2") % 400 = 137). Returns 202 {bid_id, status: QUEUED}. |
| 18 ms | Bid processor (partition 137) | Consumes message. Runs Lua CAS on auction:a1b2. Script reads current_price=12400, status=ACTIVE, expected_price matches. Increments sequence_num to 847, writes new current_price=12450, high_bidder=jan. Returns {1, 847, 12450, end_time, OK}. Time remaining 42 s, outside anti-snipe window. |
| 22 ms | Bid processor | Inserts into bids table. Partial unique index (auction_id, sequence_num) WHERE status='ACCEPTED' (§7.2) confirms first write. |
| 28 ms | Bid processor | Produces to bids.accepted. Commits Kafka offset. |
| 30 ms | Broadcast gateway | Consumes bids.accepted. PUBLISH auction:a1b2:updates with payload {seq: 847, price: 12450, high: "jan", end_time, time_left: 42}. |
| 32 ms | Valkey Pub/Sub | Fans out to 17 WebSocket gateway pods that have subscribed to this auction's channel. |
| 35 ms | Each gateway pod | Writes a frame to every local connection watching this auction. ~500 total watchers, ~30 per pod avg. |
| 65 ms | Watcher client | Receives frame, updates UI. "Outbid" notification fires for the previous high bidder. |
| 140 ms | Client (jan) | Browser receives bid_result: ACCEPTED, seq: 847 on its WebSocket. UI confirms the bid is live. |
The p99 path is 200 ms. This one was 65 ms end-to-end because Valkey, Postgres, and Kafka were all warm and the user was in the auction's home region.
[4.6] Correctness guarantees
Postgres is where truth lives. Valkey is the hot-path coordinator; if it vanishes, a new Valkey is hydrated from Postgres (current_price, high_bidder, current_end_time, sequence_num are derivable from the bids table with a MAX). Hot-start hydrate takes minutes for 10M auctions and is gated by Postgres scan throughput; during that window, new bids reject with 503 and watchers stay on the last cached state. Kafka is the delivery layer and holds no state that isn't also in Postgres.
Protection layers, in order of the bid's lifetime:
- API rate limit prevents one user from burying a partition.
- Valkey CAS script serializes bids per auction and rejects stale `expected_price`.
- Postgres UNIQUE `(auction_id, sequence_num)` dedupes Kafka redelivery.
- Fencing token on `auctions.settlement_fence` prevents duplicate settlement commits.
- Idempotency key at the payment provider prevents duplicate charges.
The result: effectively-once settlement. Exactly-once is not guaranteed across the payment boundary (the payment provider is the authority on that). What is guaranteed is that only one SOLD row exists per auction and only one capture call is ever committed as "charged."
[4.7] Retraction and cancellation (cross-cutting)
Bid retraction is a legal requirement on many platforms (eBay allows it within rules). It invalidates an ACCEPTED bid and, if that bid is the current highest, forces the auction state to recompute.
Flow:
- API writes `UPDATE bids SET status = 'RETRACTED' WHERE id = ? AND bidder_id = ?` and emits `bids.retracted`.
- A retraction handler runs an atomic Lua script on the auction's Valkey state (same per-key serialization as bid acceptance), reads the top two bids from Postgres, and if the retracted bid was the current high, rolls `current_price` and `high_bidder` back within the same script.
- Broadcast the correction: `PUBLISH auction:{id}:updates <retraction+new_high>`.
Retraction rules are business policy, not infrastructure: time windows, max retractions per auction, mandatory reason. Enforcement lives in the API validator.
Auction cancellation (seller withdraws a listing before bids arrive):
- `UPDATE auctions SET status = 'CANCELLED'`
- Valkey state hash deleted
- Flink timer cancelled
- Any in-flight bids reject on the next CAS attempt (the Lua script checks status)
[4.8] What is a "bid"?
A bid is just an intent to pay a price. The system does not care whether it came from a human clicking a button or a proxy agent cascading an auto-bid. From Postgres's view every bid has the same row shape: (auction_id, bidder_id, amount, sequence_num, status, bid_type, created_at).
bid_type routes the bid to one of three origin modes the processor knows:
| Mode | Origin | Notes |
|---|---|---|
| manual (default) | Human click via API | Carries expected_price from client UI |
| proxy | Auto-bid fired by proxy resolver | Triggered by another user's bid crossing a standing max |
| dutch_accept | Dutch-auction "accept current price" click | No expected_price; price is taken from the scheduled drop |
All three modes go through the same Kafka topic, the same CAS script, and the same Postgres writes. Only the caller differs.
Proxy bid cascade. When a bid is accepted, a proxy-bid-resolver consumer reads the proxy_bids table for the auction. If another user's standing max is above the new price, the resolver submits the next bid (current_price + min_increment) on that user's behalf through the same API path. Cascading proxies terminate when only one active max remains above the current price.
From the bid processor's view, every bid is the same row shape. The origin mode only decides who issued it.
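The cascade described above terminates in a small loop. A sketch under simplifying assumptions (in reality each proxy bid travels through the normal API/Kafka path; here the whole cascade is resolved in memory, and all names are illustrative):

```python
def resolve_proxies(current_price, high_bidder, increment, maxes):
    """Cascade standing proxy maxes until no one can top the current price.

    `maxes` maps bidder -> standing maximum. Each round, the strongest
    remaining contender bids exactly current_price + increment, matching
    the 'minimum increment on their behalf' rule. Returns the final
    (price, high_bidder, bids_fired).
    """
    fired = []
    while True:
        contenders = {b: m for b, m in maxes.items()
                      if b != high_bidder and m >= current_price + increment}
        if not contenders:
            return current_price, high_bidder, fired
        bidder = max(contenders, key=contenders.get)
        current_price += increment          # proxy bids the minimum increment
        high_bidder = bidder
        fired.append((bidder, current_price))
```

Starting at $100 with a $5 increment and standing maxes A=$120, B=$111, the cascade fires A@105, B@110, A@115 and stops: B's max is exhausted, A holds the high bid at roughly the second-highest max plus one increment.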
[4.9] What this design intentionally avoids
Every system-design deep dive picks a scope. Being explicit about what's out of scope sharpens what's in:
- Sub-50 ms bid confirmation. Not an SLO. 200 ms p99 regional is the bar. Going lower means sacrificing durable bid persistence or crossing to a HFT-style architecture, neither of which fits the product.
- Global real-time search. Browse is eventually consistent (Elasticsearch via CDC, <60 s lag). A user who places a bid and immediately searches by title may not find the listing for up to a minute. Acceptable.
- Peer-to-peer bidding or on-chain settlement. Escrow, KYC, regulatory obligations require a central authority. The platform is the party to every transaction.
- Exactly-once across the payment boundary. The payment provider is an independent authority. The platform sends idempotency keys; the provider's contract is what makes captures effectively-once.
- Cross-region active-active writes. Auctions are pinned to one region for their lifetime. Multi-region failover is a 5-10 min manual operation, not automatic.
- Live bid streaming to anonymous browsers. Watchers authenticate. No WebSocket without an account. Keeps abuse and bot scraping bounded.
[4.10] Store roles
| Store | Technology | What it holds | Why it fits |
|---|---|---|---|
| Source of truth | Postgres 17 | Auctions, bids, settlements, users | ACID, partitioning, proven at this write volume |
| Hot state | Valkey 8 | Per-auction state hash, CAS target, sequence counter, fencing counter, Pub/Sub channels | Single-threaded Lua = free per-key serialization, sub-ms latency |
| Event bus | Kafka 4.0 (KRaft) | bids.incoming, bids.accepted, auctions.ending, auctions.sold | Partition ordering per auction_id, durable replay, mature ecosystem |
| Timer service | Flink 1.19 | Keyed-state timers per auction, settlement pipeline | Exactly-once for internal state and timer firing; payment side effects made effectively-once via fencing + idempotency, not by Flink |
| Coordination | Postgres advisory lock | Settlement coordinator leader election | No extra service; etcd is the upgrade path if multi-region coordination is needed |
| Analytics | ClickHouse | Bid history aggregations, seller dashboards, trending | Columnar, fast over billions of rows |
| Search | Elasticsearch | Auction browse, faceted search, ending-soon lists | Full-text, geo, faceting |
| Objects | S3 | Auction images, archived bid logs | Durable, cheap, CDN-friendly |
5. Technology Selection
[5.1] What shape is this system?
The workload is a real-time transactional system with event-driven fan-out. The write path needs strong serialization per auction. The read path needs horizontal broadcast to thousands of watchers. Both together map naturally to CQRS: one write model (bid processor) owns the canonical state; many read models (WebSocket, search, analytics) derive from a single event stream.
"Serialized per auction" does not mean "serialized globally." With 10M auctions active, the global bid rate is 50K/sec, but a given auction sees at most 500/sec. Per-auction serialization is cheap; per-auction database locks across a shared row are not. Valkey's single-threaded execution is the right primitive.
[5.2] The simpler version (don't skip this)
Before building Kafka + Valkey + Flink, ask whether the scale requires it.
Postgres-only variant. Works up to ~500 bids/sec across all auctions.
- Accept bids through an API that does `SELECT ... FOR UPDATE` on the auction row.
- Validate the bid against `current_price` and `current_end_time`.
- Insert into `bids`, update `auctions.current_price`, commit.
- Broadcast via Postgres `LISTEN/NOTIFY` to a small fan-out service that pushes over WebSocket.
- Settlement via `pg_cron` firing a SQL function at `current_end_time`.
Everything in one database. No Valkey. No Kafka. No Flink. The FOR UPDATE row lock serializes per-auction the same way Valkey's single-threadedness does, just with higher latency and a cap on concurrency.
When to graduate. The Postgres-only path falls over when:
- Hot auctions exceed ~50 bids/sec (lock contention + connection pool saturation).
- WebSocket watchers exceed ~10K (NOTIFY fan-out isn't designed for this).
- Peak bid rate exceeds ~500/sec total (database becomes the bottleneck).
At that point, the staged path makes sense: add Valkey for the hot path first, keep Postgres as source of truth. Add Kafka to decouple API latency from processor throughput. Add Flink when settlement complexity outgrows pg_cron.
The rest of this post describes the full-scale version. Most teams building this will not need it on day one.
[5.3] Store selection
| Concern | Chosen | Rejected |
|---|---|---|
| Source of truth | Postgres 17 | CockroachDB (unnecessary global consistency overhead), MySQL (weaker partitioning story) |
| Hot auction state | Valkey 8 | DynamoDB conditional write (5-10 ms vs sub-ms), Redis (license + Valkey is the forked OSS continuation) |
| Event bus | Kafka 4.0 KRaft | RabbitMQ (no partition ordering at this scale), Pulsar (viable alternative; see note below) |
| Timer service | Flink 1.19 | Quartz (single-node, doesn't survive a crash), pg_cron (doesn't scale past the simpler variant) |
| Settlement coordinator leader | Postgres advisory lock | ZooKeeper (heavier), etcd (great, but unnecessary second service for single-region) |
Pulsar as alternative to Kafka. Pulsar's per-message ack and shared subscriptions remove the per-partition bottleneck. Any number of consumers can share a single topic. Per-auction ordering is still required, which Pulsar's Key_Shared subscription provides without the partition count constraint. The cost is operational weight: BookKeeper dependency, smaller ecosystem, thinner managed offerings. Kafka wins on ecosystem maturity and production track record.
[5.4] Build vs buy
- API gateway: build. Off-the-shelf gateways do not enforce the exact validation + idempotency + Kafka produce semantics required here.
- Bid processor: build. Core of the system; no vendor substitute exists.
- WebSocket gateway: build on a proven framework (Go + gorilla/websocket, or Rust + tokio-tungstenite). Do not hand-roll TCP framing.
- Payment: buy. Stripe, Adyen, or equivalent. Never build a card-data system unless payments is the product.
- Search: buy. Elasticsearch managed (Elastic Cloud, Opensearch on AWS).
- Analytics: buy-or-self-host ClickHouse. ClickHouse Cloud for pure pain-avoidance; self-host when cost dominates.
6. Back-of-the-Envelope
[6.1] Throughput
Active auctions: 10,000,000
Avg auction duration: 7 days
Completed per day: 10M / 7 ≈ 1.43M
Mean bids per auction: ~105 (median is ~15; hot listings pull the mean up)
Avg bid rate: 1.43M × 105 / 86400 ≈ 1,740 bids/sec → round to 1,700 bids/sec
Peak: 30× average driven by evening end-of-auction clustering.
Peak bid rate: 50,000 bids/sec
Hot-auction rate: top 0.01% of auctions in their final minute = 100-500 bids/sec per auction.
Daily volume: 1,700 × 86400 ≈ 147M bids/day, matching the 55B rows/year used in §6.3.
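The arithmetic above, reproduced as a quick sanity check:

```python
active = 10_000_000
avg_duration_days = 7
completed_per_day = active / avg_duration_days      # ~1.43M auctions end per day
mean_bids = 105
avg_rate = completed_per_day * mean_bids / 86_400   # ~1,736 bids/sec, rounded to 1,700
daily_bids = 1_700 * 86_400                         # ~147M bids/day at the rounded rate
```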
[6.2] Bid processor sizing
One consumer per Kafka partition. Per-message work, sequential: Kafka consume (1 ms) + Valkey Lua script (0.5 ms) + Postgres insert (3 ms) + Kafka produce (1 ms) = 5.5 ms, or ~180 bids/sec per consumer.
Target peak: 50,000 bids/sec
Per-consumer capacity (sequential): 180 bids/sec
Consumers needed: 50,000 / 180 ≈ 280
Round up for headroom and rebalance buffer: 400 partitions, 400 consumers
With batched Postgres inserts (§15.5, 10-50 bids per batch), per-consumer throughput rises past 500/sec. The 400-partition count is a hedge for partition-bound parallelism (§15.4) and rebalance tolerance, not the raw throughput floor. Each pod is tiny: 1 vCPU, 512 MB RAM. KEDA scales on consumer lag.
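The same sizing, as arithmetic:

```python
import math

per_msg_ms = 1 + 0.5 + 3 + 1                   # consume + Lua CAS + insert + produce
per_consumer = math.floor(1000 / per_msg_ms)   # 181 bids/sec, quoted above as ~180
needed = math.ceil(50_000 / 180)               # 278 consumers at the quoted rate
# Rounded up to 400 partitions/consumers for headroom and rebalance buffer.
```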
[6.3] Postgres storage
bids table:
150M bids/day × 365 days = ~55B rows/year
Row size: ~250 bytes (ids, amount, fencing/seq, timestamps, status)
Hot 90 days: 13.5B rows × 250 B = 3.4 TB
With indexes (2.5×): ~8.5 TB
Monthly partitions: ~1 TB each
Archive to S3 after 90 days; drop partition after 2 years
auctions table:
10M active + ~50M archived per quarter = 60M rows
Row size: ~2 KB with images JSONB reference (not content)
Total: ~120 GB. Small relative to bids.
settlements table:
1M settlements/day × 365 = 365M/year
Row size: ~400 B
Annual: 150 GB
users table: 50M × 500 B = 25 GB.
[6.4] Valkey memory
Active auction hash (per auction):
Fields: current_price, high_bidder, min_increment, current_end_time, status, bid_count, reserve_price, auction_type, anti_snipe_*, sequence_num
Size: ~500 B per hash
10M × 500 B = 5 GB
Fencing counters: ~16 B per auction × 10M = 160 MB
Proxy sorted sets: ~500K auctions with active proxies × ~1 KB = 500 MB
Pub/Sub: ephemeral, negligible memory for idle channels
Other overhead: 500 MB
Total working set: ~7 GB
Cluster: 3 primaries + 3 replicas × 16 GB = 96 GB raw, 48 GB primary-side. ~6× headroom on primaries; replicas give the availability target.
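The working-set math, spelled out (the ~7 GB figure above is this sum rounded up):

```python
hashes  = 10_000_000 * 500 / 1e9     # 5.0 GB of per-auction state hashes
fences  = 10_000_000 * 16 / 1e9      # 0.16 GB of fencing counters
proxies = 500_000 * 1_000 / 1e9      # 0.5 GB of proxy sorted sets
overhead = 0.5                       # GB, misc
total = hashes + fences + proxies + overhead   # ~6.2 GB, quoted as ~7 GB
```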
[6.5] Kafka
bids.incoming:
50K msg/sec × 500 B = 25 MB/sec
400 partitions (one per processor)
Retention: 24 hours
Daily volume: 2.2 TB (pre-replication)
bids.accepted:
~40K msg/sec × 400 B = 16 MB/sec
100 partitions (consumed in parallel by broadcast, search, analytics)
Retention: 7 days
Weekly volume: ~10 TB
auctions.ending, auctions.sold:
~1K msg/sec peak, low volume
10 partitions each, 24 h retention
Cluster: 6 brokers, 3 TB NVMe each, RF=3
[6.6] WebSocket sizing
1M concurrent connections
Per-pod capacity: 50K connections. The 200K+ figure often quoted for Go + epoll
assumes light payloads, kernel tuning (somaxconn, file descriptors, tcp_mem),
terminated TLS at a sidecar, and ~10-20 KB memory per idle connection. Real
headroom depends on TLS in-process, frame size, and per-connection subscription
fan-out.
Pods needed: 1M / 50K = 20
Round up for headroom and rolling deploys: 40 pods
Per-bid fan-out cost:
Avg 30 watchers per auction
Accepted bid → Valkey PUBLISH → 1-40 subscribed pods → local push to watchers on each pod
~40K accepted/sec × 30 avg = 1.2M WebSocket frames/sec across the fleet
Per pod: 1.2M / 40 = 30K frames/sec. Well within Go's easy envelope.
Hot auction edge case:
5K watchers on one auction, distributed across all 40 pods
→ 125 watchers/pod avg
→ each pod pushes 125 frames per bid
At 500 bids/sec on that one auction, each pod pushes 62.5K frames/sec from it.
Total per pod: 100K frames/sec. Approaching the limit. See §15.2.
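The fan-out numbers above, as a check:

```python
accepted_per_sec = 40_000
avg_watchers = 30
frames_total = accepted_per_sec * avg_watchers    # 1.2M frames/sec fleet-wide
pods = 40
baseline_per_pod = frames_total / pods            # 30K frames/sec per pod

hot_watchers_per_pod = 5_000 / pods               # 125 watchers of one hot auction
hot_frames_per_pod = 500 * hot_watchers_per_pod   # 62.5K frames/sec from that auction
# baseline + hot ~= 92.5K frames/sec, quoted above as ~100K: near the limit.
```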
[6.7] Growth projections
The design above is sized for today. Three growth scenarios worth planning for:
| Horizon | Multiplier | What breaks first | Mitigation |
|---|---|---|---|
| 18 months | 2× (100K bids/sec peak) | Kafka partition count (400 saturated). Valkey hot-key CPU on top 10 auctions. | Double partitions to 800 (provision up-front, rebalance painful). Shard Valkey cluster from 3 nodes to 6. |
| 3 years | 5× (250K bids/sec peak) | Postgres single-primary write rate on bids insert. WebSocket fan-out at 5M concurrent connections. | Shard bids by auction_id range across 4 Postgres primaries. Move WebSocket gateway to edge runtimes that hold stateful TCP (Cloudflare Durable Objects, Fly.io regional VMs). Standard CDN workers do not hold long-lived WebSockets. |
| 5 years | 10× (500K bids/sec peak) | The CQRS architecture itself: managing 10+ Postgres shards, 1600 partitions, 10M+ watchers exceeds what a single team can operate. | Split the platform by auction category (electronics, collectibles, vehicles), each a semi-autonomous deployment. Shared user and payment layer. |
What to build in now vs later.
- Build in: partition count hedge (800 partitions instead of 400), Valkey Cluster (not single-node), structured logging and tracing across all stages.
- Defer: Postgres sharding, edge-deployed WebSocket, category splits. All three are >12 months of work; do them when the pain shows up, not on speculation.
The scariest graph in capacity planning is peak bid rate over time. Track it weekly. When the 90th percentile of weekly peaks crosses 70% of current capacity, the 18-month mitigations need to be in flight.
7. Data Model
[7.1] auctions (source of truth)
CREATE TABLE auctions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
seller_id UUID NOT NULL REFERENCES users(id),
title VARCHAR(255) NOT NULL,
description TEXT,
category_id INT NOT NULL,
auction_type VARCHAR(20) NOT NULL DEFAULT 'english',
-- Pricing
starting_price DECIMAL(12,2) NOT NULL,
reserve_price DECIMAL(12,2),
min_bid_increment DECIMAL(12,2) NOT NULL DEFAULT 1.00,
current_price DECIMAL(12,2) NOT NULL,
high_bidder_id UUID,
bid_count INT NOT NULL DEFAULT 0,
-- Timing
start_time TIMESTAMPTZ NOT NULL,
original_end_time TIMESTAMPTZ NOT NULL,
current_end_time TIMESTAMPTZ NOT NULL,
anti_snipe_seconds INT NOT NULL DEFAULT 30,
anti_snipe_extend INT NOT NULL DEFAULT 120,
-- Status + settlement
status VARCHAR(20) NOT NULL DEFAULT 'DRAFT',
settlement_status VARCHAR(20) DEFAULT 'PENDING',
settlement_fence BIGINT, -- fencing token of latest settlement attempt
winner_id UUID,
final_price DECIMAL(12,2),
-- Dutch-specific
dutch_start_price DECIMAL(12,2),
dutch_decrement DECIMAL(12,2),
dutch_interval_sec INT,
-- Metadata
region VARCHAR(16) NOT NULL, -- write-pinned region
currency CHAR(3) NOT NULL DEFAULT 'USD', -- pinned at creation; no cross-currency bids
image_urls JSONB DEFAULT '[]'::JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT valid_auction_type CHECK (auction_type IN ('english','dutch','sealed_bid')),
CONSTRAINT valid_status CHECK (status IN (
'DRAFT','SCHEDULED','ACTIVE','ENDING_SOON','EXTENDED',
'CLOSED','SETTLING','SOLD','UNSOLD','CANCELLED')),
CONSTRAINT valid_settlement CHECK (settlement_status IN (
'PENDING','IN_PROGRESS','COMPLETED','FAILED','NO_SALE')),
CONSTRAINT valid_time_range CHECK (start_time < original_end_time)
) PARTITION BY RANGE (created_at);
CREATE INDEX idx_auctions_status_end ON auctions (status, current_end_time)
WHERE status IN ('ACTIVE','ENDING_SOON','EXTENDED');
CREATE INDEX idx_auctions_settlement ON auctions (settlement_status)
WHERE settlement_status = 'PENDING' AND status = 'CLOSED';
CREATE INDEX idx_auctions_seller ON auctions (seller_id, status);
[7.2] bids (one row per bid, partitioned)
CREATE TABLE bids (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
auction_id UUID NOT NULL, -- FK enforced in app layer (cross-partition cost)
bidder_id UUID NOT NULL,
amount DECIMAL(12,2) NOT NULL,
previous_price DECIMAL(12,2) NOT NULL, -- price seen at CAS time
sequence_num BIGINT, -- per-auction monotonic; NULL for rejected bids
status VARCHAR(20) NOT NULL DEFAULT 'ACCEPTED',
bid_type VARCHAR(20) NOT NULL DEFAULT 'manual',
rejection_reason VARCHAR(32),
idempotency_key VARCHAR(128), -- client-supplied
server_ts TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
is_proxy BOOLEAN NOT NULL DEFAULT false,
proxy_max DECIMAL(12,2),
CONSTRAINT valid_bid_status CHECK (status IN ('ACCEPTED','REJECTED','RETRACTED')),
CONSTRAINT valid_bid_type CHECK (bid_type IN ('manual','proxy','dutch_accept')),
CONSTRAINT positive_amount CHECK (amount > 0),
-- Partition key included so the constraint holds across partitions (Postgres requirement).
CONSTRAINT uq_seq UNIQUE (auction_id, sequence_num, created_at),
CONSTRAINT uq_idem UNIQUE (bidder_id, idempotency_key, created_at)
) PARTITION BY RANGE (created_at);
CREATE INDEX idx_bids_auction ON bids (auction_id, sequence_num DESC);
CREATE INDEX idx_bids_bidder ON bids (bidder_id, created_at DESC);
CREATE INDEX idx_bids_accepted ON bids (auction_id, amount DESC)
WHERE status = 'ACCEPTED';
-- Per-partition partial unique index on accepted bids. A single auction never straddles
-- more than two weekly partitions in practice (max 30-day duration), so gap-free
-- sequence_num is enforced at the app layer by the Valkey CAS and verified by this index.
CREATE UNIQUE INDEX idx_bids_accepted_seq ON bids (auction_id, sequence_num)
WHERE status = 'ACCEPTED' AND sequence_num IS NOT NULL;
Weekly partitions. Drop old partitions to the archive path after 90 days. Rejected bids carry sequence_num = NULL (the Valkey CAS only assigns a sequence on acceptance).
[7.3] proxy_bids
CREATE TABLE proxy_bids (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
auction_id UUID NOT NULL,
bidder_id UUID NOT NULL,
max_amount DECIMAL(12,2) NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deactivated_at TIMESTAMPTZ,
CONSTRAINT positive_max CHECK (max_amount > 0)
);
-- Partial unique index: one active proxy per (auction, bidder). Withdrawn proxies
-- (is_active = false) do not block the user from setting a new one.
CREATE UNIQUE INDEX uq_active_proxy ON proxy_bids (auction_id, bidder_id)
WHERE is_active = true;
CREATE INDEX idx_proxy_active ON proxy_bids (auction_id)
WHERE is_active = true;
[7.4] auction_settlements
CREATE TABLE auction_settlements (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
auction_id UUID NOT NULL UNIQUE,
winner_id UUID,
final_price DECIMAL(12,2),
reserve_met BOOLEAN NOT NULL DEFAULT false,
fencing_token BIGINT NOT NULL, -- INCR fence:auction:{id}
status VARCHAR(20) NOT NULL DEFAULT 'INITIATED',
payment_id VARCHAR(128),
payment_status VARCHAR(20) DEFAULT 'PENDING',
idempotency_key VARCHAR(128) NOT NULL, -- settle-{auction_id} (stable across retries; see §9.6)
auction_ended_at TIMESTAMPTZ NOT NULL,
settled_at TIMESTAMPTZ,
payment_at TIMESTAMPTZ,
CONSTRAINT valid_settlement_status CHECK (status IN (
'INITIATED','WINNER_CONFIRMED','PAYMENT_AUTHORIZED',
'PAYMENT_CAPTURED','COMPLETED','FAILED','NO_SALE'))
);
CREATE INDEX idx_settlements_status ON auction_settlements (status)
WHERE status NOT IN ('COMPLETED','NO_SALE');
[7.5] Valkey key patterns
auction:{id} HASH current_price, high_bidder, min_increment,
current_end_time, status, bid_count,
reserve_price, auction_type,
anti_snipe_seconds, anti_snipe_extend,
sequence_num
fence:auction:{id} INT settlement fencing counter
bid_result:{bid_id} STRING cached CAS result for Kafka-redelivery dedup;
TTL = auction_end + 48 h (see §17.1)
auction:{id}:proxies ZSET score=max_amount, member=bidder_id
auction:{id}:updates PUB/SUB channel for bid broadcast
(SPUBLISH/SSUBSCRIBE on Valkey Cluster)
user:{id}:watching SET auction_ids the user is watching
ws:{pod_id}:subscriptions SET auction_ids this gateway pod is actively subscribed to
(written by the gateway on subscribe, read by ops tooling)
rate:bid:{user_id} STRING counter with TTL (rate limit)
[7.6] Entity-relationship diagram
[7.7] Auction lifecycle
8. API Design
[8.1] Place a bid
POST /api/v1/auctions/{auction_id}/bids
Authorization: Bearer <token>
Idempotency-Key: bid-20260419-user123-a1b2c3d4-10500
Content-Type: application/json
{ "amount": 105.00, "expected_price": 100.00 }
The API returns immediately with {bid_id, status: "QUEUED"}. The client subscribes to its WebSocket to receive the final ACCEPTED or REJECTED event referenced by bid_id. Acceptance also gates on bidder risk tier and required hold (§9.11); failure returns REQUIRES_DEPOSIT.
On rejection:
{ "bid_id": "...", "status": "REJECTED",
"reason": "STALE_EXPECTED_PRICE",
"current_price": 107.00, "min_next_bid": 108.00,
"end_time": "2026-04-19T20:00:30Z" }[8.2] Set a proxy bid
POST /api/v1/auctions/{auction_id}/proxy-bids
{ "max_amount": 500.00 }
DELETE /api/v1/auctions/{auction_id}/proxy-bids # withdraw
Proxy bid submission immediately fires a real bid if the current price + min_increment is at or below max_amount.
[8.3] Create auction
POST /api/v1/auctions
{
"title": "...", "description": "...", "category_id": 42,
"auction_type": "english",
"starting_price": 1.00, "reserve_price": 100.00,
"min_bid_increment": 1.00,
"start_time": "...", "end_time": "...",
"anti_snipe_seconds": 30, "anti_snipe_extend": 120,
"images": ["..."]
}
[8.4] WebSocket protocol
Client connects to wss://<region>.auction.example/v1/ws, authenticates, and subscribes to channels:
→ { "op": "subscribe", "auction_ids": ["a1", "a2"] }
→ { "op": "unsubscribe", "auction_ids": ["a1"] }
→ { "op": "resume", "auction_id": "a1", "last_seen_seq": 23 }
← { "type": "bid", "auction_id": "a1", "seq": 24, "price": 105.00,
"high_bidder": "jan", "end_time": "...", "extended": false }
← { "type": "bid_result", "bid_id": "b1", "status": "ACCEPTED", "seq": 24 }
← { "type": "bid_result", "bid_id": "b1", "status": "REJECTED",
"reason": "STALE_EXPECTED_PRICE", "current_price": 107.00 }
← { "type": "auction_closed", "auction_id": "a1", "result": "SOLD",
"winner": "jan", "final_price": 240.00 }
On reconnect, the client sends resume with the last sequence it saw. The gateway fetches missing bids from Postgres and replays them before resuming the live stream.
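The client side of this contract is small: apply each sequence number at most once, in order, and hand last_seen_seq back on resume. A minimal sketch (class and field names are assumptions, not the real client):

```python
# Client-side resume bookkeeping implied by §8.4. Frames replayed after a
# reconnect, or redelivered by Pub/Sub, are dropped by sequence number.
class AuctionView:
    def __init__(self):
        self.last_seen_seq = 0      # sent back in the "resume" op on reconnect
        self.price = None

    def apply(self, event):
        """Apply a bid frame; False means duplicate or already-seen replay."""
        if event["seq"] <= self.last_seen_seq:
            return False
        self.last_seen_seq = event["seq"]
        self.price = event["price"]
        return True

view = AuctionView()
assert view.apply({"seq": 24, "price": 105.00}) is True
assert view.apply({"seq": 24, "price": 105.00}) is False   # redelivered frame
assert view.apply({"seq": 25, "price": 108.00}) is True
```

This is the same dedupe-by-sequence_num defense listed in §9.8 for WebSocket push redelivery.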
[8.5] Ops endpoints
GET /api/v1/auctions/{id} current state (served from Valkey with Postgres fallback)
GET /api/v1/auctions/{id}/bids?cursor=... paged bid history (Postgres)
GET /api/v1/auctions?category=&ending=soon browse via Elasticsearch
POST /api/v1/auctions/{id}/bids/{bid_id}/retract policy-gated
[8.6] Image and media pipeline
Auction images dominate the object-store footprint and the CDN bill. Pipeline:
- Seller uploads direct to S3 via presigned PUT. The API only returns the URL; bytes never touch application servers.
- On s3:ObjectCreated, a Lambda generates three thumbnail sizes (200px, 600px, 1200px WebP) and a blurhash string. Thumbnails write to a public bucket behind CloudFront.
- Moderation runs in parallel: an async worker calls a vision model (AWS Rekognition or equivalent) for NSFW + weapon + known-counterfeit signals. Flagged images block the auction from publishing until human review.
- A perceptual hash (pHash) is computed and indexed. On create, the hash is compared against a stolen-listing denylist and against the seller's own past listings (duplicate-image reuse is a common fraud signal).
auctions.image_urls stores the S3 keys; the client builds CDN URLs with a signed short-TTL token for listings under legal hold.
Retention: auction images are kept 7 years to satisfy dispute windows; archived to Glacier after 90 days post-settlement.
9. Settlement, Payouts, and Risk
Settlement correctness (§9.1-9.10), deposit policy (§9.11), account lifecycle (§9.12), and seller payouts (§9.13) all hang off the same auction-end event.
[9.1] Core idea
Retries are fine. Double charging is not. Settlement is allowed to run more than once; it is not allowed to commit more than once. One SOLD row after retries settle, one captured payment once the provider acks. Three layers make duplicate runs harmless: fencing tokens, conditional writes, and provider-side idempotency keys.
[9.2] Real-world duplicate scenario
A settlement consumer fires on auctions.ending. It:
1. Acquires fencing token 42 via INCR fence:auction:{id}.
2. Writes the winner + settlement_fence = 42 to Postgres.
3. Calls Stripe with Idempotency-Key: settle-abc-42. Stripe captures $240.
4. Updates settlement_status = 'PAYMENT_CAPTURED'.
Between step 3 and step 4, the consumer pod gets evicted. Kafka redelivers. A second consumer picks up:
1. Acquires fencing token 43.
2. Writes the winner + settlement_fence = 43 to Postgres (42 < 43, so the conditional UPDATE succeeds).
3. Calls Stripe with Idempotency-Key: settle-abc-43. New idempotency key. Stripe would capture again.
This is the subtle trap. Using the fencing token in the idempotency key breaks the guarantee. The key must be stable across retries, tied to the auction, not the attempt.
[9.3] Why this isn't exactly-once
Exactly-once across the payment boundary is impossible: the payment provider is an independent system with its own retry semantics. The platform can guarantee that only one capture is ever considered settled from its side of the boundary, and that the idempotency key is stable so Stripe returns the original charge on re-attempt.
[9.4] Settlement flow (order matters)
Correct version of the flow:
1. token = INCR fence:auction:{id} in Valkey. This is the tiebreaker.
2. Read the winning bid from Postgres: SELECT id, bidder_id, amount FROM bids WHERE auction_id = ? AND status = 'ACCEPTED' ORDER BY amount DESC, sequence_num ASC LIMIT 1.
3. If no winning bid or amount < reserve_price, mark UNSOLD and stop. No payment call.
4. INSERT INTO auction_settlements (auction_id, winner_id, final_price, fencing_token, status, idempotency_key) VALUES (?, ?, ?, $token, 'INITIATED', 'settle-{auction_id}') with ON CONFLICT (auction_id) DO UPDATE SET fencing_token = EXCLUDED.fencing_token, status = 'INITIATED' WHERE auction_settlements.fencing_token < EXCLUDED.fencing_token. The idempotency key is derived from auction_id only. Stable across retries.
5. Conditional UPDATE on auctions:
   UPDATE auctions
   SET status = 'SOLD', winner_id = $winner, final_price = $price, settlement_fence = $token
   WHERE id = $auction_id
     AND (settlement_fence IS NULL OR settlement_fence < $token)
     AND status = 'CLOSED';
   Zero rows updated means a newer attempt has already committed. Stop immediately.
6. Call the payment provider with Idempotency-Key: settle-{auction_id}. Stripe returns the original capture on repeat.
7. On provider success, UPDATE auction_settlements SET status = 'PAYMENT_CAPTURED', payment_id = ?, payment_at = NOW() WHERE auction_id = ? AND fencing_token = $token.
8. Emit auctions.sold.
The fencing token orders the writes. The idempotency key is independent of the token.
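The flow compresses to surprisingly little logic once the stores are stubbed out. A sketch with in-memory stand-ins for Valkey, Postgres, and the payment provider (every name here is illustrative, not production code):

```python
# In-memory stand-ins: fence = Valkey counter, settlements = Postgres rows,
# captured = provider-side idempotency cache.
fence = {}
settlements = {}
captured = {}

def incr_fence(auction_id):
    fence[auction_id] = fence.get(auction_id, 0) + 1
    return fence[auction_id]

def charge(idempotency_key):
    # Provider-side idempotency (§9.6): a repeated key returns the original
    # charge instead of capturing a second time.
    if idempotency_key not in captured:
        captured[idempotency_key] = f"pay_{len(captured) + 1}"
    return captured[idempotency_key]

def settle(auction_id, winner_id, amount):
    token = incr_fence(auction_id)           # step 1: monotonic tiebreaker
    idem_key = f"settle-{auction_id}"        # stable across retries, no token
    row = settlements.get(auction_id)
    if row is None or row["fencing_token"] < token:   # upsert, later token wins
        settlements[auction_id] = row = {
            "fencing_token": token, "status": "INITIATED",
            "winner_id": winner_id, "final_price": amount,
            "idempotency_key": idem_key}
    payment_id = charge(row["idempotency_key"])       # provider call
    if row["fencing_token"] == token:                 # fenced status update
        row["status"], row["payment_id"] = "PAYMENT_CAPTURED", payment_id
    return payment_id

# Crash-and-retry: Kafka redelivers, a second consumer re-runs settlement.
first = settle("abc", "jan", 240.00)
second = settle("abc", "jan", 240.00)
assert first == second and len(captured) == 1   # one capture, not two
```

Swapping the idempotency key to include the token (settle-abc-42, settle-abc-43) makes the second `charge` call miss the cache, which is exactly the duplicate-capture trap from §9.2.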
[9.5] Why the CAS alone isn't enough
Settlement runs across Postgres and the payment provider, not inside Valkey. Between the "read winner" and "write SOLD" steps, an earlier frozen coordinator can wake up and race the current one. The Valkey CAS protects the auction state hash, not the Postgres settlement state. Fencing is the monotonic counter that makes "later wins" deterministic across systems.
[9.6] External idempotency (payment provider)
Stripe, Adyen, and similar systems accept an Idempotency-Key HTTP header and remember responses for 24 hours. Repeated calls with the same key return the original response, including the PaymentIntent ID.
Key pattern: settle-{auction_id} where the auction_id is a UUID. Never includes timestamps, tokens, or retry counts. Must be stable so a second attempt receives the response the first attempt produced.
[9.7] Retraction mid-settlement
If a high bidder retracts their bid after the auction closes but before settlement writes SOLD, the settlement query (§9.4 step 2) picks up the next-highest ACCEPTED bid. Retractions during the CLOSED → SETTLING window are rejected by the API to close the race between INCR fence and the settlement's winner read; once the auction enters SETTLING, retraction is not possible. Retraction is also not allowed after the settlement state reaches PAYMENT_AUTHORIZED or later.
[9.8] Where duplicates are handled
| Stage | Duplicate source | Defense |
|---|---|---|
| Kafka bids.incoming delivery | At-least-once + rebalance | bid_result:{bid_id} NX cache inside Lua (§17.1); partial unique index on (auction_id, sequence_num) WHERE status = 'ACCEPTED' |
| Bid client retry | Network blip | UNIQUE (bidder_id, idempotency_key, created_at) on bids |
| Settlement retry | Coordinator crash | Fencing token + conditional UPDATE |
| Payment retry | Settlement retry | Idempotency-Key: settle-{auction_id} (stable across retries) |
| WebSocket push redelivery | Pub/Sub fan-out bug | Client dedupes by sequence_num |
[9.9] Settlement is not atomic
The settlement pipeline touches Valkey (fence), Postgres (auction + settlement rows), Kafka (auctions.sold), and the payment provider. No distributed transaction binds them. A crash between any two produces a known recoverable state:
- Crash after INCR, before Postgres write → next attempt acquires a higher token, same outcome.
- Crash after Postgres write, before payment → next attempt sees a committed settlement with status = 'INITIATED', sends the payment call (same idempotency key), updates status.
- Crash after payment, before Kafka emit → next attempt re-reads the settlement row, sees status = 'PAYMENT_CAPTURED', and emits.
Every stage of the pipeline is idempotent in terms of its side effect.
[9.10] Guarantees and non-guarantees
Guaranteed:
- A single auctions row ends in status SOLD per auction (after retries settle).
- A single auction_settlements row per auction, carrying the highest fencing token observed.
- At most one captured charge at the payment provider (assuming the provider honors its idempotency contract).
- No accepted bid is ever lost.
Not guaranteed:
- Strict wall-clock bound on settlement time. Settlement is best-effort low-latency (sub-second under normal conditions) but can take minutes under backpressure.
- Atomicity across the payment boundary. If the payment provider captures the charge but the platform fails to persist PAYMENT_CAPTURED, a reconciliation job corrects it.
[9.11] Bidder risk tiers and payment holds
Correctness mechanisms above stop double-charges. They don't stop a winner from walking away. The defense is a hold or deposit taken before the bid is even accepted, sized to the auction value and the bidder's history.
Hold by item value.
| Value | Pre-bid action |
|---|---|
| < $100 | None. Card on file at signup is enough. Charge post-win. |
| $100-$1000 | Card verification + saved payment token. No hold. |
| $1000-$10K | Preauth hold for the bid amount. Released on outbid. |
| > $10K | Refundable deposit (5-10%) + KYC. Manual review for new accounts. |
Hold by bidder tier.
| Tier | Who | Action |
|---|---|---|
| A | >10 settled wins, no chargebacks, account >90 days | Default policy by item value (above) |
| B | New account or <3 wins | One tier stricter than the value table says |
| C | Past chargeback, dispute, or fraud flag | Deposit always required; manual approval over $1K |
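The two tables compose into a single policy lookup. A sketch with the thresholds above (function name and action labels are assumptions for illustration):

```python
# Pre-bid hold policy from §9.11, strictest action last. The "one tier stricter"
# rule for tier B is a band shift; tier C short-circuits to a deposit.
VALUE_POLICY = ["none", "verify_card", "preauth_hold", "deposit_kyc"]

def value_band(amount):
    if amount < 100:
        return 0
    if amount < 1000:
        return 1
    if amount <= 10_000:
        return 2
    return 3

def required_action(amount, tier):
    if tier == "C":                        # past chargeback: deposit always
        return "deposit_kyc"
    band = value_band(amount)
    if tier == "B":                        # new account: one tier stricter
        band = min(band + 1, len(VALUE_POLICY) - 1)
    return VALUE_POLICY[band]

assert required_action(50, "A") == "none"
assert required_action(50, "B") == "verify_card"    # one tier stricter
assert required_action(5000, "A") == "preauth_hold"
assert required_action(5000, "C") == "deposit_kyc"
```

The real check additionally consults manual-review flags (tier C over $1K, new accounts over $10K), which this sketch omits.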
users.risk_tier is recomputed nightly from a small model over payment + dispute history. The tier check happens at the API before the bid hits Kafka. A failed hold returns REQUIRES_DEPOSIT and the UI prompts the user.
Tradeoff. Every dollar of friction costs conversion. New-user deposit thresholds are tuned quarterly against chargeback rate, not set once.
[9.12] Account deletion mid-auction
User deletion requests interact badly with an append-only bid log and a settlement that happens days later. Two cases.
Bidder deletes account while holding high bid.
Hard-delete would orphan the auction (no winner_id to pay). Solution:
- Deletion is soft while the user has active bids or pending settlements: users.status = 'PENDING_DELETION'. The account cannot log in and cannot place new bids, but the user_id remains valid for settlement.
- Pending settlements proceed normally. The charge fires against the last-known payment method.
- After the final open settlement completes, a janitor job anonymizes the record (email, name, address → hashed or nulled) but keeps user_id for audit. The bid row keeps the anonymized bidder_id. Audit log preserved, PII gone.
- GDPR Article 17 "right to erasure" is satisfied by the anonymization step. Article 17(3)(e) exempts data retained for the establishment, exercise, or defence of legal claims; bid history qualifies.
Seller deletes account mid-auction.
Active auctions are cancelled with status = 'CANCELLED_BY_SELLER_DELETE'. Bidders are refunded any held funds and notified. No settlement runs. Completed auctions where payout is pending continue to the seller's bank account on file (payout is a separate post-settlement flow; deleting the user does not cancel money owed).
Banned users. users.is_banned = true short-circuits bid acceptance at the API. Mid-auction bans are rare; when they happen, the user's current high bids are retracted (§4.7 retraction flow) and the auction recomputes.
[9.13] Seller payout flow
The post so far stops at "charge the winner." The seller's side of the ledger needs its own lifecycle.
| Stage | Trigger | What happens |
|---|---|---|
| Pending | auctions.sold emitted | seller_payouts row created: amount = final_price - fees, status = 'PENDING_HOLD', available_at = now + chargeback_window (7 days default, 30 days for new sellers). |
| Held | Continuous | Funds sit in the platform's Stripe Connect balance. Winner has dispute rights during the hold. |
| Disbursed | Cron job after available_at | Transfer via Stripe Connect transfers.create to seller's connected account. Idempotency key: payout-{auction_id}. |
| Clawback | Chargeback within hold window | Payout cancelled. seller_payouts.status = 'CLAWED_BACK'. Seller account balance shows negative if funds were partially released. |
| Settled | Payout completed + no dispute | status = 'SETTLED'. 1099-K tax form generated at year-end if US seller crosses threshold. |
Failure: seller bank account closed. Stripe Connect transfers.create returns account_closed. The payout is marked FAILED, the seller is emailed, and the platform holds the balance until a new account is added.
Failure: seller deletes account after sale. Funds owed are held for 12 months per most platform T&Cs; if unclaimed, they are escheated per state law (US) or become platform revenue per EU terms.
Tax. Sales tax / VAT is computed at settlement by a tax engine (Avalara or TaxJar) based on buyer location and item category. The tax portion is captured into a separate platform ledger and remitted quarterly. The seller does not see the tax on their payout.
Scale. At 10M active auctions and ~1M settlements/day, payouts are batched. One transfers.create per auction is too chatty for Stripe's rate limits. Batch payouts daily per seller: one transfers.create per seller per day covering all settled auctions. Cuts payout API calls by ~100×.
10. Bid Processing Model
[10.1] Why optimistic concurrency, not pessimistic
Pessimistic locking (SELECT ... FOR UPDATE on the auction row) serializes bids by holding a database row lock for the duration of the transaction. At 500 bids/sec on a hot auction, the lock wait queue grows, connection pool saturates, p99 latency spikes to seconds. PostgreSQL is not designed to hold a hot row for hundreds of concurrent transactions.
Why auctions.current_price is not written on every bid. The bids table is append-only: every accepted bid is a new row on a partitioned table, which Postgres handles cleanly at tens of thousands of inserts per second. Updating auctions.current_price on every bid would move the same contention back onto one row: 500 UPDATEs/sec on one hot row produces lock queueing, MVCC bloat, and index hot-pages, plus a flood of CDC events through Debezium. The design treats auctions.current_price as a hint refreshed occasionally (or not at all), and treats MAX(amount) FROM bids WHERE auction_id = ? AND status = 'ACCEPTED' as the authoritative value when it matters (settlement, hydrate, search indexing).
Optimistic concurrency (Valkey CAS via Lua) flips the model: each bid checks the current price as a precondition and succeeds or fails atomically in sub-millisecond time. Valkey's single-threaded execution means no lock manager is involved. Serialization is implicit in the event loop.
[10.2] The CAS script, step by step
The script at §4.2 step 2 is the entire acceptance logic in one atomic operation. Steps in order:
1. Read current state from the auction hash.
2. Check auction is ACTIVE and now ≤ current_end_time.
3. Check bid ≥ current + min_increment.
4. Check expected_price matches current (this is the stale-view rejection).
5. Increment sequence_num.
6. Update current_price, high_bidder, current_end_time (extension).
7. Return the accept/reject result with the authoritative state.
All seven steps execute without interruption. No other CAS can see a half-applied state.
Why the expected_price check matters: it is the precondition that lets the client UI show a fresh view without the server pretending the client's view is current. If a bid at $105 arrives and the price has moved to $108, the bid is not just "too low"; it's operating on a stale view. Returning STALE_EXPECTED_PRICE with the current value lets the UI update and the user decide whether to rebid.
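The decision logic is small enough to show whole. A Python stand-in for the Lua script (the atomicity comes from Valkey's single-threaded execution, which this function only fakes; the anti-snipe end_time update is omitted for brevity):

```python
# Stand-in for the CAS script in §10.2; field names mirror the auction hash
# from §7.5. Check order follows the numbered steps above.
def cas_bid(auction, bid, now):
    if auction["status"] != "ACTIVE" or now > auction["current_end_time"]:
        return {"ok": False, "reason": "AUCTION_CLOSED"}           # steps 1-2
    if bid["amount"] < auction["current_price"] + auction["min_increment"]:
        return {"ok": False, "reason": "BID_TOO_LOW",              # step 3
                "current_price": auction["current_price"]}
    if bid["expected_price"] != auction["current_price"]:
        return {"ok": False, "reason": "STALE_EXPECTED_PRICE",     # step 4
                "current_price": auction["current_price"]}
    auction["sequence_num"] += 1                                   # step 5
    auction["current_price"] = bid["amount"]                       # step 6
    auction["high_bidder"] = bid["bidder_id"]
    return {"ok": True, "seq": auction["sequence_num"],            # step 7
            "current_price": auction["current_price"]}

a = {"status": "ACTIVE", "current_end_time": 100, "current_price": 100.0,
     "min_increment": 1.0, "sequence_num": 23, "high_bidder": None}
r1 = cas_bid(a, {"bidder_id": "jan", "amount": 105.0, "expected_price": 100.0}, now=50)
r2 = cas_bid(a, {"bidder_id": "ada", "amount": 110.0, "expected_price": 100.0}, now=51)
assert r1 == {"ok": True, "seq": 24, "current_price": 105.0}
assert r2["reason"] == "STALE_EXPECTED_PRICE" and r2["current_price"] == 105.0
```

The second bid clears the increment check but fails the precondition: it was priced against a view that no longer exists, and the response carries the authoritative price for the UI.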
[10.3] Proxy bid resolution
Proxy bidding is an auto-bid-until-max agent. The resolver computes a jump-to price that ends the race in a single bid rather than cascading $1 at a time.
Algorithm on every bids.accepted for auction A:
1. Load active proxy_bids for A where max_amount > current_price.
2. Exclude the current high_bidder's own proxy.
3. If none remain, stop.
4. Let winner = the proxy with the highest max_amount; runner_up = second highest (or the non-proxy current_price if only one proxy remains).
5. jump_price = min(winner.max_amount, runner_up.max_amount + min_increment).
6. Submit one bid at jump_price on winner's behalf through the same Kafka → CAS → Postgres path.
Worked example. Two proxies with maxes $200 and $500 on an auction starting at $10:
1. A user bids $10 to enter. bids.accepted fires.
2. Resolver picks user-500 (highest max) vs user-200 (runner-up).
3. jump_price = min($500, $200 + $1) = $201.
4. User-500's proxy submits one bid at $201. Accepted. Done.
5. Final state: price $201, high_bidder = user-500. One Kafka message, one CAS, one broadcast.
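The jump-price computation is the whole resolver. A minimal sketch (tuple shape and function name are assumptions; the real path submits through Kafka and the CAS rather than returning directly):

```python
# Jump-to resolver from §10.3. proxies: list of (bidder_id, max_amount,
# created_at). Returns (bidder, jump_price) or None when the cascade stops.
def resolve_proxies(proxies, current_price, high_bidder, min_increment):
    live = [p for p in proxies if p[1] > current_price and p[0] != high_bidder]
    if not live:
        return None
    # Highest max wins; earliest created_at breaks ties.
    live.sort(key=lambda p: (-p[1], p[2]))
    winner = live[0]
    runner_up_max = live[1][1] if len(live) > 1 else current_price
    jump = min(winner[1], runner_up_max + min_increment)
    return winner[0], jump

# Worked example from the text: maxes $200 and $500, an entrant bids $10.
proxies = [("u200", 200.0, 1), ("u500", 500.0, 2)]
assert resolve_proxies(proxies, 10.0, "entrant", 1.0) == ("u500", 201.0)
# After u500's bid at $201 commits, no max exceeds the price; cascade stops.
assert resolve_proxies(proxies, 201.0, "u500", 1.0) is None
```

The second call is the termination condition: once the winner holds the high bid, their own proxy is excluded and every other max is below the price.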
Edge cases:
- Two proxies with the same max. Both compute the same jump_price = max. Tie is broken by earliest proxy_bids.created_at: that user's bid is submitted first, the later proxy's max no longer exceeds the new current_price, and resolution terminates.
- Proxy set mid-auction. If the new max is above the current price, the resolver runs with the new proxy as a candidate. Otherwise the proxy is dormant until someone else bids.
- Proxy withdrawal. UPDATE proxy_bids SET is_active = false. The resolver checks the partial index (§7.3) before firing.
- Late incoming manual bid during resolution. The CAS expected_price check rejects the resolver's bid if a manual bid landed first; the resolver re-runs on the next bids.accepted.
- Resolver racing the auction close. A proxy bid submitted just before current_end_time elapses: the Lua script checks now > current_end_time and rejects with AUCTION_CLOSED. The proxy does not fire a "would have won" bid after the close. The cascade terminates cleanly at the timer firing.
- Resolver in flight when timer fires. Flink's timer emits auctions.ending and the settlement consumer starts reading winners. If a resolver bid lands after status = SETTLING, the Lua script rejects on the status check (status is no longer ACTIVE). Serialization is implicit: bid acceptance and settlement read Postgres/Valkey state atomically via the CAS + fencing token.
[10.4] Dutch auction
Price starts high and drops on a schedule. First bidder to "accept" wins at the current price.
Base flow.
- Flink emits a price-drop tick every dutch_interval_sec that runs an atomic script to reduce current_price by dutch_decrement.
- The price-drop script also publishes to auction:{id}:updates so watchers see the drop.
- An "accept" bid goes through the same CAS path. The script's expected_price check prevents accepting a stale price (if the price dropped between the client's render and the click, the CAS rejects and the UI updates).
Dutch with proxy. Users can set a buy-at-or-below limit. When the scheduled price drop crosses the limit, the proxy resolver fires an "accept" on their behalf. If two users have the same limit, the one whose proxy record was created first wins (earliest proxy_bids.created_at). The CAS makes ties deterministic: exactly one accept commits, the other sees a moved price.
Dutch end behavior. If no one accepts before the floor price, the auction closes UNSOLD. Most Dutch auctions set a floor equal to the seller's reserve; it's unusual to drop below reserve.
Why this matters. Dutch is structurally different from English because the platform is the price-mover, not the bidders. That means the CAS sees many more writes (every price tick is a write) but fewer acceptances. Valkey CPU stays low because tick writes are simple HSETs, not full scripts.
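Because the platform is the price-mover, the current price is a pure function of elapsed time. A sketch assuming the linear schedule the dutch_* columns in §7.1 imply (function name and floor handling are assumptions):

```python
# Dutch price schedule: start high, drop dutch_decrement every
# dutch_interval_sec, never below the floor (typically the reserve).
def dutch_price(start_price, decrement, interval_sec, floor, elapsed_sec):
    ticks = elapsed_sec // interval_sec      # completed price-drop ticks
    return max(floor, start_price - decrement * ticks)

# $500 start, $10 drop every 60 s, $300 floor:
assert dutch_price(500, 10, 60, 300, 0) == 500
assert dutch_price(500, 10, 60, 300, 125) == 480        # two ticks elapsed
assert dutch_price(500, 10, 60, 300, 100 * 60) == 300   # clamped at the floor
```

This determinism is also why a missed Flink tick is recoverable: the next tick can compute the correct price from elapsed time rather than from the last write.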
[10.5] Sealed-bid variants
All sealed-bid auctions share one property: bids are blind until close. The winner-selection logic differs.
First-price sealed-bid (default). Highest bid wins at their bid amount. Simple. Encourages underbidding because bidders try to guess what others will bid.
Vickrey (second-price). Highest bid wins but pays the second-highest bid amount. Theoretically removes the underbidding incentive: truthful bidding becomes the dominant strategy. Used at Google for AdWords auctions and some government bond sales. Settlement logic changes by one line:
-- First-price: pay your own bid
SELECT amount FROM bids WHERE auction_id = ? AND status = 'ACCEPTED'
ORDER BY amount DESC LIMIT 1;
-- Vickrey (second-price): pay the runner-up's bid
SELECT amount FROM bids WHERE auction_id = ? AND status = 'ACCEPTED'
ORDER BY amount DESC LIMIT 1 OFFSET 1;
Hybrid sealed-English. Bids are blind for the first 90% of auction duration; the final 10% reveals the high bid and proceeds as a normal English auction. Combines sealed-bid's price discovery with English's competitive ending. Bid type transitions at now > reveal_time; the CAS script branches on auction_type IN ('sealed_bid', 'hybrid').
Storage and privacy. Sealed-bid amounts are encrypted client-side with a seller-provided key so the platform cannot observe bids pre-close. This is a trust-model choice; encryption adds ~20 ms per bid and is often skipped for low-stakes auctions. For high-value sealed bids (government tenders, rare items), encryption is non-negotiable.
Tiebreak. Identical high bids: earliest sequence_num wins. This is also how English handles ties, but sealed-bid sees them more often because bidders don't see each other's moves.
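The two SQL queries plus the tiebreak rule collapse into one sort. A sketch (list shape and function name assumed for illustration):

```python
# Winner selection for sealed-bid variants (§10.5). bids: list of
# (bidder, amount, sequence_num). Highest amount wins; earliest seq breaks ties.
def sealed_winner(bids, vickrey=False):
    ranked = sorted(bids, key=lambda b: (-b[1], b[2]))
    winner = ranked[0]
    # First-price: pay your own bid. Vickrey: pay the runner-up's bid.
    price = ranked[1][1] if vickrey and len(ranked) > 1 else winner[1]
    return winner[0], price

bids = [("ada", 240.0, 5), ("jan", 300.0, 3), ("eve", 250.0, 1)]
assert sealed_winner(bids) == ("jan", 300.0)                 # first-price
assert sealed_winner(bids, vickrey=True) == ("jan", 250.0)   # second-price
# Identical high bids: earliest sequence_num wins.
assert sealed_winner([("a", 100.0, 2), ("b", 100.0, 1)]) == ("b", 100.0)
```

Note the Vickrey winner is chosen identically; only the price paid changes, which is why the settlement SQL differs by one OFFSET.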
11. Timers and Anti-Sniping
Auction end is a distributed-clock problem. The timer service decides when to close an auction; anti-sniping decides whether to extend when a bid lands near the end. Both share the same Flink job.
[11.1] Timer service
Flink's keyed-timer service is the authoritative auction-end clock. It consumes auctions.created and auctions.end_time_changed, re-arms a timer per auction, and fires into auctions.ending when the timer elapses.
Why keyed timers at this scale. 10M active auctions means 10M timers live at any moment, with roughly 1.4M firing per day and thousands more re-arming every minute as anti-sniping extensions land. Flink partitions timer state by auction_id across operator instances, so 10M timers distribute across the cluster instead of piling up on one node. Alternatives were rejected for specific reasons: setTimeout in application code dies on pod restart, cron-polling WHERE current_end_time < NOW() + X hammers a hot index at 10M rows with sub-second precision needs, Quartz is single-node, and pg_cron does not handle the volume of moving deadlines. Flink's keyed state plus checkpoint-replay is the specific combination that fits.
Flink runs in high-availability mode with Kubernetes-native JobManager leader election (ZooKeeper is the legacy path). State is checkpointed to S3 every 10 seconds. Flink's exactly-once guarantee covers internal state and timer firing, not payment side effects, which are made effectively-once downstream via the fencing token + idempotency key (§9). On JobManager failure, a new leader recovers state from the latest checkpoint and resumes. Missed timers fire on recovery: Flink's event-time semantics replay any timer whose deadline passed during downtime.
Reconciliation via advisory lock. A Postgres advisory lock is a 64-bit named coordination primitive that auto-releases on session end. The settlement consumer fleet uses one only for the housekeeping loop (reconciliation, DLQ replay). Bid processing and settlement itself are partition-owned via Kafka consumer groups and do not need global leadership.
Reconciliation workers poll pg_try_advisory_lock(91823746). The winner runs the janitor loop (find settlements stuck in INITIATED > 60 s, re-drive them). Losers idle. On crash, the session ends, the lock auto-releases, and a standby wins within the poll interval (5 s).
SELECT pg_try_advisory_lock(91823746);
-- Held for the session lifetime; releases on disconnect.

Upgrade: etcd lease. For multi-region coordination, switch to an etcd lease (TTL = 15 s). Redlock's clock assumptions break under GC pauses; ZooKeeper is etcd-equivalent at heavier operational weight.
[11.2] Anti-sniping extension semantics
Anti-sniping extends the auction when a bid arrives within a configurable window of the end time. The extension is atomic with bid acceptance: the Lua script that accepts the bid also updates current_end_time in the same execution.
- `anti_snipe_seconds = 30`: the trigger window.
- `anti_snipe_extend = 120`: the extension.
- When a bid arrives and `(end_time - now) < 30`, set `end_time = end_time + 120`.
Extension is relative to the current end time, not the original. Rapid-fire bids near end can compound extensions until no bid arrives in the final window.
Infinite-extension attack. A bot placing one bid every 29 seconds (inside the 30 s window) keeps the auction open forever. Two caps prevent this:
- `max_extensions` (default 20). Once hit, anti-sniping stops firing. Bids are still accepted, but the auction ends at the current `current_end_time` regardless of how close the bid lands.
- `absolute_end_time = original_end + 30 minutes`. Hard ceiling. The CAS script checks `new_end > absolute_end_time` and clamps. Beyond 30 min past the original close, no more extensions.
Both live on the auctions row (extension_count, absolute_end_time), incremented atomically by the Lua script alongside current_end_time. 30 minutes is long enough to cover legitimate rapid-fire bidding on hot items; longer than that is almost always a bot.
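The extend-and-clamp decision the Lua script makes is small enough to model directly. A minimal Python sketch under the constants above; names (`AuctionClock`, `maybe_extend`) are illustrative, and the real logic runs inside the CAS script, atomically with bid acceptance:

```python
from dataclasses import dataclass

ANTI_SNIPE_SECONDS = 30   # trigger window before current end
ANTI_SNIPE_EXTEND = 120   # seconds added per extension
MAX_EXTENSIONS = 20       # cap 1: extension count

@dataclass
class AuctionClock:
    current_end: float    # epoch seconds
    absolute_end: float   # original_end + 30 min hard ceiling (cap 2)
    extension_count: int = 0

def maybe_extend(clock: AuctionClock, bid_ts: float) -> bool:
    """Mirror of the extension branch: extend only if the bid lands
    inside the trigger window and neither cap has been hit."""
    if clock.current_end - bid_ts >= ANTI_SNIPE_SECONDS:
        return False                      # not a snipe-window bid
    if clock.extension_count >= MAX_EXTENSIONS:
        return False                      # cap 1 hit: no more extensions
    new_end = clock.current_end + ANTI_SNIPE_EXTEND
    clock.current_end = min(new_end, clock.absolute_end)  # cap 2: clamp
    clock.extension_count += 1
    return True
```

Extension is relative to `current_end`, so rapid-fire bids compound until a cap engages, exactly as described above.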
[11.3] Timer re-arming
When the CAS script extends current_end_time, the bid processor emits auctions.end_time_changed. Flink's keyed-timer state consumes this, cancels the old timer, and arms a new one at the new end time.
Timer precision hinges on the bid → CAS → event → Flink timer update path. End-to-end latency from bid acceptance to timer re-arm is typically 50-100 ms. If Flink fires the old timer before the re-arm lands, settlement runs against a stale end time. The fencing token + conditional UPDATE (§9.4) handles it: settlement writes SETTLING, re-reads current_end_time from Postgres, sees it has moved, and aborts.
[11.4] Clock skew
All timestamps are assigned server-side at API ingress; clients are never trusted. The end-time check in the CAS uses the auction's home-region Valkey clock (§14 pins each auction to one write region). Flink timers fire on event time carried on auctions.end_time_changed events, which propagate the ingress server_ts. NTP drift across the API tier is at most ~10 ms, well inside the 1 s precision target.
[11.5] Testing the concurrency model
The CAS + fencing + idempotency stack is hard to read from code alone. Tests that run before any change ships:
- Kafka rebalance fault injection. Kill a bid processor mid-consume while a hot auction is taking 500 bids/sec; assert no double-accept, no dropped bid, no `sequence_num` gap.
- Valkey primary failover chaos. Force a Sentinel-promoted replica while CAS is in flight; assert the client sees either ACCEPTED or REJECTED, never both, and the `bid_result` cache survives.
- Settlement replay against prod snapshot. Copy a week of settled auctions, reset `auctions.status` and `auction_settlements`, re-run the pipeline with the payment provider in test mode; assert one `SOLD` row per auction and one capture per idempotency key.
- Redelivery torture. Force Kafka to redeliver every `bids.incoming` message 3×; assert the `bid_result:{bid_id}` dedup makes outcomes identical to a single-delivery run.
12. Hot Auctions and Fair Queueing
[12.1] The celebrity auction problem
99% of auctions take <1 bid/sec. A few take 100-500 bids/sec in their final minute. One Kafka partition per auction would mean 10M partitions, nearly all of them idle; one partition for all auctions would create a hot-key problem where one popular auction starves the rest.
Solution: hash partitioning with 400 partitions. partition = hash(auction_id) % 400. On average, each partition serves 25K auctions. A hot auction shares a partition with ~25K quiet ones. The hot auction's bids queue behind themselves (serialized by the partition owner), but other auctions on the same partition are only marginally slowed.
Burst handling. When a single partition sees a 10× spike:
- The bid processor for that partition cannot scale horizontally (one consumer per partition).
- p99 latency rises from 60 ms to ~500 ms for bids on that partition.
- Users see a slow confirmation, not a rejection.
If a hot auction consistently overwhelms its partition, the auction is moved to a dedicated partition. Rare; most celebrity-driven bursts are short-lived.
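The key-to-partition mapping can be sketched in a few lines. Kafka's default partitioner uses murmur2 on the message key; the hash function below is an illustrative stand-in with the same properties that matter here (stable, uniform):

```python
import hashlib

NUM_PARTITIONS = 400

def partition_for(auction_id: str) -> int:
    # Stable hash of the message key: every bid for a given auction lands
    # on the same partition, so the partition's single consumer serializes
    # that auction's bids for free.
    digest = hashlib.sha256(auction_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Determinism is the point: ordering within one auction comes entirely from "same key, same partition, one consumer."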
[12.2] Per-user rate limit
One bidder cannot fire more than 10 bids/sec on one auction. Valkey counter with TTL:
INCR rate:bid:{user_id}:{auction_id}
EXPIRE rate:bid:{user_id}:{auction_id} 1 NX
-- NX (Valkey / Redis >= 7.0): set the TTL only if the key has none,
-- so repeat bids don't keep extending the window
-- reject if INCR returned > 10
This protects the CAS script from a single misbehaving client. Legitimate rapid bidding (proxy cascades) comes from internal services that use a separate quota pool.
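A minimal in-memory model of the fixed-window counter, assuming the real state lives in Valkey as above (the dict, function name, and signature are illustrative):

```python
import time
from collections import defaultdict

LIMIT = 10     # bids per window per (user, auction)
WINDOW = 1.0   # seconds

# In production this counter is a Valkey key with a 1 s TTL; the dict here
# only models the fixed-window semantics.
_counters = defaultdict(lambda: (0.0, 0))  # key -> (window_start, count)

def allow_bid(user_id: str, auction_id: str, now: float = None) -> bool:
    now = time.monotonic() if now is None else now
    start, count = _counters[(user_id, auction_id)]
    if now - start >= WINDOW:
        start, count = now, 0          # window expired: start a fresh one
    count += 1
    _counters[(user_id, auction_id)] = (start, count)
    return count <= LIMIT
```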
[12.3] Noisy-neighbor protection layers
| Layer | Mechanism | Where it runs |
|---|---|---|
| Per-user API rate limit | Sliding window by user_id | API gateway |
| Per-auction burst control | Kafka partition quotas (quota.consumer_byte_rate) | Kafka broker |
| Bid processor scheduling | Single consumer per partition; fair scheduling within | Bid processor |
| Pub/Sub fan-out cost | Bounded subscriber count per pod | WebSocket gateway |
13. Auction Search and Ranking
Discovery is the bridge between 10M listings and a buyer who wants to find one. The browse experience uses Elasticsearch as the primary read index, populated from Postgres via CDC and from bids.accepted via the Kafka pipe.
[13.1] What buyers actually query
Three queries dominate traffic:
- Category browse. "Show me electronics, sorted by ending soon."
- Free-text search. "Vintage rolex submariner."
- Saved-search alerts. "Notify me when a Honda CB750 listing under $5,000 is posted."
Everything else (advanced filters, geo) is long-tail.
[13.2] Ranking signals
The ranking function combines static and live signals. Static signals come from Postgres CDC; live signals come from bids.accepted.
| Signal | Source | Update cadence | Weight |
|---|---|---|---|
| `time_remaining_sec` | Live (computed at query time) | per query | High for "ending soon" sort |
| `bid_count` | `bids.accepted` stream | seconds | Medium (proxy for interest) |
| `watcher_count` | Watchlist subscribe events | seconds | Medium |
| Text relevance (BM25) | Title + description tokens | At index time | High for free-text |
| Category match | category_id exact | At index time | High |
| Seller reputation | Postgres CDC | hourly | Medium |
| Promoted boost | auctions.promoted_until | hourly | High when active |
| Image quality score | Image moderation pipeline | At upload | Low (tiebreaker) |
[13.3] Ending-soon ranking
The most-clicked surface. Buyers want to see auctions ending in the next 1-24 hours, sorted by remaining time.
Naive approach. ORDER BY current_end_time ASC LIMIT 50 against Elasticsearch. Works but the head of the list is dominated by no-bid junk listings.
Production approach. Composite score: score = (1 / max(time_remaining_minutes, 1)) * (1 + log10(1 + bid_count)). Boosts auctions that are both ending soon and seeing real activity; the max(..., 1) floor avoids divide-by-zero at the instant of close. Cold listings drift to page 5+.
Refresh strategy. The time_remaining field is computed at query time from current_end_time minus now(), so the index does not need re-writes for the clock advancing. Anti-sniping extensions update current_end_time via the CDC pipe; freshness is sub-second.
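The composite score above translates directly to code. A sketch; the function name is an assumption:

```python
import math

def ending_soon_score(time_remaining_minutes: float, bid_count: int) -> float:
    # Urgency times log-damped activity. The max(..., 1) floor avoids
    # divide-by-zero at the instant of close.
    return (1 / max(time_remaining_minutes, 1)) * (1 + math.log10(1 + bid_count))
```

An active auction ending soon outranks both a cold one ending soon and an equally active one ending later, which is the intended sort behavior.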
[13.4] Free-text relevance
Standard Elasticsearch BM25 on title^3, description^1, brand^2. Title boost is highest because auction titles are short and dense.
Synonyms. A small synonym set per category: "rolex" / "rolex watch" / "submariner" / "sub" all match. Maintained by an editorial team; ~5K synonyms total.
Spell correction. phrase_suggester with a 1-edit-distance budget. Corrections are surfaced as "did you mean" without auto-replacing the query.
Personalization (optional). A small per-user vector of category affinities derived from past bid + watch history is mixed into the ranking with a low weight (10%). Not aggressive; users hate when search "guesses."
[13.5] Promoted listings
Sellers can pay to boost a listing in search results. Implementation:
- `auctions.promoted_until TIMESTAMPTZ` field; when present and > now, a fixed score boost is added.
- Promoted listings are clearly labeled in the UI (legal requirement in many regions).
- The boost saturates: at most 2 promoted slots per page, after which others rank organically. Prevents pay-to-rank-above-quality.
Revenue from promoted listings is tracked separately for billing.
[13.6] Index pipeline
Index lag target: <60 s p99 from auction creation to first searchable. Bid-count freshness target: <10 s p99.
[13.7] Failure modes
| What | Effect | Mitigation |
|---|---|---|
| Elasticsearch cluster down | Search returns 503; browse falls back to a static "popular" page from CDN | Multi-AZ ES cluster with 3 master nodes; fallback page refreshed hourly |
| Index lag spike (>5 min) | New listings invisible | Auto-page on indexer lag; pause promoted-listing billing during incident |
| Bad index update (mapping change) | Field type errors on query | Blue-green index strategy: build new index in parallel, swap alias atomically |
| Search abuse (scrapers) | Inflated query load, ranking distortion | Per-IP rate limit; bot-detection on User-Agent + click-through ratio |
14. Multi-Region
[14.1] Write locality
Each auction is pinned to one region at creation time (auctions.region). All bids for that auction route to that region's infrastructure. This avoids global coordination on the hot path and accepts slightly higher latency for cross-region bidders (~100 ms extra).
Routing: the L7 load balancer reads auction_id from the request path, resolves it to a region via a small lookup service backed by a cached auction_id → region table, and proxies the request accordingly. Auction creation (POST /api/v1/auctions, no auction_id in the path) routes to the user's home region as declared in their JWT; the created auction inherits that region and the auction_id → region cache is populated on create.
[14.2] Cross-region reads
Browse, search, and auction-detail reads are served from the nearest region. Postgres read replicas replicate across regions with typical lag of ~50 ms intra-region and 150-300 ms cross-region. Elasticsearch indexes are built per-region from the Kafka bids.accepted stream via MirrorMaker 2.
A reader in EU sees the auction detail from the EU Postgres replica. When they click "bid," the request routes to the auction's home region (say US-East) for processing.
[14.3] Region failover
If the auction's home region goes fully dark:
- Read side: other regions continue serving browse and detail traffic from their replicas.
- Write side: bids for that region's auctions reject with 503. There is no automated failover of write ownership; it would require durable cross-region state replication that would slow the hot path. Manual failover is a 5-10 min RTO: promote the replica to primary, update the `auction_id → region` table, resume traffic.
A fast hot path wins over automated cross-region write failover, which would require durable cross-region consensus on auction state. Settlement runs in the auction's home region; the advisory-lock reconciliation job (§11) runs globally only in the primary DR region.
15. Bottlenecks and Backpressure
[15.1] Hot auction CAS contention
One auction pulling 500 bids/sec hits the CAS script 500 times/sec. Each script is 0.5 ms, so the key spends 250 ms/sec executing scripts. One Valkey shard is single-threaded, so the hard ceiling for this script is ~2K/sec per key (0.5 ms × 2K = 1 CPU-second per wall second).
Mitigation beyond 2K/sec: Valkey Cluster with consistent hashing (the key for a given auction lands on one specific shard; distribute hot auctions across shards), or shard state within one auction (rarely necessary).
Saturation fallback. If the per-key CAS queue depth crosses a threshold (p99 script wait > 50 ms), the bid processor trips a per-auction circuit breaker. New bids for that auction queue with a SLOW_PATH hint, and the client UI shows a "high traffic, bids may be delayed" banner instead of failing. The fallback drains as the queue clears. Ordering is preserved under stress and perceived latency stays honest.
[15.2] WebSocket fan-out amplification
A hot auction with 5K watchers produces 5K frames for every accepted bid. At 500 bids/sec, that is 2.5M frames/sec for one auction. Across 40 pods, ~62.5K frames/sec per pod from this auction alone.
Mitigation:
- Batching: combine multiple bid updates within 100 ms into one WebSocket frame for quiet connections.
- Coalescing: if two bids arrive within 50 ms, the second overwrites the first in the outbound buffer (the client only needs the latest state).
- Shedding: slow connections (client ack lag > 5 s) are disconnected. The client will reconnect and resume from Postgres.
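The coalescing rule (latest state wins per auction) can be sketched as a per-connection outbound buffer; the class and method names are illustrative:

```python
class OutboundBuffer:
    """Per-connection coalescing buffer: within one flush interval, a newer
    bid update for an auction overwrites the older one, so the client only
    ever receives the latest state for each auction it watches."""

    def __init__(self):
        self._latest = {}  # auction_id -> most recent update payload

    def offer(self, auction_id: str, payload: dict) -> None:
        self._latest[auction_id] = payload  # overwrite, never append

    def flush(self) -> list:
        # One frame per auction, then reset for the next interval.
        frames, self._latest = list(self._latest.values()), {}
        return frames
```

Batching falls out of the same structure: the gateway calls `flush()` on a 100 ms tick and sends whatever accumulated as one frame.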
[15.3] Kafka consumer lag
400 partitions; a processor crash leaves its partition unassigned for ~2 s (rebalance time) + consumer group re-join (~1 s). Bids pile up in the partition during this window. For a quiet partition, lag is recovered in seconds. For a hot partition, ~1000 bids queue.
KEDA scales the processor fleet on lag metric, but lag-driven autoscaling is reactive. For predictable peaks (evening prime time), schedule a pre-warm scale-up via a cron-triggered deployment annotation.
[15.4] Partition-bound parallelism
400 Kafka partitions = 400 concurrent consumers, period. Adding a 401st pod does nothing. Growing past 50K bids/sec requires adding partitions, which rebalances the consumer group and migrates state.
Plan for 2× growth: provision 800 partitions up front. The extra ones cost little in Kafka, and the option to scale is worth the future pain avoided.
[15.5] Postgres bid insert rate
50K inserts/sec into a single bids table is beyond one Postgres primary (typical ceiling: 15-25K/sec for narrow rows on NVMe). Mitigation:
- Partitioning by `created_at` keeps the hot working set small: all current inserts land in one weekly partition, and older partitions see no writes.
- Batched inserts. COPY or multi-row INSERT from the bid processor, 10-50 bids per Postgres round trip.
- Async write. The processor can return CAS acceptance to the client before the Postgres insert completes; the insert is re-driven on failure from Kafka `bids.accepted`.
Alternative: shard bids by auction_id across multiple Postgres primaries. Operational complexity; avoided unless scale demands it.
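The batched-insert path can be sketched as SQL assembly. Column names are assumptions; the `ON CONFLICT DO NOTHING` mirrors the dedup behavior of the `UNIQUE (auction_id, sequence_num)` index, so a redelivered batch is a no-op:

```python
def batched_bid_insert(bids: list, max_batch: int = 50):
    """Build one parameterized multi-row INSERT per batch (<= max_batch
    rows): 50 bids cost one Postgres round trip instead of 50.
    Each bid is a tuple (auction_id, bidder_id, amount, sequence_num)."""
    batch = bids[:max_batch]
    values = ", ".join(["(%s, %s, %s, %s)"] * len(batch))
    sql = (
        "INSERT INTO bids (auction_id, bidder_id, amount, sequence_num) "
        f"VALUES {values} ON CONFLICT DO NOTHING"
    )
    params = [field for bid in batch for field in bid]
    return sql, params
```

The processor would hand `sql, params` to its driver (e.g. `cursor.execute(sql, params)`); COPY is the faster variant once batches grow past a few hundred rows.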
[15.6] Backpressure: three-layer admission control
Cause (§15.1-15.5) and response belong together. When a bottleneck flares, admission control sheds load before it becomes user-visible.
- Kafka consumer lag. If `bids.incoming` lag exceeds 30 s for 60 s, the API gateway returns 503 for new bids in that region. High-value auctions (top 1%) are still accepted; others shed.
- Bid processor loop time. If CAS + persist p99 exceeds 100 ms for 2 min, the API reduces per-user rate limits to 5 bids/sec (from 10).
- Postgres write latency. If `bids` insert p95 exceeds 20 ms, the processor drops to async-write mode and batches inserts at 50/batch.
Each response is reversible on recovery. None require a deploy.
16. Retries, Fraud, and Recovery
[16.1] Client retry semantics
Bid clients retry on network errors with exponential backoff and jitter. The Idempotency-Key header keys the retry: the bid processor's UNIQUE (bidder_id, idempotency_key) constraint makes the second attempt a no-op, returning the stored result.
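A sketch of the client-side backoff schedule, assuming full jitter (the exact policy is not specified in this design; the load-bearing property is that every attempt re-sends the same `Idempotency-Key`, so the server's unique constraint turns duplicates into a replay of the stored result):

```python
import random

def retry_schedule(max_attempts: int = 5, base: float = 0.25,
                   cap: float = 8.0, rng: random.Random = None) -> list:
    """Full-jitter exponential backoff: attempt n sleeps a uniform random
    delay in [0, min(cap, base * 2**n)]. Jitter decorrelates the thundering
    herd of clients retrying after the same transient failure."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * (2 ** n))) for n in range(max_attempts)]
```

The retry loop itself is then trivial: generate one `Idempotency-Key` before the first attempt, sleep each delay in turn, and stop on any non-retryable status.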
[16.2] Fraud detection
Auction fraud is a category problem. Real platforms see all of these in production. Defenses run on two timescales: inline (millisecond-level checks at the API) and offline (ClickHouse jobs over 30-day bid histories, daily).
Pattern: shill bidding (seller pumps own auction).
Seller creates a sock-puppet account and bids on their own listing to drive the price up.
- Inline signals. Same device fingerprint, same browser font hash, same IP /24, same payment-method fingerprint between bidder and seller account.
- Offline signals. Account graph: bidder who only bids on one seller's auctions, never wins, never pays. Payment-method reuse across "different" accounts.
- Action. Inline: flag and route to `bids.quarantine`. Offline: account suspension + refund affected winners + ban payment method.
Pattern: bot bidding (high-frequency automated bids).
Bots that bid in the final 100 ms of an auction, often repeatedly across thousands of listings.
- Inline signals. Bid timing variance below human reaction floor (<150 ms between successive bids), inhuman click coordinates from the same client, missing or replayed CSRF tokens.
- Offline signals. Per-account bid-rate distribution clustering far above the cohort median.
- Action. Inline: rate-limit hard, then CAPTCHA challenge. Offline: account ban if pattern persists post-challenge.
Pattern: bid retraction abuse.
Bidder bids high to scare off competition, then retracts at the last moment to win at the previous lower price.
- Inline signals. Retraction within X minutes of placing a bid that became the high bid.
- Offline signals. Retraction frequency per bidder (>5% of bids retracted), retractions clustered near auction end.
- Action. Hard cap on retractions per bidder per month; flag-and-review if hit.
Pattern: collusion / bid cartels.
Coordinated group rotating winners across high-value categories to keep prices artificially low.
- Offline signals. Graph clustering on co-bidder pairs (accounts that consistently appear together but never outbid each other in the final stretch). Network analysis on shared shipping addresses, payment instruments, login geolocation.
- Action. Manual investigation; bans propagate across all linked accounts in one go.
Pattern: account takeover / fraudulent winning bids.
Compromised account places massive bid, abandons payment, leaves seller with no real winner.
- Inline signals. Login from new geolocation + first bid above N× the account's historical average + new payment method added in last 24 h.
- Action. Step-up authentication (SMS / email confirm) before the bid commits to Kafka.
Detection pipeline.
Operational reality. Inline rules cover ~80% of crude attacks at near-zero latency. Offline graph analysis catches the sophisticated 20% with 24 h lag. False-positive budget is the SLO: target <0.1% of legitimate bids flagged; human review capacity is the binding constraint. High-value items (>$10K) get tighter inline thresholds plus KYC before the first bid. Risk tiers gate inline action (§9.11).
Shill bidding by determined sellers using clean burner phones, residential proxies, and prepaid cards is hard to catch inline. The defense is offline payment-graph correlation over months plus selective audit of winners who never review the seller.
[16.3] Seller-side fraud and delivery disputes
Buyer-side fraud (§16.2) is only half the surface. Sellers can ship nothing, ship counterfeits, or list stolen goods. Defenses run at three points in the auction lifecycle.
At listing creation.
- Perceptual-hash check against the platform's stolen-goods denylist (§8.6).
- Seller account risk tier: new sellers cannot list above a category cap until they complete a first successful sale plus KYC.
- High-value categories (watches, electronics > $5K) require photo-with-serial or authenticator-partner verification before the listing goes live.
At settlement, before payout. Funds sit in a platform-held escrow for the chargeback window (7 days default, 30 days for new sellers; see §9.13). Payout releases only after the window closes and no dispute is open.
On buyer complaint (no-ship, not-as-described, counterfeit). A dispute opens a buyer_claims row in INVESTIGATING. The settlement state moves to DISPUTED and payout is frozen. Resolution paths:
- Seller uploads tracking proving delivery: claim closes, payout releases.
- Buyer returns the item with return-tracking: refund fires against the original `payment_intent_id` with `Idempotency-Key: refund-{auction_id}`.
- Counterfeit confirmed by authenticator partner: seller account banned, funds clawed back, refund fires, listing pHash added to the stolen-goods denylist.
The claim window is 30 days from settlement by default, extended to 90 days for items > $10K. All transitions write to audit_log with actor and reason.
[16.4] Payment failures
Winner's payment method declines. Flow:
- Settlement captures the auction but payment returns `card_declined`.
- `auction_settlements.status = 'PAYMENT_FAILED'`.
- Notification to the winner: "Your payment failed, please update your payment method within 48 h."
- The 48 h timer fires; if payment is still failing, the settlement is marked `ABANDONED` and the auction is offered to the second-highest bidder (if the reserve is still met).
- Second-chance offer via email + dashboard. If accepted, settlement restarts with a new fencing token.
17. Failure Scenarios
[17.1] Bid processor crashes mid-bid
Processor consumed a message, ran the CAS, but crashed before persisting to Postgres.
Effect. Valkey has the accepted bid reflected in current_price, and bid_result:{bid_id} holds the accept outcome with sequence_num = 24. Postgres does not have the bid row. Kafka offset was not committed.
Recovery. Kafka redelivers on partition rebalance. The new processor runs the CAS script again. The script's first action is SET bid_result:{bid_id} … NX; since the key already exists, the script returns the cached outcome (ACCEPTED, seq=24) without re-mutating state. The processor then writes the bid row to Postgres using the cached sequence_num, publishes bids.accepted, and commits the Kafka offset.
Why the bid_result cache is the correct mechanism. Without it, the second attempt sees current_price already at $105, the expected_price = $100 check fails, and the CAS returns STALE_EXPECTED_PRICE: a rejection for a bid that was actually accepted. The bid_id NX cache makes the CAS idempotent per bid, not per current-state view.
TTL. bid_result:{bid_id} lives for max(auction_end − now, 0) + 48 h. The 48 h buffer covers settlement retries and any reconciliation pass. Memory cost at 50K submissions/sec is bounded: by the time a 7-day auction's settlement completes, its bid_result keys are within the window of expiry.
Postgres-write ordering. The processor writes to Postgres after the CAS but before committing the Kafka offset. A second crash between Postgres write and offset commit is harmless. The redelivered message hits the cached result, and the UNIQUE (auction_id, sequence_num) partial index (§7.2) makes the second INSERT a no-op.
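The replay behavior in this scenario can be modeled without Valkey. A heavily simplified Python sketch of the CAS semantics (validation, anti-snipe, and TTLs omitted; the dict stands in for Valkey state):

```python
def cas_place_bid(store: dict, bid_id: str, amount: int, expected_price: int):
    """Model of the Lua CAS. The first action mirrors
    SET bid_result:{bid_id} ... NX: a redelivered bid (crash + Kafka
    redelivery) returns the cached outcome instead of being re-judged
    against the already-mutated price."""
    results = store.setdefault("bid_result", {})
    if bid_id in results:                    # dedup: idempotent per bid_id
        return results[bid_id]
    if store["current_price"] != expected_price or amount <= store["current_price"]:
        outcome = ("STALE_EXPECTED_PRICE", None)
    else:
        store["current_price"] = amount      # accept: mutate price
        store["seq"] += 1                    # assign per-auction sequence
        outcome = ("ACCEPTED", store["seq"])
    results[bid_id] = outcome                # cache outcome for replays
    return outcome
```

Replaying the accepted bid returns the same `(ACCEPTED, seq)` without touching state, while a genuinely new bid against the moved price is rejected, which is exactly the distinction the `bid_result` cache exists to make.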
[17.2] Valkey node failure during CAS
Valkey primary dies during a Lua script execution.
Effect. The script may or may not have committed its changes to memory. AOF fsync = everysec means up to 1 s of writes can be lost on hard failure.
Recovery. Valkey Sentinel or Cluster promotes a replica. The replica has the state as of the last replication lag (~1-10 ms for local cluster).
Bid correctness. The bid processor sees a connection error and retries. Same bid, same expected_price. If the CAS's effect was replicated, retry fails with STALE_EXPECTED_PRICE. If not, retry succeeds. Either way, no double-accept.
Hot-start hydrate. If Valkey loses state entirely (data center loss), re-hydrate from Postgres: SELECT auction_id, MAX(amount), COUNT(*), MAX(sequence_num) FROM bids WHERE status = 'ACCEPTED' GROUP BY auction_id. Takes minutes for 10M auctions; during that window, new bids are rejected with 503.
[17.3] Flink checkpoint failure during settlement
Flink checkpoint fails while a settlement job is mid-flight.
Effect. On recovery, Flink replays events from the last successful checkpoint. The settlement event for this auction may fire twice.
Recovery. Fencing token guards (§9.4). The second attempt gets a higher token, writes SOLD + fencing_token = 43 (was 42). Payment call uses Idempotency-Key: settle-{auction_id}. Same key as attempt 42, so Stripe returns the original response. No double charge.
[17.4] WebSocket gateway pod crashes
Pod holding 50K connections dies.
Effect. 50K clients see disconnect. They reconnect to another pod (sticky-session LB routes to healthy pods).
Recovery. On reconnect, each client sends resume with its last seen sequence. The new pod fetches missing bids from Postgres. Service restored in seconds per client.
[17.5] Postgres primary failover during peak bidding
Postgres primary dies. Streaming replica promotes via Patroni; failover takes 20-60 s depending on health-check cadence and connection drain.
Effect. Bid processors cannot persist during the failover window. Valkey CAS continues (Valkey is independent). Accepted bids pile up in a pending_persist queue in the bid processor.
Recovery. On Postgres return, processors drain the pending queue. Bid acknowledgements to clients are already sent (based on CAS result); clients see no degradation. WebSocket broadcasts also continue (Valkey Pub/Sub is independent).
Constraint. The pending_persist queue is bounded (~30 s × 50K = 1.5M messages). Each processor's queue is local memory. Hard cap: 100K per processor, after which the processor starts rejecting bids. With 400 processors, cap is 40M, plenty.
[17.6] Kafka broker outage
One broker out of six dies. Replication factor 3 means each partition has two survivors. Kafka rebalances leadership within seconds.
Effect. Sub-second p99 latency spike on produce for partitions whose leader was on the dead broker. No message loss.
Recovery. Automatic. Dead broker replaced and caught up via replica fetch.
[17.7] Region outage
Auction's home region goes dark (power, network, provider outage).
Effect. Bids for auctions pinned to that region fail with 503. Other regions unaffected for their auctions.
Recovery. Manual failover of auctions to a designated DR region: promote the Postgres replica, point the auction_id → region lookup to the new region, resume traffic. RTO 5-10 min. RPO is the Postgres replication lag at the moment of failure (typically <500 ms).
Settlements for in-flight auctions in the failed region are deferred until region recovery. If recovery exceeds the SLA, the DR region takes over settlement using the last-known auction_settlements state and fencing tokens.
Auction dying inside the final window. If the home region loses quorum in the final 30 seconds of an auction, the timer service in the DR region does not auto-fire the close. On manual failover (5-10 min RTO), the promoted Postgres replica carries the last durable current_end_time, and any end_time_changed events in Kafka replay on the new Flink job. If replication lag at the moment of failure was below the anti-snipe extension window (typical 120 s), no bids are lost and the auction closes on the extended end time. Winner and runner-up are determined from the highest accepted sequence_num in the replicated bids table. Settlement then runs normally on the DR region.
[17.8] Payment provider outage
Payment provider is down or degraded for 30-120 minutes. Rare but not theoretical: every major PSP has shipped multi-hour incidents at some point.
Effect. Settlements that reach the "capture" step fail with 5xx or timeout. Settlement state stays in INITIATED or PAYMENT_AUTHORIZED. Bidding is unaffected; only the last-mile charge is blocked.
Inline response.
- Circuit breaker on the payment client: open after 20% failure rate over 60 s, half-open retries every 30 s. Prevents piling up retries against a dead provider.
- Queue depth check: if `auction_settlements` in `INITIATED` grows past 1000, page ops. Capture is deferred but safe; winners wait longer for the "won" confirmation.
- Email winners: "Your payment is being processed. If it fails, we will retry automatically for 48 hours." (literal user-facing copy)
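The payment-client circuit breaker can be sketched as follows. The thresholds (20% over 60 s, 30 s half-open) come from the bullets above; the class name and bookkeeping are illustrative, not a specific library's API:

```python
import time
from collections import deque

class PaymentBreaker:
    """Open when the failure rate over a sliding 60 s window reaches 20%;
    allow a half-open probe after a 30 s cooldown; close on a successful
    probe. min_calls avoids tripping on a tiny sample."""

    def __init__(self, window=60.0, threshold=0.2, cooldown=30.0, min_calls=10):
        self.window, self.threshold = window, threshold
        self.cooldown, self.min_calls = cooldown, min_calls
        self.calls = deque()      # (timestamp, ok) pairs inside the window
        self.opened_at = None     # None = closed

    def _trim(self, now):
        while self.calls and now - self.calls[0][0] > self.window:
            self.calls.popleft()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.cooldown   # half-open probe

    def record(self, ok, now=None):
        now = time.monotonic() if now is None else now
        self.calls.append((now, ok))
        self._trim(now)
        if ok and self.opened_at is not None:
            self.opened_at = None                      # probe succeeded: close
            return
        failures = sum(1 for _, k in self.calls if not k)
        if len(self.calls) >= self.min_calls and failures / len(self.calls) >= self.threshold:
            self.opened_at = now                       # trip open
```

While the breaker is open, settlements simply stay in `INITIATED` and are drained later by the janitor loop, which is why deferring capture is safe here.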
Recovery. When the provider recovers, the reconciliation job (janitor loop, §11.1) drains the INITIATED queue in FIFO order. Each retry uses the same Idempotency-Key: settle-{auction_id}, so Stripe returns the original successful capture for any that slipped through before the outage.
Secondary provider. For a provider-wide outage (rare but catastrophic), a secondary PSP (Adyen as Stripe's backup) is wired as a configurable fallback. Switching providers changes the idempotency key namespace; any in-flight INITIATED settlements stay on the primary until it recovers. New settlements start on the secondary.
18. Operational Playbook
[18.1] Deployment
- Bid processor: rolling deploy, 10% of pods at a time. Partition reassignment on each pod rotation triggers a ~1 s rebalance. 40-pod fleet redeploys in 8 min with zero downtime.
- WebSocket gateway: rolling with 10 s connection drain. Clients reconnect automatically.
- Flink: savepoint-and-restart. 60-90 s window where timers don't fire; recovered timers fire on startup.
- Postgres schema changes: `ALTER` on live tables uses `pg_repack` or online-DDL techniques. Adding columns is free; changing types requires a shadow table migration.
[18.2] Metrics and alerts
Key metrics:
| Metric | Alert threshold | Reason |
|---|---|---|
| Bid acceptance p99 latency | >200 ms for 5 min | User-visible slowness |
| Kafka `bids.incoming` lag | >30 s for 60 s | Bid backlog growing |
| Valkey CAS p99 | >2 ms for 2 min | Hot-key saturation |
| Settlement p99 | >5 s for 5 min | Revenue-critical path |
| Payment success rate | <99% for 5 min | Payment provider issue |
| WebSocket frame drop rate | >1% for 2 min | Gateway overload |
| Postgres write p95 | >20 ms for 2 min | Approaching bottleneck |
[18.3] Backup and recovery
- Postgres continuous archiving to S3 with 5-min PITR granularity. Daily full backup.
- Valkey AOF with everysec fsync. Snapshot to S3 every hour.
- Kafka replication factor 3; no separate backup (the topic retention is the backup).
- ClickHouse daily backup to S3; analytics replayable from Kafka retention.
[18.4] Capacity planning
- Monitor ratio of peak to average bid rate weekly. If the 30× ratio grows, partitions need scaling.
- Monitor hot-auction distribution. If the top 0.1% regularly exceeds 500 bids/sec, consider dedicated partition assignment.
- Monitor Postgres bid table growth. Re-evaluate archive cadence quarterly.
[18.5] Top 5 alerts (3 AM on-call)
- Bid acceptance p99 >500 ms. Likely Valkey or Postgres degradation.
- Settlement latency >30 s. Payment provider or Flink issue.
- Payment success rate <95%. Upstream provider outage or fraud spike.
- WebSocket reconnection rate >10× baseline. Gateway pod crashes or LB misrouting.
- Kafka lag >2 min. Processor fleet under-provisioned or stuck.
[18.6] Observability stack
- Metrics: Prometheus scrapes all services; long-term retention in Mimir or Thanos. Grafana for dashboards. Red-golden-signal boards per subsystem (API, bid processor, Valkey, Postgres, Kafka, Flink, WebSocket gateway, settlement).
- Tracing: OpenTelemetry SDK in each service; traces exported to Tempo or Jaeger. The trace header threads from the API gateway through Kafka (producer-injected headers) into bid processor, broadcast, and settlement. The `bid_id` is tagged on every span so a single bid is end-to-end queryable.
- Logging: structured JSON to Loki; ten-minute hot retention, 30-day cold. Alert on error-rate anomalies, not absolute error counts.
- Profiling: continuous pprof for Go services (bid processor, WebSocket gateway) into Pyroscope. CPU flamegraphs are the fastest way to diagnose hot-key Valkey script regressions.
- Synthetic probes: a black-box tester places bids against a canary auction every 30 s from each region; SLO breach fires before real users notice.
[18.7] Lua script change management
The CAS script is a hot-path correctness change; treat it like a schema migration:
- EVALSHA versioning. Processors load `script_v<N>.sha` from config at startup. A deploy that flips the config to `script_v<N+1>.sha` is an atomic version swap.
- Canary auction. New scripts are shadow-run against a synthetic auction in staging, then enabled for 1% of live auctions (steered via a Valkey config flag) before global rollout.
- Instant rollback. Rollback is a config flip back to the previous SHA; old scripts are never deleted from Valkey until two deploy cycles have passed.
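A minimal sketch of the version-swap mechanics. In production the SHA comes from `SCRIPT LOAD` against Valkey (which returns the script's SHA-1); here it is computed locally, and the class and method names are illustrative assumptions.

```python
import hashlib

class ScriptRegistry:
    """Versioned CAS-script registry (illustrative names throughout).

    Processors EVALSHA whatever the active version's SHA is; activating
    a version is a single reference swap, so rollout and rollback are
    both config flips, never script edits in place.
    """
    def __init__(self):
        self._shas = {}           # version -> sha1 hex digest
        self.active_version = None

    def load(self, version: str, script_source: str) -> str:
        # Mirrors SCRIPT LOAD: the handle is the SHA-1 of the source.
        sha = hashlib.sha1(script_source.encode()).hexdigest()
        self._shas[version] = sha
        return sha

    def activate(self, version: str) -> str:
        if version not in self._shas:
            raise KeyError(f"version {version} never loaded")
        self.active_version = version
        return self._shas[version]

    def rollback(self, previous_version: str) -> str:
        # Old SHAs are retained, so rollback is just another activate.
        return self.activate(previous_version)
```

Because old versions stay loaded, rollback never races a `SCRIPT LOAD`; it only changes which SHA processors pass to `EVALSHA`.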
19. SLOs and Error Budgets
| SLO | Target | Error budget |
|---|---|---|
| Bid acceptance availability | 99.99% | 52 min/year |
| Bid confirmation latency p99 | <200 ms | 7.2 h/month outside bound |
| Bid broadcast latency p99 | <200 ms | 7.2 h/month outside bound |
| Settlement correctness (zero double-settlements) | 100% | 0 incidents/year |
| Settlement latency p99 | <5 s | 43 h/month |
| WebSocket connection success | 99.9% | 43 min/month |
| Search freshness (new auction visible in search) | <60 s | 10% of auctions/day |
Error budgets drive release cadence. If bid acceptance availability dips below 99.99% monthly, feature deploys halt and engineering focuses on stability until the budget recovers.
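The budgets in the table fall out of a one-line conversion from SLO to allowed downtime; a quick sketch for sanity-checking new targets:

```python
def error_budget_minutes(slo: float, period_days: int) -> float:
    """Allowed out-of-SLO time (minutes) over a period, e.g. 0.9999 -> ~52.6 min/year."""
    return (1.0 - slo) * period_days * 24 * 60

# 99.99% over a year: the table's "52 min/year" rounds this down.
print(round(error_budget_minutes(0.9999, 365), 1))  # -> 52.6
# 99.9% over a 30-day month matches the WebSocket row.
print(round(error_budget_minutes(0.999, 30), 1))    # -> 43.2
```

The latency budgets work the same way, except the "downtime" clock runs whenever the p99 is outside its bound rather than when the service is hard-down.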
20. Security
- Authentication: OAuth 2.0 for the API; session cookies for the web client. JWTs carry `user_id`, `account_status`, and `region`.
- Authorization: bidders cannot bid on their own auctions (enforced at the API). Sellers cannot modify auctions after the first bid (enforced by a status check).
- Payment data: never touches platform infrastructure. Payment provider handles PAN; the platform stores only a tokenized reference.
- Webhook signatures: all incoming provider webhooks (Stripe, Adyen) verified via HMAC before processing.
- Rate limits: per-user, per-IP, per-auction. Rate limit events logged for fraud analysis.
- PII: bid history visible to the bidder, seller, and platform ops. Watcher lists are private. High-bidder usernames are masked in public views, so competitors cannot directly identify each other from the live feed.
- Reserve price: stored in plaintext in `auctions.reserve_price` under Postgres row-level security. The value never appears in API responses, WebSocket frames, Kafka payloads (`bids.accepted`, `auctions.sold`), or analytics exports. Buyers see only a boolean `reserve_met`, and only after settlement. The seller, the settlement worker, and platform ops are the only readers.
- Admin actions audited: all admin overrides (bid cancellation, auction force-close, user ban) are logged to `audit_log` with actor, timestamp, and reason.
- Edge protection: WAF in front of the API with rules for SQL injection, path traversal, and known bad bot UAs. Anycast scrubbing (Cloudflare, Shield Advanced, or equivalent) absorbs volumetric DDoS.
- Proof-of-work challenges: the bid endpoint optionally requires a lightweight PoW token on suspicious sessions (new account, high bid amount, residential-proxy IP block). Cost is ~100 ms client-side, invisible to real users, expensive for scrapers at scale.
- CAPTCHA: adaptive challenge on account creation, password reset, and bid submission when the risk score crosses a threshold. Managed service (Turnstile, hCaptcha) rather than hand-rolled.
21. Key Takeaways
- Optimistic concurrency via Valkey CAS scales bid acceptance to 50K/sec with sub-ms latency. Pessimistic row locks cannot.
- Effectively-once settlement stacks three layers: fencing token, conditional UPDATE, and provider idempotency key. Any one alone is insufficient.
- Kafka partitioning by `auction_id` gives free per-auction serialization. Partition count is the hard parallelism ceiling; provision 2× growth up front.
- Anti-sniping belongs inside the CAS, not in a separate pipeline. Atomic extension is the only way to close the "accept bid" vs. "extend end_time" race.
- The Postgres-only variant (§5.2) is the right starting point up to ~500 bids/sec. Graduate to Kafka + Valkey + Flink only when scale demands it.
22. Appendix
A. Atomic CAS + anti-sniping Lua (with bid_id dedup)
```lua
-- KEYS[1] = auction:{id}
-- KEYS[2] = bid_result:{bid_id}
-- ARGV:
--   1 bid_amount
--   2 expected_price
--   3 bidder_id
--   4 min_increment
--   5 now (unix seconds)
--   6 anti_snipe_seconds
--   7 anti_snipe_extend
--   8 result_ttl_seconds (auction_end - now + 48h)

-- 1. Redelivery dedup. If bid_id has a cached outcome, return it verbatim.
local cached = redis.call('GET', KEYS[2])
if cached then
  return cjson.decode(cached)
end

local price = tonumber(redis.call('HGET', KEYS[1], 'current_price'))
local endt = tonumber(redis.call('HGET', KEYS[1], 'current_end_time'))
local status = redis.call('HGET', KEYS[1], 'status')

local function finish(result)
  redis.call('SET', KEYS[2], cjson.encode(result), 'EX', tonumber(ARGV[8]))
  return result
end

-- 2. Auction state checks (order matters; stale before too-low for UX).
-- Guard first: a missing or corrupt auction key must not be reported as CLOSED.
if not status or not price or not endt then
  return finish({0, 'AUCTION_NOT_FOUND', 0, 0})
end
if status ~= 'ACTIVE' or tonumber(ARGV[5]) > endt then
  return finish({0, 'AUCTION_CLOSED', price, endt})
end
if tonumber(ARGV[2]) ~= price then
  return finish({0, 'STALE_EXPECTED_PRICE', price, endt})
end
if tonumber(ARGV[1]) < price + tonumber(ARGV[4]) then
  return finish({0, 'BID_TOO_LOW', price, endt})
end

-- 3. Acceptance path.
local seq = redis.call('HINCRBY', KEYS[1], 'sequence_num', 1)
redis.call('HSET', KEYS[1],
  'current_price', ARGV[1],
  'high_bidder', ARGV[3])
local time_left = endt - tonumber(ARGV[5])
if time_left < tonumber(ARGV[6]) then
  local new_end = endt + tonumber(ARGV[7])
  redis.call('HSET', KEYS[1], 'current_end_time', new_end)
  return finish({1, seq, ARGV[1], new_end, 'EXTENDED'})
end
return finish({1, seq, ARGV[1], endt, 'OK'})
```

The `bid_result:{bid_id}` cache makes the script idempotent per bid. A Kafka redelivery after the processor crashed post-CAS returns the original outcome (same `sequence_num`) without re-mutating state (see §17.1).
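The dedup-plus-sequencing semantics can be modeled without Valkey at all. A pure-Python sketch (illustrative only; the Lua above is the real implementation, and anti-sniping is omitted here):

```python
def make_cas(state):
    """Model of the CAS script's dedup + sequencing behavior.

    state holds the auction hash ("price", "seq"); results plays the
    role of the bid_result:{bid_id} keys.
    """
    results = {}  # bid_id -> cached outcome

    def place_bid(bid_id, amount, expected_price, min_increment=1):
        if bid_id in results:            # redelivery: return cached outcome
            return results[bid_id]
        if expected_price != state["price"]:
            out = (0, "STALE_EXPECTED_PRICE", state["price"])
        elif amount < state["price"] + min_increment:
            out = (0, "BID_TOO_LOW", state["price"])
        else:
            state["seq"] += 1            # HINCRBY: gap-free per-auction sequence
            state["price"] = amount
            out = (1, state["seq"], amount)
        results[bid_id] = out            # cache before returning, like finish()
        return out

    return place_bid

bid = make_cas({"price": 100, "seq": 0})
print(bid("b1", 105, 100))  # -> (1, 1, 105)   accepted, seq 1
print(bid("b1", 105, 100))  # -> (1, 1, 105)   redelivery: same seq, no re-mutation
print(bid("b2", 105, 100))  # -> (0, 'STALE_EXPECTED_PRICE', 105)
```

The second `b1` call is the crash-then-redeliver case from §17.1: the cached outcome is returned verbatim, so the sequence counter and price are touched exactly once per unique bid.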
B. Fencing token flow sequence
C. Bid sequence invariants
For a given auction:
- `sequence_num` is monotonically increasing on accepted bids, assigned atomically by the Valkey CAS script.
- Gap-free within accepted bids (partial unique index on `(auction_id, sequence_num) WHERE status = 'ACCEPTED'`). Rejected bids carry `sequence_num = NULL` and do not consume a number.
- Highest `amount` among `status = 'ACCEPTED'` rows is the current winner. Ties are broken by the lowest `sequence_num` (earliest arrival).
- `auctions.current_price` is maintained by the bid processor's Postgres write; it is the displayed price. The authoritative ordering, however, lives in `bids`: highest accepted amount with the lowest `sequence_num` as tiebreak. For any read where exactness matters (e.g. settlement), derive from `bids` directly.
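The winner rule in the invariants above is a one-liner once expressed as a sort key. A sketch, assuming bids as `(sequence_num, amount, status)` tuples (the shape is an illustration, not the schema):

```python
def current_winner(bids):
    """Highest accepted amount wins; earliest sequence_num breaks ties.

    Rejected bids carry sequence_num = None and are excluded entirely.
    """
    accepted = [b for b in bids if b[2] == "ACCEPTED"]
    if not accepted:
        return None
    # max amount wins; negating seq makes the *lowest* sequence_num win ties
    return max(accepted, key=lambda b: (b[1], -b[0]))

bids = [(1, 100, "ACCEPTED"), (2, 105, "ACCEPTED"),
        (3, 105, "ACCEPTED"), (None, 110, "REJECTED")]
print(current_winner(bids))  # -> (2, 105, 'ACCEPTED')
```

This is the query settlement should run against `bids`; the rejected 110 bid never competes, and the tied 105 bids resolve to the earlier arrival.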
Explore the Technologies
| Technology | Role | Learn more |
|---|---|---|
| Postgres 17 | Source of truth for auctions, bids, settlements | PostgreSQL |
| Valkey 8 | Per-auction state, CAS target, Pub/Sub bus | Redis/Valkey |
| Kafka 4.0 | Bid delivery bus, per-auction ordering | Kafka |
| Apache Pulsar | Alternative dispatch bus (§5.3) | Pulsar |
| Flink 1.19 | Auction-end timers, settlement pipeline | Flink |
| etcd 3.6 | Leader lease (upgrade from advisory lock) | etcd |
| ClickHouse 24 | Analytics over bid history | ClickHouse |
| Elasticsearch 8 | Auction browse and search | Elasticsearch |
Patterns: Message Queues & Event Streaming, Circuit Breakers & Resilience, Auto-scaling, Replication & Consistency, Rate Limiting.
Practice this design: Online Auction interview question.