System Design: E-Commerce Flash Sales (10M Users, Coupon System, One-Per-User Enforcement)
Goal: Design a flash sale platform that handles 10 million concurrent users competing for 100K discounted items and 500K limited coupons, with zero overselling, one-coupon-per-user enforcement, and graceful degradation under extreme load.
Scale: 10M concurrent users, 100K inventory items, 500K coupon pool, 50K orders/min at peak, sub-200ms checkout latency.
Mental model -- four ideas that make everything else click:
- Inventory = atomic counter. Decrement succeeds or fails in a single Valkey Lua eval. No read-then-write race.
- Coupon pool = pre-loaded list. LPOP is atomic. No two users get the same code.
- Queue = admission controller. 10M users enter, 1000 per batch reach backend.
- Checkout = saga. Reserve, apply, charge, confirm. Any failure triggers reverse.
TL;DR: Valkey (Redis-compatible) provides the atomic primitives for inventory decrement, coupon pool claims, and user deduplication. PostgreSQL serves as the durable ledger. Kafka decouples order processing. A CDN-served virtual queue shapes traffic before it reaches backend services. Temporal orchestrates the checkout saga with compensating transactions for coupon rollback on payment failure.
System invariant: The queue shapes traffic. Valkey provides atomics. PostgreSQL provides durability. Kafka decouples processing. Temporal orchestrates compensation. Each can fail independently without cascading into total outage.
The Three Problems
Every design decision in this system comes from just three constraints.
Inventory overselling. 50,000 users click "Buy Now" on the same 500-unit item within 200 milliseconds of T-0. A naive SELECT quantity ... UPDATE quantity = quantity - 1 will oversell: two threads read quantity=1, both decrement, and the system ships inventory it does not have.
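To make the race concrete, here is a minimal Python model of the atomic check-and-decrement that the Valkey Lua eval performs in one indivisible step. The key name follows the inv:{sale_id}:{sku_id} pattern used later in the data model; the function is a sketch of the semantics, not the production script.

```python
def atomic_decrement(store: dict, key: str, qty: int) -> bool:
    """Check and decrement as one indivisible step -- no read-then-write gap."""
    remaining = store.get(key, 0)
    if remaining < qty:
        return False              # sold out: no partial decrement
    store[key] = remaining - qty
    return True

inventory = {"inv:sale_abc123:SKU-LAPTOP-001": 1}
first = atomic_decrement(inventory, "inv:sale_abc123:SKU-LAPTOP-001", 1)
second = atomic_decrement(inventory, "inv:sale_abc123:SKU-LAPTOP-001", 1)
# first is True, second is False -- the naive read-then-write lets both pass
```

Because the check and the write happen inside one atomic operation, the "two threads both read quantity=1" interleaving cannot occur.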
Coupon double-claiming. 500,000 coupons, one per user. A motivated user opens five tabs or scripts a bot. Without atomic uniqueness enforcement, a single user claims dozens of coupons before the system processes the first claim.
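The fix is an atomic first-write-wins guard plus an atomic pool pop. A small Python model of those semantics (the dict membership check stands in for SET NX rejecting repeat claimers, the pop for LPOP handing out each code exactly once; codes and user ids are illustrative):

```python
def claim_coupon(pool: list, claims: dict, user_id: str):
    """Model of the claim path: SET NX dedup guard, then LPOP from the pool."""
    if user_id in claims:        # SET NX failed: this user already claimed
        return None
    if not pool:                 # LPOP returned nil: pool exhausted
        return None
    code = pool.pop(0)           # LPOP is atomic: no two users get the same code
    claims[user_id] = code
    return code

pool = ["SUMMER-A7K2M", "SUMMER-B9X4Q"]
claims = {}
first = claim_coupon(pool, claims, "user_1")
second_tab = claim_coupon(pool, claims, "user_1")   # extra tab or bot: rejected
# first == "SUMMER-A7K2M", second_tab is None
```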
Traffic spike at sale start. 10 million users hit the origin at T-0. Database connection pools exhaust. Valkey pipelines back up. The checkout service returns 503s. Users rage-refresh, amplifying the problem. Without traffic shaping, the system DDoSes itself.
Scale Numbers
| Metric | Value |
|---|---|
| Concurrent users at T-0 | 10,000,000 |
| Total inventory items (across all SKUs) | 100,000 |
| Coupon pool size | 500,000 |
| Peak orders per minute | 50,000 |
| Peak checkout requests per second | ~833 |
| Sale duration | 2 hours |
| Expected total orders | ~200,000 |
| Unique SKUs on sale | 500 |
| Average units per SKU | 200 |
| Coupon types | 4 (percent-off, fixed, BOGO, free shipping) |
Requirements
Functional Requirements
| # | Requirement | Description |
|---|---|---|
| FR-1 | Flash Sale Creation | Admin creates a sale with start time, end time, list of SKUs, and inventory per SKU |
| FR-2 | Inventory Management | System tracks available inventory per SKU with atomic decrement on purchase |
| FR-3 | Sale Countdown | Users see a real-time countdown to sale start; page activates at T-0 |
| FR-4 | Virtual Queue | Users entering at T-0 are placed in a virtual queue with estimated wait time |
| FR-5 | Add to Cart | User selects a sale item and adds it to their flash-sale cart (temporary hold) |
| FR-6 | Coupon Creation | Admin creates coupon campaigns with type (percent-off, fixed, BOGO, free shipping), pool size, and rules |
| FR-7 | Coupon Claiming | User claims a coupon from an available pool; claimed coupons are deducted atomically |
| FR-8 | One-Per-User Enforcement | Each user can claim at most one coupon per campaign, enforced across all devices and sessions |
| FR-9 | Coupon Application | User applies a claimed coupon at checkout; system validates eligibility and calculates discount |
| FR-10 | Coupon Stacking Rules | System enforces which coupon types can combine, priority order, and maximum discount caps |
| FR-11 | Checkout | User completes purchase: inventory reserved, coupon applied, payment processed |
| FR-12 | Payment Processing | Async payment via saga pattern; inventory and coupon released on failure |
| FR-13 | Order Tracking | User sees order status (pending, confirmed, shipped, delivered) |
| FR-14 | Sale Analytics | Real-time dashboard showing inventory levels, orders/sec, coupon redemption rates |
| FR-15 | Notification | User receives confirmation via email/push when order is confirmed or coupon is about to expire |
| FR-16 | Idempotent Checkout | Duplicate checkout requests (retries, double-clicks, network retries) must not create duplicate orders |
| FR-17 | Reservation Timeout | If payment is not completed within 10 minutes of inventory reservation, the reservation is released automatically |
Non-Functional Requirements
| Requirement | SLO / Target | Rationale |
|---|---|---|
| Throughput | 50,000 orders/min at peak | Derived from 10M users, ~2% conversion, concentrated in first 15 minutes |
| Checkout Latency (p99) | < 200ms | Users abandon after 3 seconds; checkout must be fast to prevent retries |
| Inventory Accuracy | Zero overselling | Legal and financial liability; refunds cost 5x the item margin |
| Coupon Uniqueness | Zero double-claims per user | One coupon per user per campaign, no exceptions |
| Availability | 99.95% during sale window | 2-hour sale; even 0.05% downtime = 3.6 seconds of lost orders |
| Queue Fairness | FIFO within 1-second cohorts | Users arriving at the same second get randomized within that cohort |
| Data Durability | Zero order loss | Every confirmed order must be durable within 100ms of confirmation |
| Coupon Rollback Latency | < 5 seconds | Failed payment must return coupon to pool quickly so others can claim |
| CDN Cache Hit Rate | > 99% for sale page | Origin must not serve the static sale page to 10M users |
| Horizontal Scalability | Linear up to 20M users | Architecture must scale by adding nodes, not by vertical scaling |
| Idempotency | All write operations idempotent under retries | Network failures and user retries are guaranteed during flash sale chaos |
| Graceful Degradation | System sheds load when downstream services degrade | Payment gateway slowdown must not cascade into inventory or coupon failures |
| Consistency Model | Strong consistency for inventory and coupons; eventual consistency for analytics | Overselling and double-claiming are correctness violations; reporting lag is acceptable |
Scale Estimation
These requirements define what the system must do. The scale numbers below reveal how hard each requirement becomes at 10M concurrent users.
Traffic Estimates
| Metric | Calculation | Result |
|---|---|---|
| Users at T-0 | Given | 10,000,000 |
| Page views in first minute | 10M users x 3 refreshes | 30,000,000 |
| CDN requests/sec (first minute) | 30M / 60 | 500,000 req/sec |
| Origin requests/sec (after queue) | 500 admitted/sec x ~10 requests each | 5,000 req/sec |
| Checkout requests/sec (peak) | 50K orders/min / 60 | ~833 req/sec |
| Valkey ops/sec (inventory) | 833 checkouts x 3 ops each | ~2,500 ops/sec |
| Valkey ops/sec (coupons) | 833 checkouts x 4 ops each | ~3,300 ops/sec |
| Valkey ops/sec (queue) | 5,000 admits + 10,000 status polls | ~15,000 ops/sec |
| Total Valkey ops/sec | Inventory + coupons + queue + cache | ~25,000 ops/sec |
Storage Estimates
| Data | Calculation | Size |
|---|---|---|
| Inventory records (Valkey) | 500 SKUs x 100 bytes | 50 KB |
| Coupon pool (Valkey list) | 500K coupons x 64 bytes | 32 MB |
| User claim records (Valkey set) | 500K claims x 80 bytes | 40 MB |
| Queue tokens (Valkey sorted set) | 10M tokens x 100 bytes | 1 GB |
| Orders (PostgreSQL) | 200K orders x 2 KB | 400 MB |
| Coupon claims (PostgreSQL) | 500K claims x 200 bytes | 100 MB |
| Kafka events | 200K orders x 5 events x 1 KB | 1 GB |
Valkey Cluster Sizing
| Requirement | Sizing |
|---|---|
| Total memory needed | ~1.5 GB (with overhead) |
| Ops/sec needed | ~25,000 |
| Single Valkey node capacity | ~200,000 ops/sec, 64 GB RAM |
| Minimum nodes for HA | 3 primaries + 3 replicas |
| Recommended | 3 primaries + 3 replicas (massive headroom) |
The bottleneck is not Valkey throughput or memory. It is hot-key contention on popular SKUs: a single SKU key is hit by every client and lives on a single shard. The inventory sharding strategy in the Atomic Inventory Control section addresses this.
PostgreSQL Sizing
| Metric | Value |
|---|---|
| Peak write rate | ~833 orders/sec + ~833 coupon claims/sec |
| Connection pool size | 200 connections (pgbouncer) |
| Write latency (p99) | < 10ms |
| Required IOPS | ~5,000 |
| Instance type | db.r6g.2xlarge (8 vCPU, 64 GB) |
Why Naive Approaches Fail
The scale estimates above expose why textbook solutions collapse under flash sale load. Several natural approaches fail under these constraints.
Database row locks for inventory. Even at ~833 checkouts/sec, funneling every purchase of a hot SKU through a PostgreSQL row lock causes lock contention, connection pool exhaustion, and cascading timeouts. A single row lock becomes a serialization bottleneck -- every checkout queues behind the previous one.
Application-level coupon uniqueness checks. In-memory sets without atomic operations will race. Two requests checking if user not in claimed_set simultaneously will both pass. By the time the second request writes to the set, the first has already claimed a code. The user ends up with two coupons.
Serving the sale page from origin. 10M users hitting Next.js SSR at T-0 will overwhelm compute. The sale landing page must be CDN-cached and static. Even the waiting room page must come from CDN -- if the queue page itself requires origin, the problem has already cascaded.
Eventual consistency for inventory counts. "Reconcile later" means overselling now. Inventory decrement must be atomic and strongly consistent at the moment of reservation, not eventually consistent after a replication lag.
Skipping the virtual queue. Letting all 10M users through to the checkout service simultaneously turns a controlled sale into a DDoS on internal infrastructure. Without admission control, the system fails under its own traffic.
Synchronous payment processing. Payment gateways have variable latency (200ms to 5s). Blocking on payment while holding inventory locks wastes capacity and creates cascading timeouts across the entire checkout path.
Ignoring coupon rollback. If payment fails after a coupon is claimed, that coupon must be returned to the pool. Otherwise coupons leak -- users see "sold out" for coupons that were never actually used.
Treating all SKUs equally. A doorbuster item with 50 units will have 100x the contention of a standard sale item with 5,000 units. Hot-key mitigation is essential for the most popular SKUs.
Allowing unconstrained coupon stacking. Without server-side stacking rules, users will combine a 50% off coupon with a BOGO deal and a free shipping code, buying a $200 item for $0. Stacking rules must be enforced atomically at checkout.
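A sketch of server-side stacking enforcement, assuming each coupon carries the type, discount value, max_discount_cap, and stacking_priority fields from the campaign schema. The rule engine here is illustrative, not the full implementation:

```python
def apply_coupons(subtotal: float, coupons: list) -> float:
    """Apply coupons in stacking-priority order; each respects its own cap,
    and the total discount can never exceed the subtotal."""
    discount = 0.0
    for c in sorted(coupons, key=lambda c: c["stacking_priority"]):
        if c["type"] == "percent_off":
            d = subtotal * c["value"] / 100
        elif c["type"] == "fixed_amount":
            d = c["value"]
        else:                      # bogo / free_shipping priced elsewhere
            d = 0.0
        cap = c.get("max_discount_cap")
        if cap is not None:
            d = min(d, cap)        # per-coupon maximum discount cap
        discount += d
    return round(min(discount, subtotal), 2)

# 15% off $649.99, capped at $100 -> $97.50 (matches the checkout example)
apply_coupons(649.99, [{"type": "percent_off", "value": 15,
                        "max_discount_cap": 100.0, "stacking_priority": 0}])
```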
Architecture Overview
These failure modes shape every design decision. The architecture must absorb 500K CDN requests/sec at T-0, funnel 10M users through the virtual queue, and sustain 833 checkout requests/sec while maintaining exactly-once inventory semantics.
The architecture separates the hot path (Valkey atomics during the sale) from the warm path (async persistence to PostgreSQL via Kafka) and the cold path (post-sale reconciliation).
Write Path (Checkout Flow)
The write path has two critical nodes. The Virtual Queue is the admission controller -- it prevents 10M users from hitting backend services simultaneously, releasing 1,000 users every 2 seconds. The Temporal Workflow is the saga orchestrator -- it executes reserve-inventory, apply-coupon, charge-payment, confirm-order in sequence, and runs compensating transactions in reverse on failure.
Checkout Saga Path
Each step has a compensating action. If payment fails at step 3, the coupon is returned to the pool and inventory is released. No resource is permanently consumed by a failed checkout.
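The compensation ordering can be sketched in a few lines of Python. This is a toy stand-in for the Temporal workflow, not its API: actions run forward, and on failure the compensations of completed steps run in reverse.

```python
def run_checkout_saga(steps):
    """Run steps forward; on failure, run compensations of completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            for _, comp in reversed(completed):
                comp()                 # undo in reverse order of completion
            return "cancelled"
        completed.append((name, compensate))
    return "confirmed"

def fail_payment():
    raise RuntimeError("payment declined")

log = []
steps = [
    ("reserve_inventory", lambda: log.append("reserved"),
     lambda: log.append("inventory_released")),
    ("apply_coupon", lambda: log.append("coupon_applied"),
     lambda: log.append("coupon_returned")),
    ("charge_payment", fail_payment, lambda: None),
    ("confirm_order", lambda: log.append("confirmed"), lambda: None),
]
result = run_checkout_saga(steps)
# result == "cancelled"; the coupon is returned before inventory is released
```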
Read Path (Status and Browsing)
The read path is entirely separate from the write path. Sale item browsing reads from Valkey cache. Order status reads from PostgreSQL. Queue position reads from the Valkey sorted set. No read operation touches the checkout saga or its state.
The system has three independent paths. The write path executes checkouts. The read path serves browsing and status queries. The control path (virtual queue) regulates how many users reach the write path at any given moment. These paths share storage but never share execution.
If only one thing stays in memory from this article: the queue controls how many users enter the system. Valkey handles all critical atomic operations. PostgreSQL is the durable system of record. Kafka decouples writes from persistence. Temporal ensures failures do not leak state. Everything else builds on these five guarantees.
API Design
With the high-level architecture established, the API contract defines every interaction between clients and the system.
Versioning Strategy
All endpoints use URL-path versioning under /api/v1/. This makes the version explicit in every request and simplifies routing at the API Gateway layer. The deprecation policy enforces a 6-month sunset window: when /api/v2/ of an endpoint ships, /api/v1/ continues to function for 6 months with a Sunset response header indicating the retirement date. For experimental endpoints (beta features like VIP queue tiers), the Accept-Version request header selects the experimental variant without polluting the URL namespace.
Sale Endpoints
List Active Sales
GET /api/v1/sales?status=active
Response 200:
{
"sales": [
{
"id": "sale_abc123",
"name": "Summer Flash Sale 2026",
"start_time": "2026-07-01T12:00:00Z",
"end_time": "2026-07-01T14:00:00Z",
"status": "active",
"items_count": 500,
"queue_enabled": true
}
]
}
Get Sale Items
GET /api/v1/sales/{sale_id}/items?cursor=eyJza3UiOiJTS1UtMDIwIn0&limit=20
Response 200:
{
"items": [
{
"sku_id": "SKU-LAPTOP-001",
"product_name": "UltraBook Pro 14",
"original_price": 1299.99,
"sale_price": 649.99,
"available": true,
"remaining_pct": "low"
}
],
"pagination": {
"cursor": "eyJza3UiOiJTS1UtMDQwIn0",
"limit": 20,
"has_more": true
}
}
The system returns remaining_pct as a bucket ("high", "medium", "low") rather than an exact count. Showing exact counts creates herding behavior where users rush the items with the lowest remaining inventory.
Pagination uses cursor-based traversal. Offset-based pagination breaks under concurrent inserts and deletions during a live sale -- items shift between pages. The cursor encodes the last seen sku_id, making each page request stable regardless of concurrent modifications.
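One way to build such a cursor, assuming it is simply the URL-safe base64 of a small JSON object holding the last sku (an assumption, but consistent with the example cursors above):

```python
import base64
import json

def encode_cursor(last_sku: str) -> str:
    """Encode the last-seen SKU as an opaque, URL-safe cursor."""
    raw = json.dumps({"sku": last_sku}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_cursor(cursor: str) -> str:
    """Recover the last-seen SKU; restore the stripped base64 padding first."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["sku"]

encode_cursor("SKU-020")   # -> "eyJza3UiOiJTS1UtMDIwIn0", the cursor seen above
```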
Cart Endpoints
Add to Cart
POST /api/v1/cart
Authorization: Bearer {token}
Idempotency-Key: {uuid}
{
"sale_id": "sale_abc123",
"sku_id": "SKU-LAPTOP-001",
"quantity": 1
}
Response 201:
{
"cart_id": "cart_xyz789",
"sku_id": "SKU-LAPTOP-001",
"quantity": 1,
"sale_price": 649.99,
"hold_expires_at": "2026-07-01T12:10:00Z",
"message": "Item held for 10 minutes"
}
Response 409:
{
"error": "SOLD_OUT",
"message": "This item is no longer available"
}
Coupon Endpoints
Claim Coupon
POST /api/v1/coupons/claim
Authorization: Bearer {token}
Idempotency-Key: {uuid}
{
"campaign_id": "camp_summer50"
}
Response 200:
{
"coupon_code": "SUMMER-A7K2M",
"campaign_id": "camp_summer50",
"type": "percent_off",
"discount_value": 15,
"max_discount_cap": 100.00,
"min_cart_value": 50.00,
"valid_until": "2026-07-01T14:00:00Z"
}
Response 409:
{
"error": "ALREADY_CLAIMED",
"message": "You have already claimed a coupon from this campaign"
}
Response 410:
{
"error": "POOL_EXHAUSTED",
"message": "All coupons from this campaign have been claimed"
}
Apply Coupon to Cart
POST /api/v1/cart/apply-coupon
Authorization: Bearer {token}
Idempotency-Key: {uuid}
{
"cart_id": "cart_xyz789",
"coupon_code": "SUMMER-A7K2M"
}
Response 200:
{
"cart_id": "cart_xyz789",
"subtotal": 649.99,
"coupon_code": "SUMMER-A7K2M",
"coupon_type": "percent_off",
"discount_amount": 97.50,
"shipping_cost": 0.00,
"total": 552.49,
"applied_coupons": [
{
"code": "SUMMER-A7K2M",
"type": "percent_off",
"discount": 97.50,
"description": "15% off (max $100)"
}
]
}
Response 400:
{
"error": "MIN_CART_NOT_MET",
"message": "Cart subtotal must be at least $50.00 to use this coupon"
}
Checkout Endpoints
Initiate Checkout
POST /api/v1/checkout
Authorization: Bearer {token}
Idempotency-Key: {uuid}
{
"cart_id": "cart_xyz789",
"payment_method": "credit_card",
"payment_token": "tok_visa_4242"
}
Response 202:
{
"order_id": "ord_def456",
"status": "pending",
"saga_workflow_id": "wf_checkout_abc",
"message": "Order is being processed"
}
Get Order Status
GET /api/v1/orders/{order_id}
Authorization: Bearer {token}
Response 200:
{
"order_id": "ord_def456",
"status": "confirmed",
"sku_id": "SKU-LAPTOP-001",
"quantity": 1,
"subtotal": 649.99,
"coupon_discount": 97.50,
"total": 552.49,
"payment_status": "captured",
"created_at": "2026-07-01T12:05:30Z",
"confirmed_at": "2026-07-01T12:05:32Z"
}
Queue Endpoints
Join Queue
POST /api/v1/queue/join
Authorization: Bearer {token}
Idempotency-Key: {uuid}
{
"sale_id": "sale_abc123"
}
Response 200:
{
"queue_token": "qt_abc123xyz",
"position": 45231,
"estimated_wait_seconds": 90,
"websocket_url": "wss://ws.example.com/queue/qt_abc123xyz"
}
Check Queue Status
GET /api/v1/queue/status?token=qt_abc123xyz
Response 200:
{
"queue_token": "qt_abc123xyz",
"status": "waiting",
"position": 12045,
"estimated_wait_seconds": 24,
"admitted_so_far": 33186
}
// When admitted:
{
"queue_token": "qt_abc123xyz",
"status": "admitted",
"access_token": "at_secure_xyz",
"expires_at": "2026-07-01T12:15:00Z",
"message": "You may now browse and purchase"
}
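The estimated wait falls out of the admission rate. With the default queue_batch_size of 1,000 every 2,000 ms (500 users/sec), the numbers in the responses above are reproducible:

```python
ADMISSION_RATE = 1000 / 2.0        # queue_batch_size / queue_interval -> 500 users/sec

def estimated_wait_seconds(position: int) -> int:
    """Seconds until admission at the configured batch rate."""
    return int(position / ADMISSION_RATE)

estimated_wait_seconds(45231)            # -> 90, as in the join response
estimated_wait_seconds(45231 - 33186)    # -> 24, once 33,186 users are admitted
```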
Error Contract
All error responses follow a structured JSON format with machine-readable codes. Client applications switch on error_code, not on message strings.
{
"error_code": "INVENTORY_EXHAUSTED",
"message": "SKU-LAPTOP-001 is sold out",
"details": {
"sku_id": "SKU-LAPTOP-001",
"sale_id": "sale_abc123"
},
"request_id": "req_abc123",
"timestamp": "2026-07-01T12:05:30.123Z"
}
| Error Code | HTTP Status | Trigger |
|---|---|---|
| INVENTORY_EXHAUSTED | 409 | Inventory decrement returns 0 available |
| COUPON_ALREADY_CLAIMED | 409 | SET NX returns nil (user already claimed) |
| SALE_NOT_ACTIVE | 403 | Sale status is not 'active' at request time |
| QUEUE_POSITION_EXPIRED | 410 | Admitted user's access token has expired |
| PAYMENT_DECLINED | 402 | Payment gateway returns a decline code |
Rate Limiting
Per-endpoint rate limits protect backend services from abuse and amplification attacks. Limits are enforced at the API Gateway and communicated via standard headers.
| Endpoint | Limit | Window |
|---|---|---|
| POST /queue/join | 3 requests | 60 seconds per user |
| GET /queue/status | 30 requests | 60 seconds per user |
| POST /coupons/claim | 5 requests | 60 seconds per user |
| POST /checkout | 5 requests | 60 seconds per user |
| GET /sales/{id}/items | 60 requests | 60 seconds per user |
Every response includes rate limit headers:
X-RateLimit-Limit: 5
X-RateLimit-Remaining: 3
X-RateLimit-Reset: 1719835200
When a limit is exceeded, the system returns HTTP 429 with a Retry-After header indicating the number of seconds until the limit resets.
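A reference model of the fixed-window counter the gateway enforces. In Valkey this is an INCR plus EXPIRE per user per endpoint; the class below simulates those semantics deterministically (the explicit now parameter replaces the clock so the behavior is testable):

```python
class FixedWindowLimiter:
    """Deterministic model of the per-user counter (INCR + EXPIRE in Valkey)."""

    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self.counts = {}               # key -> (window_start, count)

    def allow(self, key: str, now: float) -> bool:
        start, count = self.counts.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0      # window expired: counter resets
        if count >= self.limit:
            return False               # caller gets HTTP 429 + Retry-After
        self.counts[key] = (start, count + 1)
        return True
```

With limit=5 and window_s=60, a sixth checkout attempt within the window is denied, and attempts are allowed again once the window rolls over.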
Authorization
JWT tokens carry scoped permissions. Each endpoint requires a specific scope, enforced at the API Gateway before the request reaches the backend service.
| Scope | Grants Access To |
|---|---|
| sale:browse | List sales, get sale items, get queue status |
| cart:manage | Add to cart, remove from cart, view cart |
| coupon:claim | Claim coupon, apply coupon to cart |
| checkout:submit | Initiate checkout, view order status |
| admin:sale-manage | Create/update sales, manage coupon campaigns, view analytics |
Tokens issued at queue admission carry sale:browse + cart:manage + coupon:claim + checkout:submit. Admin tokens are issued via a separate OAuth flow with MFA.
Data Model
The API contract reveals what data the system must persist and query. The schema below supports every endpoint above.
PostgreSQL Schemas
Sales Table
CREATE TABLE flash_sales (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
start_time TIMESTAMPTZ NOT NULL,
end_time TIMESTAMPTZ NOT NULL,
status VARCHAR(20) DEFAULT 'scheduled'
CHECK (status IN ('scheduled', 'active', 'ended', 'cancelled')),
max_orders_per_user INT DEFAULT 1,
queue_enabled BOOLEAN DEFAULT true,
queue_batch_size INT DEFAULT 1000,
queue_interval_ms INT DEFAULT 2000,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Filter by status for active sale lookups (most common query)
CREATE INDEX idx_flash_sales_status ON flash_sales(status);
-- Range scan for upcoming sales sorted by start time
CREATE INDEX idx_flash_sales_start_time ON flash_sales(start_time);
Inventory Table
CREATE TABLE sale_inventory (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
sale_id UUID NOT NULL REFERENCES flash_sales(id),
sku_id VARCHAR(50) NOT NULL,
product_name VARCHAR(255) NOT NULL,
original_price DECIMAL(10,2) NOT NULL,
sale_price DECIMAL(10,2) NOT NULL,
total_quantity INT NOT NULL,
sold_quantity INT DEFAULT 0,
reserved_quantity INT DEFAULT 0,
status VARCHAR(20) DEFAULT 'available'
CHECK (status IN ('available', 'sold_out', 'disabled')),
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(sale_id, sku_id)
);
-- Partition pruning: all inventory queries filter by sale_id
CREATE INDEX idx_sale_inventory_sale_id ON sale_inventory(sale_id);
-- Lookup items by SKU and availability status for catalog browsing
CREATE INDEX idx_sale_inventory_sku_status ON sale_inventory(sku_id, status);
Coupon Campaigns Table
CREATE TABLE coupon_campaigns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
sale_id UUID REFERENCES flash_sales(id),
name VARCHAR(255) NOT NULL,
coupon_type VARCHAR(20) NOT NULL
CHECK (coupon_type IN ('percent_off', 'fixed_amount', 'bogo', 'free_shipping')),
discount_value DECIMAL(10,2),
max_discount_cap DECIMAL(10,2),
min_cart_value DECIMAL(10,2) DEFAULT 0,
pool_size INT NOT NULL,
claimed_count INT DEFAULT 0,
one_per_user BOOLEAN DEFAULT true,
stackable BOOLEAN DEFAULT false,
stacking_priority INT DEFAULT 0,
applies_to VARCHAR(20) DEFAULT 'all'
CHECK (applies_to IN ('all', 'specific_skus', 'category')),
applicable_skus TEXT[],
valid_from TIMESTAMPTZ NOT NULL,
valid_until TIMESTAMPTZ NOT NULL,
status VARCHAR(20) DEFAULT 'active'
CHECK (status IN ('active', 'exhausted', 'expired', 'disabled')),
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Campaign lookup by sale for sale-specific coupon listings
CREATE INDEX idx_coupon_campaigns_sale ON coupon_campaigns(sale_id);
-- Filter active/exhausted campaigns for claim eligibility checks
CREATE INDEX idx_coupon_campaigns_status ON coupon_campaigns(status);
-- Filter by type for stacking rule evaluation
CREATE INDEX idx_coupon_campaigns_type ON coupon_campaigns(coupon_type);
Coupon Codes Table
CREATE TABLE coupon_codes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
campaign_id UUID NOT NULL REFERENCES coupon_campaigns(id),
code VARCHAR(20) NOT NULL UNIQUE,
status VARCHAR(20) DEFAULT 'available'
CHECK (status IN ('available', 'claimed', 'redeemed', 'expired', 'returned')),
claimed_by UUID REFERENCES users(id),
claimed_at TIMESTAMPTZ,
redeemed_at TIMESTAMPTZ,
order_id UUID,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Retrieve available codes per campaign for pool management
CREATE INDEX idx_coupon_codes_campaign ON coupon_codes(campaign_id, status);
-- Fast lookup by code string during claim validation
CREATE INDEX idx_coupon_codes_code ON coupon_codes(code);
-- List all codes claimed by a user for account page
CREATE INDEX idx_coupon_codes_claimed_by ON coupon_codes(claimed_by);
Coupon Claims Table (Dedup Ledger)
CREATE TABLE coupon_claims (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id),
campaign_id UUID NOT NULL REFERENCES coupon_campaigns(id),
coupon_code_id UUID NOT NULL REFERENCES coupon_codes(id),
code VARCHAR(20) NOT NULL,
status VARCHAR(20) DEFAULT 'claimed'
CHECK (status IN ('claimed', 'applied', 'redeemed', 'rolled_back')),
claimed_at TIMESTAMPTZ DEFAULT now(),
applied_at TIMESTAMPTZ,
rolled_back_at TIMESTAMPTZ,
UNIQUE(user_id, campaign_id) -- THE critical constraint
);
-- Lookup all claims by a user for dedup and account page
CREATE INDEX idx_coupon_claims_user ON coupon_claims(user_id);
-- Aggregate claims per campaign for pool size monitoring
CREATE INDEX idx_coupon_claims_campaign ON coupon_claims(campaign_id);
-- Filter by status for rollback processing and reconciliation
CREATE INDEX idx_coupon_claims_status ON coupon_claims(status);
Orders Table
CREATE TABLE orders (
id UUID NOT NULL DEFAULT gen_random_uuid(),
sale_id UUID NOT NULL REFERENCES flash_sales(id),
user_id UUID NOT NULL REFERENCES users(id),
sku_id VARCHAR(50) NOT NULL,
quantity INT NOT NULL DEFAULT 1,
unit_price DECIMAL(10,2) NOT NULL,
subtotal DECIMAL(10,2) NOT NULL,
coupon_code VARCHAR(20),
coupon_discount DECIMAL(10,2) DEFAULT 0,
shipping_cost DECIMAL(10,2) DEFAULT 0,
total_amount DECIMAL(10,2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending'
CHECK (status IN (
'pending', 'inventory_reserved', 'coupon_applied',
'payment_processing', 'payment_failed',
'confirmed', 'shipped', 'delivered', 'cancelled', 'refunded'
)),
payment_id VARCHAR(100),
payment_method VARCHAR(50),
saga_workflow_id VARCHAR(255),
failure_reason TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
-- PostgreSQL requires the partition key in the primary key of a partitioned table
PRIMARY KEY (sale_id, id)
) PARTITION BY LIST (sale_id);
-- Partition pruning: all order queries filter by sale_id
CREATE INDEX idx_orders_sale ON orders(sale_id);
-- User order history lookup
CREATE INDEX idx_orders_user ON orders(user_id);
-- Filter by status for reservation timeout scanner and dashboards
CREATE INDEX idx_orders_status ON orders(status);
-- Time-range queries for analytics and reconciliation
CREATE INDEX idx_orders_created ON orders(created_at);
Valkey Key Patterns
| Key Pattern | Type | Purpose | TTL |
|---|---|---|---|
| inv:{sale_id}:{sku_id} | String (integer) | Available inventory count | Sale duration + 1 hour |
| inv:reserved:{sale_id}:{sku_id} | String (integer) | Reserved (pending payment) count | Sale duration + 1 hour |
| coupon:pool:{campaign_id} | List | Pre-generated coupon codes (LPOP to claim) | Campaign validity + 1 hour |
| coupon:claimed:{campaign_id} | Set | Set of user_ids who claimed from this campaign | Campaign validity + 1 day |
| coupon:user:{user_id}:{campaign_id} | String | SET NX guard; value = coupon_code | Campaign validity + 1 day |
| coupon:code:{code} | Hash | Coupon code details (campaign_id, type, value, status) | Campaign validity + 1 day |
| queue:{sale_id} | Sorted Set | Queue tokens scored by join timestamp | Sale duration + 1 hour |
| queue:admitted:{sale_id} | Set | Set of user_ids currently admitted | Sale duration + 1 hour |
| queue:position:{sale_id} | String (integer) | Current admission cursor position | Sale duration + 1 hour |
| cart:{user_id}:{sale_id} | Hash | Temporary cart: sku_id, quantity, hold_expiry | 10 minutes |
| sale:meta:{sale_id} | Hash | Cached sale metadata (name, start, end, status) | Sale duration + 1 hour |
| rate:{user_id}:checkout | String (counter) | Rate limit: max 5 checkout attempts per minute | 60 seconds |
Storage Tiering
Flash sale data has three distinct access patterns with different latency and durability requirements.
| Tier | Technology | Data | Access Pattern |
|---|---|---|---|
| Hot (during sale) | Valkey | inventory counts, coupon pool, queue positions, session state | Every request, sub-ms |
| Warm | PostgreSQL | orders, coupon claims, sale config, inventory ledger | Checkout writes, post-sale queries |
| Cold | S3 + Parquet | completed sale archives, audit logs, analytics exports | Post-sale analysis, compliance |
Hot tier data lives in Valkey for the sale duration plus a 1-hour buffer. After the reconciliation job confirms Valkey and PostgreSQL are consistent, hot tier keys are left to expire via TTL. Warm tier data remains in PostgreSQL for 2 years (financial compliance). Cold tier data is exported via a nightly batch job that converts completed sale partitions into Parquet files on S3, where Athena queries support post-hoc analysis at $5/TB scanned.
Schema Evolution
Flash sale tables must evolve without downtime. The migration strategy varies by table criticality.
General approach: Add nullable columns first, deploy code that writes to both old and new columns, backfill existing rows, then add NOT NULL constraints once all rows have values. This three-phase approach avoids ALTER TABLE ... ADD COLUMN NOT NULL which acquires an ACCESS EXCLUSIVE lock on the entire table.
coupon_claims table changes: This table receives the highest write contention during sales. Schema changes use a dual-write period: the old schema and new schema coexist, with the application writing to both. After one full sale cycle confirms compatibility, the old columns are dropped.
sale_inventory uses soft deletes: The status column supports 'disabled' as a value rather than using DELETE statements. This avoids ALTER TABLE or heavy DELETE operations on a hot table during an active sale. Disabled rows are cleaned up post-sale by the archival job.
Online DDL for large tables: pg_repack handles table restructuring without holding long locks. For coupon_codes (potentially millions of rows), pg_repack repacks the table and indexes online with minimal locking.
Partitioning Strategy
The data model defines what is stored. The partitioning strategy determines how data is distributed across nodes to prevent hot spots and enable horizontal scaling.
Valkey Inventory Sharding
Inventory keys use hash-slot sharding: hash(sku_id) mod 16 maps each SKU to one of 16 virtual slots. This isolates hot keys and spreads load across Valkey cluster shards. The Lua scripts for inventory decrement reference the shard keys directly (detailed in the Atomic Inventory Control deep dive).
Risk: A doorbuster SKU concentrates all of its traffic on a single key, and therefore a single slot. For these hot SKUs the count is fanned out across 16 sub-keys -- each shard holds a fraction of the units, and the atomic decrement script scans shards sequentially until one succeeds. For standard SKUs with low contention, a single key (no fan-out) is used, avoiding unnecessary multi-key coordination.
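A sketch of the doorbuster fan-out, with hypothetical sub-key names. In production each shard's check-and-decrement is a single Lua eval; here one Python decrement stands in for one atomic shard operation:

```python
def shard_keys(sale_id: str, sku_id: str, fanout: int = 16) -> list:
    """Hypothetical sub-key naming for a fanned-out hot SKU."""
    return [f"inv:{sale_id}:{sku_id}:{i}" for i in range(fanout)]

def sharded_decrement(store: dict, keys: list, qty: int = 1) -> bool:
    """Scan shards in order until one holds enough inventory; each
    per-shard check-and-decrement models one atomic Valkey operation."""
    for key in keys:
        if store.get(key, 0) >= qty:
            store[key] -= qty
            return True
    return False

# 50 doorbuster units spread across 16 shards: exactly 50 decrements succeed
keys = shard_keys("sale_abc123", "SKU-DOORBUSTER")
store = {k: 3 for k in keys}
store[keys[0]] = 5
```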
PostgreSQL Order Partitioning
The orders table is partitioned by sale_id using PARTITION BY LIST. Each flash sale is an independent event, and queries never cross sales. Partition pruning eliminates full-table scans -- a query for WHERE sale_id = 'sale_abc123' touches only the partition for that sale. After a sale concludes and reconciliation passes, the partition can be detached and archived to cold storage in a sub-second operation that holds an exclusive lock only briefly.
Kafka Partition Strategy
Order events are partitioned by order_id. All events for a single order (created, paid, confirmed) land on the same partition and are processed in order. Cross-order ordering is not required.
Partition count: 24. Math: peak throughput is 833 orders/sec x 3 events/order = ~2,500 events/sec. Designing for a 5x burst gives 12,500 msg/sec. At a comfortable consumer throughput of ~600 msg/sec per partition, 12,500 / 600 = 20.8 partitions, rounded up to 24 for headroom.
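The arithmetic, spelled out:

```python
import math

peak_events_per_sec = 2_500        # ~833 orders/sec x 3 events per order
burst = peak_events_per_sec * 5    # design for a 5x burst over sustained peak
per_partition = 600                # comfortable per-partition consumer throughput
needed = math.ceil(burst / per_partition)
# needed == 21; the deployment rounds up to 24 for extra headroom
```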
Queue Sorted Set Partitioning
The queue uses a single Valkey key per sale: queue:{sale_id}. At 10M members, ZRANGEBYSCORE is O(log N + M) where M is the batch size (1,000 per admission cycle) -- acceptable for batch admission, since the admission controller reads only 1,000 members per cycle, keeping M small relative to N.
If queue size exceeds 50M members (a 5x scale scenario), the queue is sharded by hash(user_id) mod 4 into four queue keys: queue:{sale_id}:0 through queue:{sale_id}:3. The admission controller round-robins across shards, admitting 250 users per shard per cycle to preserve the 1,000-per-cycle batch and FIFO fairness across shards.
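The round-robin admission across shards, sketched. Plain lists stand in for the four sorted sets (already ordered by join timestamp); 250 per shard preserves the sale's 1,000-per-cycle batch:

```python
def admit_cycle(shards: list, per_shard: int = 250) -> list:
    """One admission cycle: pop the oldest per_shard tokens from each shard."""
    admitted = []
    for shard in shards:
        admitted.extend(shard[:per_shard])   # oldest tokens first (FIFO per shard)
        del shard[:per_shard]
    return admitted

# Four shards of waiting tokens -> one cycle admits 1,000 users total
shards = [[f"u{s}_{i}" for i in range(300)] for s in range(4)]
batch = admit_cycle(shards)
```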
Rebalancing
Valkey cluster rebalancing is online -- slot migration moves data between nodes without downtime. PostgreSQL partition attach/detach operations hold a sub-second exclusive lock. Kafka partition reassignment uses cruise-control for automated, throttled rebalancing that avoids consumer disruption during an active sale.
Caching Strategy
The partitioning strategy distributes data. The caching strategy determines which data is served from which layer and at what freshness, keeping the hot path off the database and the origin servers.
Cache Topology
| Layer | Technology | Data | TTL | Invalidation |
|---|---|---|---|---|
| CDN | CloudFront | sale landing page, product images, JS/CSS | 60s | Invalidation API at T-0 for fresh sale page |
| L1 (in-process) | Node.js Map | sale config, category list | 30s | Process restart or TTL |
| L2 (distributed) | Valkey | inventory counts (approximate), queue positions, sale metadata | sale duration + 1h | Event-driven on inventory change |
| Source of truth | PostgreSQL | all durable state | permanent | N/A |
Invalidation Model
Inventory counts in Valkey ARE the source of truth during the sale, not a cache. The Lua decrement script mutates Valkey directly; PostgreSQL follows asynchronously via Kafka. Queue positions update every 5s via WebSocket push. Sale metadata is cached with TTL = sale duration + 1 hour; changes to sale status (e.g., early termination) trigger explicit key deletion.
Thundering Herd at T-0
10M users hit /api/v1/sales/{id} simultaneously at sale start. CDN absorbs 99% of reads -- the static sale page is pre-deployed with a 60s TTL. Dynamic inventory counts are served from Valkey (sub-ms reads). The queue join endpoint (POST /api/v1/queue/join) is not cacheable -- the Valkey sorted set handles the write load directly at 200K+ ops/sec, and the 10M joins arrive spread over tens of seconds of clock skew and client-side jitter rather than as a single instant.
Cache Warming (T-24h)
Pre-load sale metadata, product catalog, and coupon pools into Valkey 24 hours before the sale. Warm CDN by requesting all sale pages via synthetic traffic from multiple edge locations. Verify Valkey memory usage is below 70% threshold to leave headroom for queue tokens (1 GB at 10M users). If memory exceeds 70%, scale the Valkey cluster before the sale starts.
Eviction Policy
Valkey uses maxmemory-policy volatile-lru -- only keys with a TTL set are eligible for eviction. Inventory keys have no TTL during the sale because they are the source of truth, not a cache. Queue keys expire 1 hour after the sale ends. Session and rate-limit keys have short TTLs and are evicted first under memory pressure.
Cache-DB Consistency
During the sale, Valkey leads and PostgreSQL follows (async via Kafka). After the sale, the reconciliation job compares Valkey inventory counts with the PostgreSQL ledger. Any mismatch triggers an alert and manual review. This is the inverse of typical cache patterns -- Valkey is NOT a cache for inventory during the sale; it IS the hot-path store. PostgreSQL is the cold-path store that provides durability and post-sale querying.
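The post-sale reconciliation can be sketched as a pure comparison between the two stores. The key names and mismatch shape below are illustrative, not the production schema:

```typescript
// Compares Valkey's final remaining counts against remaining counts
// derived from the PostgreSQL order ledger. Any divergence is flagged
// for alerting and manual review.
interface Mismatch {
  skuId: string;
  valkeyRemaining: number;
  ledgerRemaining: number;
}

function reconcile(
  valkeyCounts: Map<string, number>,  // remaining units per SKU in Valkey
  ledgerCounts: Map<string, number>   // remaining units per SKU from PG
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const [skuId, valkeyRemaining] of valkeyCounts) {
    const ledgerRemaining = ledgerCounts.get(skuId) ?? 0;
    if (valkeyRemaining !== ledgerRemaining) {
      mismatches.push({ skuId, valkeyRemaining, ledgerRemaining });
    }
  }
  return mismatches;
}
```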
Consistency Model
The caching strategy inverts the typical cache-DB relationship for inventory. This section makes the consistency guarantees explicit for every operation in the system.
Per-Operation Guarantee
| Operation | Guarantee | Mechanism |
|---|---|---|
| Inventory decrement | Linearizable | Valkey Lua script (single-threaded, atomic) |
| Coupon claim | Linearizable | Valkey LPOP + SET NX (atomic per shard) |
| Queue join | Linearizable | Valkey ZADD (atomic sorted set insert) |
| Order creation | Serializable | PostgreSQL transaction with UNIQUE constraints |
| Payment processing | At-most-once | Idempotency key prevents double-charge |
| Inventory sync to PG | Eventually consistent | Kafka consumer, ~5s lag |
| Analytics events | At-least-once | Kafka acks=all, idempotent consumer |
| Queue position read | Eventually consistent | WebSocket push every 5s |
CAP/PACELC Analysis
The system is CP for inventory operations -- it sacrifices availability rather than risk overselling. If a Valkey shard becomes unreachable, the admission controller pauses queue advancement for affected SKUs rather than falling back to a less-consistent store. For read operations (queue position, sale catalog), the system is AP -- it serves stale data from CDN/cache rather than returning errors.
Under the PACELC "else" clause (when there is no partition): the system chooses consistency over latency for writes. The Lua script adds ~1ms overhead versus a raw SET, but that 1ms buys atomic check-and-decrement. For reads, the system chooses latency over consistency -- CDN serves 60s stale sale pages, and queue positions are pushed every 5s rather than fetched on every render.
Saga Consistency
The checkout saga provides eventual consistency across services. Each step is independently atomic: inventory decrement is linearizable in Valkey, payment capture is at-most-once via idempotency key, and order creation is serializable in PostgreSQL. Compensating transactions restore invariants on failure. Temporal durably records workflow history, so if a worker crashes mid-saga the workflow resumes from the last completed step; individual activities may execute at-least-once, which is why each step must be idempotent -- a guarantee the idempotency keys and atomic scripts above already provide.
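The compensation pattern can be illustrated with a plain TypeScript saga runner. This is a simplified stand-in for what Temporal provides -- Temporal adds durable history and resume-on-crash on top of the same shape:

```typescript
interface SagaStep<T> {
  name: string;
  execute: () => Promise<T>;
  compensate: (result: T) => Promise<void>;
}

// Runs steps in order. On failure, runs the compensations of all
// completed steps in reverse order, then rethrows the original error.
async function runSaga(steps: SagaStep<unknown>[]): Promise<void> {
  const completed: { step: SagaStep<unknown>; result: unknown }[] = [];
  for (const step of steps) {
    try {
      const result = await step.execute();
      completed.push({ step, result });
    } catch (err) {
      for (const { step: done, result } of completed.reverse()) {
        await done.compensate(result);
      }
      throw err;
    }
  }
}
```

For the checkout saga, the steps would be reserve-inventory, claim-coupon, charge-payment, confirm-order, each paired with its release/return/refund compensation.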
Technology Selection
The consistency and caching requirements above constrain technology choices. Each component must meet specific guarantees under extreme concurrency.
Inventory Management Approach Comparison
| Approach | How It Works | Throughput | Overselling Risk | Complexity |
|---|---|---|---|---|
| PostgreSQL SELECT FOR UPDATE | Row lock on inventory row, decrement, commit | ~2,000 ops/sec per row | Low (if done correctly) | Low |
| PostgreSQL Advisory Lock | Application-level lock per SKU, then update | ~5,000 ops/sec | Low | Medium |
| Valkey Atomic DECR | DECR sku:inventory:123, check if >= 0 | ~200,000 ops/sec | Zero (atomic) | Low |
| Valkey Lua Script | Atomic check-and-decrement in single eval | ~150,000 ops/sec | Zero (atomic + conditional) | Medium |
| Queue-Based (Kafka) | Serialize all decrement requests through a partition | ~50,000 ops/sec per partition | Zero (serialized) | High |
Decision: Valkey Lua Script. The Lua script approach gives atomic check-and-decrement with conditional logic (check quantity > 0 before decrementing) at 150K+ ops/sec. PostgreSQL serves as the durable ledger; Valkey is the fast path.
Technology Stack
| Component | Technology | Why |
|---|---|---|
| Inventory Cache & Atomics | Valkey 8.x (Redis-compatible) | Atomic Lua scripts, 200K+ ops/sec, cluster mode |
| Coupon Pool | Valkey List + Set | LPOP for atomic claim, SET NX for user dedup |
| Primary Database | PostgreSQL 16 | ACID transactions, UNIQUE constraints, battle-tested |
| Event Streaming | Kafka | Decouple order processing, exactly-once semantics |
| Saga Orchestration | Temporal | Durable workflow execution, automatic retry, compensation |
| CDN | CloudFront / Cloudflare | Static sale page, waiting room, DDoS protection |
| Real-Time Updates | WebSocket (via Socket.io) | Queue position updates, inventory status |
| API Gateway | Kong | Sub-ms plugin overhead vs 10-30ms Lambda authorizer cold starts on AWS API Gateway. Native JWT validation, sliding-window rate limiting, and request transformation run as in-process Lua plugins -- no external hop. Handles 100K+ req/sec per node on commodity hardware. Deploys on Kubernetes alongside application pods, avoiding vendor lock-in and enabling per-route traffic policies that ALB cannot express. |
| Container Orchestration | Kubernetes (EKS) | Auto-scaling pods based on queue depth |
| Monitoring | Prometheus + Grafana | Real-time dashboards for sale metrics |
| Object Storage | S3 | Product images, sale banners |
| Search | Elasticsearch | Product search during sale (if needed) |
Queue and Stream Capacity Planning
The technology stack identifies Kafka as the event backbone. The capacity plan below ensures the Kafka cluster handles burst throughput without becoming a bottleneck during peak sale minutes.
Kafka Cluster Sizing
Write throughput: 833 orders/sec x 3 events/order (created, paid, confirmed) = 2,500 events/sec. Average event size is 0.5KB, yielding 1.25MB/sec sustained write throughput.
Peak (first 5 minutes of sale): Traffic concentrates in the opening burst. At 5x sustained rate: 12,500 events/sec = 6.25MB/sec.
Partition count: 24 partitions. Math: peak 12,500 msg/sec / 600 msg/sec/partition (comfortable consumer throughput) = 20.8, rounded to 24 for burst headroom.
Replication factor: 3 across 3 brokers. In-sync replica minimum: 2. This ensures no data loss if one broker fails.
Retention: 7 days. Storage: 1.25MB/sec x 86,400 sec/day x 7 days x 3 replicas = 2.3TB. The 7-day retention window allows replay for reconciliation and audit. Completed events older than 7 days are already persisted in PostgreSQL and archived to S3.
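The sizing arithmetic above can be checked with a few lines. The 15% headroom factor is an assumption chosen to reproduce the document's choice of 24 partitions:

```typescript
// Reproduces the cluster-sizing math from the capacity plan.
function partitionCount(
  peakMsgPerSec: number,
  perPartitionMsgPerSec: number,
  headroom = 1.15 // assumed headroom factor
): number {
  return Math.ceil((peakMsgPerSec / perPartitionMsgPerSec) * headroom);
}

// Retained log size in MB: sustained rate x seconds x days x replicas.
function retentionMB(sustainedMBPerSec: number, days: number, replicas: number): number {
  return sustainedMBPerSec * 86_400 * days * replicas;
}

const partitions = partitionCount(12_500, 600); // 20.8 raw -> 24 with headroom
const storageTB = retentionMB(1.25, 7, 3) / 1_000_000; // ~2.27 TB, rounded to 2.3 in the text
```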
Consumer Groups
Three consumer groups process events independently:
- inventory-sync -- writes inventory changes to PostgreSQL. Highest priority.
- analytics -- feeds real-time dashboards and post-sale reports.
- notifications -- triggers email/push confirmations to buyers.
Consumer Lag Thresholds
| Lag Level | Threshold | Action |
|---|---|---|
| Normal | < 5K messages | No action |
| Warning | 5K-50K messages (~2-20s lag) | Alert on-call, monitor trend |
| Critical | > 50K messages (~20s lag) | Auto-scale consumers. If inventory-sync lag exceeds 10s, pause analytics and notification consumers to free broker resources |
Dead Letter Queue
The orders.dlq topic receives messages that fail 3 consecutive processing retries. Common failure causes: invalid order state transitions (e.g., confirming an already-cancelled order) and payment callback schema mismatches. The DLQ is reviewed within 1 hour during an active sale and daily otherwise. A Grafana alert fires when DLQ depth exceeds 10 messages.
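The retry-then-park decision can be sketched as a small routing function, assuming the consumer tracks attempts per message (the topic name comes from the text; the function shape is illustrative):

```typescript
const MAX_RETRIES = 3;

interface RoutingDecision {
  action: 'retry' | 'dead_letter';
  topic?: string; // set only when dead-lettering
}

// After 3 consecutive processing failures, the message is parked on
// orders.dlq instead of being retried again.
function routeFailure(attempts: number): RoutingDecision {
  if (attempts < MAX_RETRIES) {
    return { action: 'retry' };
  }
  return { action: 'dead_letter', topic: 'orders.dlq' };
}
```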
Ordering Guarantees
Events are keyed by order_id. All events for a single order land on the same partition and are processed in order. Cross-order ordering is not required and would reduce parallelism.
Backpressure
Kafka absorbs burst traffic by design -- producers write faster than consumers read, and the log buffers the difference. If consumer lag exceeds 50K messages, the admission controller reduces queue advancement rate by 50%, admitting fewer users per batch. Fewer new checkouts means fewer new events, creating end-to-end backpressure from Kafka consumer lag through to user-facing queue wait time. This feedback loop prevents unbounded lag growth.
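The lag-to-admission feedback can be expressed as a small policy function. The thresholds come from the table above; the exact shape of the reduction (a flat 50% cut) is stated in the text:

```typescript
// Maps Kafka consumer lag to an admission-rate multiplier:
// below 50K messages, admit at full rate; above, halve admission.
function admissionMultiplier(consumerLag: number): number {
  return consumerLag > 50_000 ? 0.5 : 1.0;
}

function nextBatchSize(baseBatch: number, consumerLag: number): number {
  return Math.floor(baseBatch * admissionMultiplier(consumerLag));
}
```

Fewer admissions mean fewer checkouts and fewer produced events, so lag drains and the multiplier returns to 1.0 on the next cycle.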
System Flows
The capacity plan ensures the event backbone handles peak load. The sequence diagrams below show how components interact end-to-end under normal operation and failure.
Queue Join and Admission
Checkout Happy Path
Checkout Failure with Compensation
Cache Hit/Miss Read Path
These flows show the system end-to-end under normal operation and failure. The deep dives that follow explain how each component achieves its guarantees.
Virtual Queue and Traffic Shaping
The schema and API define the system's surface area. Each deep dive below explains how one component enforces its invariants, starting with how 10M users enter the system without overwhelming it.
The first thing that happens at T-0 is 10M users arriving simultaneously. The queue is not about fairness. It is about protecting the backend. Without it, every other component fails under load -- inventory atomics, coupon claims, and payment processing all collapse under uncontrolled traffic.
Problem: 10M users hit the checkout endpoint simultaneously. Even with Valkey handling 200K ops/sec, the backend services (payment gateway, PostgreSQL writes, Kafka publishing) cannot absorb 10M concurrent requests.
Simple example: A nightclub with capacity for 500 people. 10,000 show up at opening. A bouncer at the door lets in groups of 50 every 2 minutes. People outside wait in a line behind an "estimated wait: 20 minutes" sign.
Mental model: The queue is a flow control valve between the internet and the backend. It converts a traffic spike into a smooth, sustained load that the system can handle.
Queue Architecture
Queue Join Flow
When a user arrives at the sale page:
// queue-service.ts
import { Redis } from 'ioredis';
import { v4 as uuidv4 } from 'uuid';
const valkey = new Redis({ host: 'valkey-cluster', port: 6379 });
interface QueueJoinResult {
queueToken: string;
position: number;
estimatedWaitSeconds: number;
websocketUrl: string;
}
async function joinQueue(saleId: string, userId: string): Promise<QueueJoinResult> {
const queueKey = `queue:${saleId}`;
const now = Date.now();
// Reuse the user's existing token on rejoin so duplicate tabs do not
// create duplicate queue entries (SET NX is atomic)
const tokenKey = `queue:token:${saleId}:${userId}`;
const freshToken = `qt_${uuidv4().replace(/-/g, '').substring(0, 16)}`;
await valkey.set(tokenKey, freshToken, 'EX', 3600, 'NX');
const token = (await valkey.get(tokenKey)) as string;
// Add to sorted set with timestamp as score
// ZADD NX preserves the original score (join time) if the member exists,
// so rejoining is idempotent -- a plain ZADD would reset the position
const member = `${userId}:${token}`;
await valkey.zadd(queueKey, 'NX', now, member);
// Get position (0-indexed rank)
const rank = await valkey.zrank(queueKey, member);
const position = (rank ?? 0) + 1;
// Get current admission cursor
const cursor = parseInt(await valkey.get(`queue:position:${saleId}`) || '0');
const usersAhead = Math.max(0, position - cursor);
// Estimate wait time based on admission rate
const batchSize = 1000;
const batchIntervalSec = 2;
const estimatedWaitSeconds = Math.ceil(usersAhead / batchSize) * batchIntervalSec;
return {
queueToken: token,
position,
estimatedWaitSeconds,
websocketUrl: `wss://ws.flashsale.example.com/queue/${token}`
};
}
Admission Controller
A background process runs every 2 seconds, admitting the next batch of users:
// admission-controller.ts
interface AdmissionConfig {
batchSize: number;
intervalMs: number;
maxConcurrentAdmitted: number;
}
async function runAdmissionLoop(saleId: string, config: AdmissionConfig): Promise<void> {
const queueKey = `queue:${saleId}`;
const admittedKey = `queue:admitted:${saleId}`;
const cursorKey = `queue:position:${saleId}`;
while (true) {
// Check if sale is still active
const saleStatus = await valkey.hget(`sale:meta:${saleId}`, 'status');
if (saleStatus !== 'active') break;
// Check current admitted count (users actively shopping)
const admittedCount = await valkey.scard(admittedKey);
// Only admit more if below the concurrent cap
if (admittedCount >= config.maxConcurrentAdmitted) {
await sleep(config.intervalMs);
continue;
}
const spotsAvailable = Math.min(
config.batchSize,
config.maxConcurrentAdmitted - admittedCount
);
// Get next batch from sorted set (by join time, FIFO)
const cursor = parseInt(await valkey.get(cursorKey) || '0');
const members = await valkey.zrange(queueKey, cursor, cursor + spotsAvailable - 1);
if (members.length === 0) {
await sleep(config.intervalMs);
continue;
}
// Admit users
const pipeline = valkey.pipeline();
for (const member of members) {
const userId = member.split(':')[0];
pipeline.sadd(admittedKey, userId);
}
pipeline.set(cursorKey, (cursor + members.length).toString());
await pipeline.exec();
// Notify admitted users via WebSocket
for (const member of members) {
const userId = member.split(':')[0];
const token = member.split(':')[1];
await notifyAdmission(userId, token, saleId);
}
console.log(`Admitted ${members.length} users. Total admitted: ${admittedCount + members.length}`);
await sleep(config.intervalMs);
}
}
function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
Queue Position Updates via WebSocket
Users need to see their position and estimated wait time:
// queue-websocket.ts
import { WebSocket, WebSocketServer } from 'ws';
// Note: ws matches the `path` option against the exact pathname, so
// `path: '/queue'` would reject /queue/{token} connections; accept all
// paths and parse the token from the URL instead
const wss = new WebSocketServer({ port: 8080 });
const userSockets = new Map<string, WebSocket>();
wss.on('connection', (ws, req) => {
const token = req.url?.split('/queue/')[1];
if (token) {
userSockets.set(token, ws);
}
ws.on('close', () => {
if (token) userSockets.delete(token);
});
});
interface QueueUpdate {
type: 'position_update' | 'admitted' | 'sale_ended';
position?: number;
estimatedWaitSeconds?: number;
accessToken?: string;
expiresAt?: string;
}
async function notifyAdmission(userId: string, token: string, saleId: string): Promise<void> {
const ws = userSockets.get(token);
if (!ws || ws.readyState !== WebSocket.OPEN) return;
// Generate a short-lived access token for the admitted user
const accessToken = generateAccessToken(userId, saleId);
const update: QueueUpdate = {
type: 'admitted',
accessToken,
expiresAt: new Date(Date.now() + 15 * 60 * 1000).toISOString()
};
ws.send(JSON.stringify(update));
}
// Periodic position broadcast (every 5 seconds)
async function broadcastPositions(saleId: string): Promise<void> {
const cursorKey = `queue:position:${saleId}`;
const cursor = parseInt(await valkey.get(cursorKey) || '0');
const batchSize = 1000;
const batchIntervalSec = 2;
for (const [token, ws] of userSockets) {
if (ws.readyState !== WebSocket.OPEN) continue;
const userId = await getUserIdFromToken(token);
const rank = await valkey.zrank(`queue:${saleId}`, `${userId}:${token}`);
if (rank === null) continue;
const position = rank + 1;
const usersAhead = Math.max(0, position - cursor);
const estimatedWait = Math.ceil(usersAhead / batchSize) * batchIntervalSec;
const update: QueueUpdate = {
type: 'position_update',
position,
estimatedWaitSeconds: estimatedWait
};
ws.send(JSON.stringify(update));
}
}
CDN-Based Waiting Room
The waiting room page itself must be served from CDN, not origin. If 10M users hit origin to load the queue page, the system has already failed.
Client --> CDN (cached waiting-room.html)
|
+--> Inline JS polls /api/v1/queue/status (origin)
| (rate limited to 1 request per 2 seconds per client)
|
+--> WebSocket to ws.flashsale.example.com
(for push-based updates)
Key design decision: The waiting room HTML/JS/CSS is deployed to CDN 24 hours before the sale. The countdown timer runs client-side. At T-0, the JS initiates the queue join request. This means 10M users load static assets from CDN, and only the queue join API call hits origin -- spread across a few seconds of clock skew.
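The spread the design relies on can be made deliberate with client-side jitter in the waiting-room JS, rather than depending on clock skew alone. The 5-second window is an assumption:

```typescript
// Client-side: delay the queue-join call by a random offset so 10M
// joins arrive spread over a window instead of at a single instant.
function joinDelayMs(windowMs = 5_000): number {
  return Math.floor(Math.random() * windowMs);
}

// Schedules the actual join call and returns the chosen delay.
function scheduleQueueJoin(join: () => void, windowMs = 5_000): number {
  const delay = joinDelayMs(windowMs);
  setTimeout(join, delay);
  return delay;
}
```

Because queue position is assigned by join timestamp, the jitter trades a few seconds of position randomness for an order-of-magnitude reduction in instantaneous write rate.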
Adaptive Admission Rate
The admission controller adjusts batch size based on backend health:
async function getAdaptiveBatchSize(saleId: string, baseBatchSize: number): Promise<number> {
// Check checkout service error rate
const errorRate = await getMetric('checkout_error_rate_5m');
// Check Valkey latency
const valkeyP99 = await getMetric('valkey_latency_p99_ms');
// Check PostgreSQL connection pool usage
const pgPoolUsage = await getMetric('pg_pool_usage_percent');
let multiplier = 1.0;
if (errorRate > 0.05) multiplier *= 0.5; // 5%+ errors: halve admission
if (valkeyP99 > 10) multiplier *= 0.7; // Valkey slow: reduce 30%
if (pgPoolUsage > 0.8) multiplier *= 0.6; // DB stressed: reduce 40%
const adaptiveBatch = Math.max(100, Math.floor(baseBatchSize * multiplier));
console.log(`Adaptive batch size: ${adaptiveBatch} (base: ${baseBatchSize}, multiplier: ${multiplier.toFixed(2)})`);
return adaptiveBatch;
}
Atomic Inventory Control
With the queue controlling admission, the next problem is ensuring that two admitted users never purchase the same last unit. This is the core correctness requirement.
Problem: Multiple concurrent requests try to decrement the same inventory counter. An operation is needed that atomically checks "is quantity > 0?" and decrements in a single step, with no window for a race condition.
Simple example: Two cashiers at a store, one item left. Customer A asks "is it available?" -- yes. Customer B asks "is it available?" -- yes. Both ring it up. Two sales, one item. The fix: a single cashier with a locked register.
Mental model: Valkey executes Lua scripts atomically. No other command can interleave during script execution. This gives a compare-and-set primitive without distributed locks -- the equivalent of a single-threaded cashier.
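The cashier analogy can be made concrete with an in-memory model of what the Lua script does: because the check and the decrement happen in one uninterruptible step, the last unit is sold exactly once. This is a simulation of the semantics, not the production path:

```typescript
// In-memory stand-in for the atomic Lua script: check-and-decrement
// in one synchronous step, so no other request can interleave between
// the availability check and the decrement.
const inventory = new Map<string, number>();

function atomicReserve(sku: string, qty: number): 'success' | 'sold_out' | 'not_found' {
  const available = inventory.get(sku);
  if (available === undefined) return 'not_found'; // key never loaded
  if (available < qty) return 'sold_out';          // insufficient stock
  inventory.set(sku, available - qty);             // same step as the check
  return 'success';
}
```

With one unit left, two requests for it resolve deterministically: the first succeeds, the second gets sold_out. There is no window in which both see quantity 1.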
Valkey Lua Atomic Decrement
-- inventory_decrement.lua
-- KEYS[1] = inv:{sale_id}:{sku_id}
-- KEYS[2] = inv:reserved:{sale_id}:{sku_id}
-- ARGV[1] = quantity requested
-- Returns: 1 if success, 0 if insufficient, -1 if key missing
local available = redis.call('GET', KEYS[1])
if available == false then
return -1 -- key does not exist, sale item not loaded
end
local avail = tonumber(available)
local requested = tonumber(ARGV[1])
if avail < requested then
return 0 -- insufficient inventory
end
-- Atomic decrement available, increment reserved
redis.call('DECRBY', KEYS[1], requested)
redis.call('INCRBY', KEYS[2], requested)
return 1 -- success
Calling the Lua Script from the Inventory Service
// inventory-service.ts
import { Redis } from 'ioredis';
const valkey = new Redis({ host: 'valkey-cluster', port: 6379 });
// Load script once, use SHA for subsequent calls
const DECREMENT_SCRIPT = `
local available = redis.call('GET', KEYS[1])
if available == false then return -1 end
local avail = tonumber(available)
local requested = tonumber(ARGV[1])
if avail < requested then return 0 end
redis.call('DECRBY', KEYS[1], requested)
redis.call('INCRBY', KEYS[2], requested)
return 1
`;
let scriptSha: string;
async function loadScript(): Promise<void> {
scriptSha = await valkey.script('LOAD', DECREMENT_SCRIPT) as string;
}
async function reserveInventory(
saleId: string,
skuId: string,
quantity: number
): Promise<'success' | 'sold_out' | 'not_found'> {
const result = await valkey.evalsha(
scriptSha,
2,
`inv:${saleId}:${skuId}`,
`inv:reserved:${saleId}:${skuId}`,
quantity.toString()
);
switch (result) {
case 1: return 'success';
case 0: return 'sold_out';
case -1: return 'not_found';
default: return 'not_found';
}
}
Inventory Release (Compensation)
When payment fails, reserved inventory must be returned:
-- inventory_release.lua
-- KEYS[1] = inv:{sale_id}:{sku_id}
-- KEYS[2] = inv:reserved:{sale_id}:{sku_id}
-- ARGV[1] = quantity to release
local reserved = tonumber(redis.call('GET', KEYS[2]) or '0')
local to_release = tonumber(ARGV[1])
if to_release > reserved then
to_release = reserved -- safety: never release more than reserved
end
redis.call('INCRBY', KEYS[1], to_release)
redis.call('DECRBY', KEYS[2], to_release)
return to_release
Approach Comparison
| Aspect | Pessimistic (SELECT FOR UPDATE) | Optimistic (Version Column) | Valkey Lua (chosen) |
|---|---|---|---|
| Lock type | Row-level exclusive lock | No lock; retry on version mismatch | No lock; atomic script |
| Contention behavior | Requests queue behind lock | Requests retry (exponential backoff) | Requests complete immediately |
| Throughput at 1K concurrent | ~500 ops/sec | ~2,000 ops/sec (with retries) | ~150,000 ops/sec |
| Deadlock risk | Yes (multi-row operations) | No | No |
| Starvation risk | Yes (long-held locks) | Yes (repeated retries) | No |
| Durability | Immediate (committed to DB) | Immediate (committed to DB) | Eventual (async sync to DB) |
| Complexity | Low | Medium (retry logic) | Medium (Lua scripting) |
Tradeoff: Valkey Lua gives 75x the throughput of PostgreSQL row locks, but inventory state is in memory, not durable. The design accepts this: Valkey is the source of truth during the sale, and Kafka carries inventory events to PostgreSQL asynchronously. A post-sale reconciliation job catches any discrepancies.
Hot-Key Mitigation for Popular SKUs
A doorbuster item (e.g., $99 laptop) will have all traffic targeting a single Valkey key. Even though Valkey is single-threaded and fast, a single key on a single shard becomes a bottleneck in cluster mode.
Strategy: Inventory Sharding. Split a single SKU's inventory across multiple virtual slots:
-- inventory_decrement_sharded.lua
-- KEYS[1..N] = inv:{sale_id}:{sku_id}:shard:{0..N-1}
-- ARGV[1] = quantity requested
local requested = tonumber(ARGV[1])
local total_available = 0
-- First pass: sum available across shards
for i = 1, #KEYS do
local avail = tonumber(redis.call('GET', KEYS[i]) or '0')
total_available = total_available + avail
end
if total_available < requested then
return 0
end
-- Second pass: decrement from first shard with availability
local remaining = requested
for i = 1, #KEYS do
if remaining <= 0 then break end
local avail = tonumber(redis.call('GET', KEYS[i]) or '0')
if avail > 0 then
local take = math.min(avail, remaining)
redis.call('DECRBY', KEYS[i], take)
remaining = remaining - take
end
end
return 1
The client hashes to a random shard on each attempt, distributing load. One caveat: a multi-key Lua script requires every key it touches to hash to the same cluster slot, so the shard keys must either share a {hash-tag} (which co-locates the hot SKU on one node but still isolates its contention to dedicated keys) or the client falls back to single-shard decrements, retrying on a different shard when one comes back empty:
function getInventoryShardKeys(saleId: string, skuId: string, shardCount: number): string[] {
return Array.from({ length: shardCount }, (_, i) =>
`inv:${saleId}:${skuId}:shard:${i}`
);
}
// For doorbuster items: 16 shards
// For standard items: 1 shard (no sharding needed)
const shardCount = isDoorBuster(skuId) ? 16 : 1;
const keys = getInventoryShardKeys(saleId, skuId, shardCount);
Reservation Timeout (FR-17)
The saga compensates on payment failure, but what if the user closes their browser after inventory is reserved? The saga never completes, and inventory stays locked. A background job runs every 60 seconds, scanning for reservations older than 10 minutes:
async function reclaimExpiredReservations(saleId: string): Promise<number> {
const expiredOrders = await db.query(
`SELECT id, sku_id, quantity FROM orders
WHERE sale_id = $1 AND status = 'inventory_reserved'
AND created_at < now() - interval '10 minutes'`,
[saleId]
);
let reclaimed = 0;
for (const order of expiredOrders.rows) {
await inventory.releaseInventory(saleId, order.sku_id, order.quantity);
await db.query(
`UPDATE orders SET status = 'expired', failure_reason = 'reservation_timeout'
WHERE id = $1`,
[order.id]
);
reclaimed += order.quantity;
}
return reclaimed;
}
Without this, inventory leaks on every sale. At 200K orders with a 5% abandonment rate, 10K units would be permanently locked.
Async Sync to PostgreSQL
Valkey is the source of truth during the sale for inventory counts. PostgreSQL provides durability and post-sale reconciliation.
Coupon System: Pool Management and One-Per-User Enforcement
With inventory atomics solved, the coupon system introduces a different constraint: not just atomic decrement, but also per-user uniqueness. A user must never claim more than one coupon per campaign, even across multiple tabs, devices, and sessions.
Problem: 500,000 unique coupon codes must be distributed to users, one per user, at high speed. Two sub-problems: (1) no two users should receive the same code, and (2) no single user should receive two codes.
Simple example: A stack of 500,000 gift cards on a table. A single clerk hands them out. Each person must show their ID. The clerk checks a list before giving a card -- if the name is already on the list, the person is turned away.
Mental model: The coupon pool is a Valkey list (LPOP for atomic claim). The user dedup is a Valkey SET NX (atomic set-if-not-exists). These two atomic operations run in a single Lua script, making the entire claim operation atomic.
Pre-Loading the Coupon Pool
Before the sale starts, coupon codes are generated and loaded into a Valkey list:
// coupon-loader.ts
import { v4 as uuidv4 } from 'uuid';
import { Redis } from 'ioredis';
const valkey = new Redis({ host: 'valkey-cluster', port: 6379 });
interface CouponCampaign {
id: string;
poolSize: number;
prefix: string;
validUntil: Date;
}
async function loadCouponPool(campaign: CouponCampaign): Promise<void> {
const codes: string[] = [];
// Generate unique codes
for (let i = 0; i < campaign.poolSize; i++) {
const suffix = uuidv4().replace(/-/g, '').substring(0, 8).toUpperCase();
const code = `${campaign.prefix}-${suffix}`;
codes.push(code);
}
// Load into Valkey list in batches
const BATCH_SIZE = 10000;
const pipeline = valkey.pipeline();
for (let i = 0; i < codes.length; i += BATCH_SIZE) {
const batch = codes.slice(i, i + BATCH_SIZE);
pipeline.rpush(`coupon:pool:${campaign.id}`, ...batch);
}
// Set TTL
const ttlSeconds = Math.ceil((campaign.validUntil.getTime() - Date.now()) / 1000) + 86400;
pipeline.expire(`coupon:pool:${campaign.id}`, ttlSeconds);
await pipeline.exec();
// Also insert into PostgreSQL for durability
// ... batch INSERT into coupon_codes table
console.log(`Loaded ${codes.length} codes for campaign ${campaign.id}`);
}
Atomic Coupon Claim with LPOP + SET NX
The Valkey LPOP command is atomic. Only one client can pop a given element. Combined with SET NX for user dedup, the entire claim is a single atomic Lua script:
-- coupon_claim.lua
-- KEYS[1] = coupon:pool:{campaign_id} (list of available codes)
-- KEYS[2] = coupon:claimed:{campaign_id} (set of user_ids who claimed)
-- KEYS[3] = coupon:user:{user_id}:{campaign_id} (SET NX guard)
-- ARGV[1] = user_id
-- ARGV[2] = TTL in seconds
-- Returns: coupon_code string, or "ALREADY_CLAIMED", or "POOL_EXHAUSTED"
-- Step 1: Check if user already claimed (fast path)
local existing = redis.call('GET', KEYS[3])
if existing ~= false then
return 'ALREADY_CLAIMED'
end
-- Step 2: Check if user is in claimed set (belt and suspenders)
local isMember = redis.call('SISMEMBER', KEYS[2], ARGV[1])
if isMember == 1 then
return 'ALREADY_CLAIMED'
end
-- Step 3: Pop a code from the pool
local code = redis.call('LPOP', KEYS[1])
if code == false then
return 'POOL_EXHAUSTED'
end
-- Step 4: Mark user as claimed
redis.call('SET', KEYS[3], code, 'EX', tonumber(ARGV[2]))
redis.call('SADD', KEYS[2], ARGV[1])
return code
Coupon Claim Flow
Coupon Service with Pool Exhaustion Handling
// coupon-service.ts
async function claimCoupon(userId: string, campaignId: string): Promise<ClaimResult> {
const result = await valkey.evalsha(
couponClaimSha,
3,
`coupon:pool:${campaignId}`,
`coupon:claimed:${campaignId}`,
`coupon:user:${userId}:${campaignId}`,
userId,
'86400' // 24-hour TTL
);
if (result === 'ALREADY_CLAIMED') {
// Retrieve their existing code
const existingCode = await valkey.get(`coupon:user:${userId}:${campaignId}`);
return {
status: 'already_claimed',
code: existingCode || undefined,
message: 'You have already claimed a coupon from this campaign'
};
}
if (result === 'POOL_EXHAUSTED') {
// Update campaign status in DB
await db.query(
`UPDATE coupon_campaigns SET status = 'exhausted' WHERE id = $1 AND status = 'active'`,
[campaignId]
);
// Publish event for real-time UI update
await kafka.send({
topic: 'coupon-events',
messages: [{ key: campaignId, value: JSON.stringify({ event: 'pool_exhausted', campaignId }) }]
});
return { status: 'pool_exhausted', message: 'All coupons have been claimed' };
}
// Success: persist to PostgreSQL
const code = result as string;
try {
await db.query(
`INSERT INTO coupon_claims (user_id, campaign_id, coupon_code_id, code, status)
VALUES ($1, $2, (SELECT id FROM coupon_codes WHERE code = $3), $3, 'claimed')`,
[userId, campaignId, code]
);
} catch (err: any) {
if (err.code === '23505') {
// UNIQUE violation: user already claimed at DB level
// Return the coupon to the pool
await valkey.rpush(`coupon:pool:${campaignId}`, code);
await valkey.del(`coupon:user:${userId}:${campaignId}`);
return { status: 'already_claimed', message: 'Duplicate claim detected' };
}
throw err;
}
return {
status: 'success',
code,
campaignId,
message: 'Coupon claimed successfully'
};
}
Coupon Return (On Payment Failure or Expiry)
-- coupon_return.lua
-- KEYS[1] = coupon:pool:{campaign_id}
-- KEYS[2] = coupon:claimed:{campaign_id}
-- KEYS[3] = coupon:user:{user_id}:{campaign_id}
-- ARGV[1] = user_id
-- ARGV[2] = coupon_code
-- Remove user from claimed set
redis.call('SREM', KEYS[2], ARGV[1])
-- Delete user's claim key
redis.call('DEL', KEYS[3])
-- Return code to pool (push to right end)
redis.call('RPUSH', KEYS[1], ARGV[2])
return 1
Two-Layer Defense: Valkey + PostgreSQL
This is the most critical correctness requirement. A single user must never claim more than one coupon per campaign, even across multiple browser tabs, bots, or devices.
Layer 1: Valkey SET NX (Fast Path). The SET key value NX command sets the key only if it does not already exist. This is atomic. The Lua script checks coupon:user:{user_id}:{campaign_id} before doing anything else.
Layer 2: PostgreSQL UNIQUE Constraint (Durable Backstop). The coupon_claims table has UNIQUE(user_id, campaign_id). Even if Valkey fails, restarts, or loses data, the database rejects a duplicate insert with error code 23505.
Race Condition Analysis
What If Valkey Restarts Between Claim and DB Write?
This is the dangerous window. If Valkey restarts after the Lua script succeeds but before the PostgreSQL insert completes:
- Valkey loses the coupon:user:U1:C1 key
- A retry sends another claim request
- Valkey SET NX succeeds (key was lost)
- A second code is popped from the pool
- The INSERT into coupon_claims is attempted
- PostgreSQL UNIQUE constraint rejects the insert with error 23505
- The second coupon code is returned to the pool
This is why the two-layer defense is essential. Valkey handles the hot path; PostgreSQL is the safety net.
// Race condition recovery handler
async function handleDuplicateClaimAtDB(
userId: string,
campaignId: string,
codeToReturn: string
): Promise<void> {
// Return the code to the pool since this user already has one
await valkey.rpush(`coupon:pool:${campaignId}`, codeToReturn);
// Re-set the Valkey guard key (it was lost during restart)
const existingClaim = await db.query(
`SELECT code FROM coupon_claims WHERE user_id = $1 AND campaign_id = $2`,
[userId, campaignId]
);
if (existingClaim.rows.length > 0) {
await valkey.set(
`coupon:user:${userId}:${campaignId}`,
existingClaim.rows[0].code,
'EX',
86400
);
}
}
Distributed Uniqueness Across Multiple Services
In a microservices architecture, the coupon claim might be called from the Coupon Service directly (user clicks "Claim"), the Checkout Service (auto-claim during checkout), or a batch job (promotional distribution). All paths must go through the same Valkey + PostgreSQL enforcement:
Rule: No service bypasses the Coupon Service. Even internal batch jobs call the same claimCoupon() function. This single-writer pattern prevents enforcement gaps.
Additional Safeguards
| Safeguard | Purpose |
|---|---|
| User ID from JWT (server-side) | Prevent client-side user ID spoofing |
| Rate limiting (5 claims/min per user) | Slow down automated attempts |
| Device fingerprint logging | Detect multi-account abuse post-hoc |
| IP-based throttling | 10 claims/min per IP address |
| Claim audit log (Kafka) | Full audit trail for fraud investigation |
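The per-user rate limit in the table above is a fixed-window counter. In production it would be a Valkey INCR followed by EXPIRE on first increment; the sketch below substitutes an in-memory map for Valkey so the logic is self-contained, and the key name is illustrative:

```typescript
// Fixed-window rate limiter: 5 claims per user per minute.
// An in-memory map stands in for Valkey INCR + EXPIRE.
type Window = { count: number; resetAt: number };
const windows = new Map<string, Window>();

function allowClaim(
  userId: string,
  limit = 5,
  windowMs = 60_000,
  now = Date.now()
): boolean {
  const key = `ratelimit:claim:${userId}`;
  const w = windows.get(key);
  if (!w || now >= w.resetAt) {
    // New window: equivalent to INCR returning 1, then EXPIRE
    windows.set(key, { count: 1, resetAt: now + windowMs });
    return true;
  }
  w.count += 1;
  return w.count <= limit;
}
```

A fixed window admits up to 2x the limit across a window boundary; a sliding-window or token-bucket variant closes that gap at the cost of more state per user.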
Coupon Enforcement Approach Comparison
| Approach | One-Per-User Guarantee | Throughput | Failure Mode |
|---|---|---|---|
| PostgreSQL UNIQUE constraint only | Strong (after commit) | ~3,000 ops/sec | Slow under contention |
| Application-level HashMap | None (race conditions) | High | Double-claims |
| Valkey SET NX (coupon:user:{user}:{campaign}) | Strong (atomic) | ~200,000 ops/sec | Lost on Valkey restart |
| Valkey SET NX + PostgreSQL UNIQUE | Strong (two layers) | ~100,000 ops/sec | Valkey loss caught by DB backstop |
| Bloom Filter pre-check + DB | Probabilistic pre-check | Very high | False positives (safe direction) |
Coupon Types and Stacking Rules
With claiming solved, the next challenge is applying coupons correctly at checkout. A real coupon system supports multiple discount types, each with its own calculation logic and validation rules.
Coupon Type Definitions
| Type | Code | How It Works | Example |
|---|---|---|---|
| Percent Off | percent_off | Deducts a percentage of the subtotal, up to a max cap | 15% off, max $100 discount |
| Fixed Amount | fixed_amount | Deducts a fixed dollar amount from the subtotal | $25 off |
| Buy One Get One | bogo | Adds a free item (cheapest item free, or specific SKU) | Buy 1 get 1 free |
| Free Shipping | free_shipping | Waives the shipping fee | Free standard shipping |
Discount Calculation Logic
// coupon-calculator.ts
interface CartItem {
skuId: string;
name: string;
price: number;
quantity: number;
category: string;
}
interface AppliedCoupon {
code: string;
type: 'percent_off' | 'fixed_amount' | 'bogo' | 'free_shipping';
discountValue: number;
maxDiscountCap: number | null;
minCartValue: number;
applicableSkus: string[] | null;
stackable: boolean; // whether this coupon may combine with others
stackingPriority: number; // lower = applied first (used by validateStacking)
}
interface DiscountResult {
couponCode: string;
type: string;
discountAmount: number;
description: string;
}
function calculateDiscount(
items: CartItem[],
coupon: AppliedCoupon,
shippingCost: number
): DiscountResult {
const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
// Validate minimum cart value
if (subtotal < coupon.minCartValue) {
return {
couponCode: coupon.code,
type: coupon.type,
discountAmount: 0,
description: `Cart minimum $${coupon.minCartValue} not met`
};
}
// Filter applicable items
const applicableItems = coupon.applicableSkus
? items.filter(item => coupon.applicableSkus!.includes(item.skuId))
: items;
const applicableSubtotal = applicableItems.reduce(
(sum, item) => sum + item.price * item.quantity, 0
);
switch (coupon.type) {
case 'percent_off': {
let discount = applicableSubtotal * (coupon.discountValue / 100);
if (coupon.maxDiscountCap !== null) {
discount = Math.min(discount, coupon.maxDiscountCap);
}
return {
couponCode: coupon.code,
type: 'percent_off',
discountAmount: Math.round(discount * 100) / 100,
description: `${coupon.discountValue}% off${coupon.maxDiscountCap ? ` (max $${coupon.maxDiscountCap})` : ''}`
};
}
case 'fixed_amount': {
const discount = Math.min(coupon.discountValue, applicableSubtotal);
return {
couponCode: coupon.code,
type: 'fixed_amount',
discountAmount: Math.round(discount * 100) / 100,
description: `$${coupon.discountValue} off`
};
}
case 'bogo': {
// Cheapest item free
if (applicableItems.length < 2) {
return {
couponCode: coupon.code,
type: 'bogo',
discountAmount: 0,
description: 'Add at least 2 eligible items for BOGO'
};
}
const cheapest = [...applicableItems].sort((a, b) => a.price - b.price)[0];
return {
couponCode: coupon.code,
type: 'bogo',
discountAmount: cheapest.price,
description: `Free: ${cheapest.name} ($${cheapest.price})`
};
}
case 'free_shipping': {
return {
couponCode: coupon.code,
type: 'free_shipping',
discountAmount: shippingCost,
description: 'Free shipping'
};
}
}
}
Stacking Rules
Most flash sales allow limited stacking. The rules engine determines which combinations are valid.
| Rule | Description |
|---|---|
| Max stackable coupons | 2 (one discount coupon + one free shipping) |
| Percent + Fixed | NOT stackable (only one discount type) |
| Percent + Free Shipping | Stackable |
| Fixed + Free Shipping | Stackable |
| BOGO + anything | NOT stackable |
| Stacking priority | Free shipping applied last (after discount coupons) |
| Maximum total discount | Cannot exceed 60% of subtotal |
| Minimum final price | Order total must be >= $1.00 after all discounts |
Stacking Validation Engine
// stacking-rules.ts
interface StackingValidation {
valid: boolean;
reason?: string;
appliedCoupons: AppliedCoupon[];
totalDiscount: number;
finalTotal: number;
}
const DISCOUNT_TYPES = new Set(['percent_off', 'fixed_amount', 'bogo']);
const MAX_DISCOUNT_PERCENT = 0.60; // 60% max
const MIN_FINAL_PRICE = 1.00;
const MAX_STACKED_COUPONS = 2;
function validateStacking(
coupons: AppliedCoupon[],
subtotal: number,
shippingCost: number
): StackingValidation {
// Rule 1: Max stackable count
if (coupons.length > MAX_STACKED_COUPONS) {
return {
valid: false,
reason: `Maximum ${MAX_STACKED_COUPONS} coupons can be combined`,
appliedCoupons: [],
totalDiscount: 0,
finalTotal: subtotal + shippingCost
};
}
// Rule 2: Only one discount-type coupon allowed
const discountCoupons = coupons.filter(c => DISCOUNT_TYPES.has(c.type));
if (discountCoupons.length > 1) {
return {
valid: false,
reason: 'Only one discount coupon can be applied per order',
appliedCoupons: [],
totalDiscount: 0,
finalTotal: subtotal + shippingCost
};
}
// Rule 3: BOGO cannot stack with anything
const hasBogo = coupons.some(c => c.type === 'bogo');
if (hasBogo && coupons.length > 1) {
return {
valid: false,
reason: 'Buy One Get One coupons cannot be combined with other offers',
appliedCoupons: [],
totalDiscount: 0,
finalTotal: subtotal + shippingCost
};
}
// Rule 4: Check each coupon is individually stackable (except if only one)
if (coupons.length > 1) {
for (const coupon of coupons) {
if (!coupon.stackable) {
return {
valid: false,
reason: `Coupon ${coupon.code} cannot be combined with other offers`,
appliedCoupons: [],
totalDiscount: 0,
finalTotal: subtotal + shippingCost
};
}
}
}
// Sort by stacking priority (lower number = applied first)
const sorted = [...coupons].sort((a, b) => a.stackingPriority - b.stackingPriority);
// Calculate total discount, applying coupons in priority order
let totalDiscount = 0;
let remainingSubtotal = subtotal;
let remainingShipping = shippingCost;
for (const coupon of sorted) {
if (coupon.type === 'free_shipping') {
totalDiscount += remainingShipping;
remainingShipping = 0;
} else {
// calculateSingleDiscount (defined alongside calculateDiscount) applies
// one coupon's discount logic to the remaining subtotal, returning { amount }
const result = calculateSingleDiscount(coupon, remainingSubtotal);
totalDiscount += result.amount;
remainingSubtotal -= result.amount;
}
}
// Rule 5: Max discount percentage
const discountPercent = totalDiscount / (subtotal + shippingCost);
if (discountPercent > MAX_DISCOUNT_PERCENT) {
const cappedDiscount = Math.floor((subtotal + shippingCost) * MAX_DISCOUNT_PERCENT * 100) / 100;
return {
valid: true,
reason: `Discount capped at ${MAX_DISCOUNT_PERCENT * 100}% of order total`,
appliedCoupons: sorted,
totalDiscount: cappedDiscount,
finalTotal: (subtotal + shippingCost) - cappedDiscount
};
}
// Rule 6: Minimum final price
const finalTotal = (subtotal + shippingCost) - totalDiscount;
if (finalTotal < MIN_FINAL_PRICE) {
const adjustedDiscount = (subtotal + shippingCost) - MIN_FINAL_PRICE;
return {
valid: true,
reason: `Minimum order total is $${MIN_FINAL_PRICE}`,
appliedCoupons: sorted,
totalDiscount: adjustedDiscount,
finalTotal: MIN_FINAL_PRICE
};
}
return {
valid: true,
appliedCoupons: sorted,
totalDiscount,
finalTotal
};
}
Calculation Examples
Example 1: 15% off coupon on a $649.99 laptop (max cap $100)
| Step | Value |
|---|---|
| Subtotal | $649.99 |
| Discount (15% of $649.99) | $97.50 |
| Cap check ($97.50 < $100) | Under cap |
| Shipping | $9.99 |
| Total | $649.99 - $97.50 + $9.99 = $562.48 |
Example 2: 15% off coupon + Free shipping stacked
| Step | Value |
|---|---|
| Subtotal | $649.99 |
| Percent discount (applied first) | $97.50 |
| Free shipping discount (applied second) | $9.99 |
| Total discount | $107.49 |
| Discount % of order (107.49/659.98) | 16.3% (under 60% cap) |
| Final total | $649.99 - $97.50 + $0.00 = $552.49 |
Example 3: $25 fixed coupon on $30 subtotal
| Step | Value |
|---|---|
| Subtotal | $30.00 |
| Fixed discount | $25.00 |
| Shipping | $9.99 |
| Final total | $30.00 - $25.00 + $9.99 = $14.99 |
| Min price check ($14.99 >= $1.00) | Pass |
| Final total | $14.99 |
Example 4: Attempted BOGO + percent off (rejected)
| Step | Result |
|---|---|
| Coupons | BOGO + 15% off |
| Stacking check | BOGO present + count > 1 |
| Result | Rejected: BOGO cannot stack |
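The arithmetic in Examples 1 and 3 can be checked with a trimmed re-statement of the percent_off and fixed_amount branches shown earlier; these helpers are standalone for illustration, not part of the service code:

```typescript
// Trimmed versions of the percent_off and fixed_amount branches,
// just enough to reproduce the worked examples.
function percentOff(subtotal: number, pct: number, cap: number | null): number {
  let d = subtotal * (pct / 100);
  if (cap !== null) d = Math.min(d, cap);
  return Math.round(d * 100) / 100; // round to cents
}

function fixedOff(subtotal: number, amount: number): number {
  return Math.round(Math.min(amount, subtotal) * 100) / 100;
}

// Example 1: 15% off a $649.99 laptop with a $100 cap -> $97.50 discount
const d1 = percentOff(649.99, 15, 100);
const total1 = Math.round((649.99 - d1 + 9.99) * 100) / 100; // rounds to 562.48

// Example 3: $25 fixed coupon on a $30 subtotal -> full $25.00 applied
const d3 = fixedOff(30.0, 25);
const total3 = Math.round((30.0 - d3 + 9.99) * 100) / 100; // rounds to 14.99
```

Note the explicit cents rounding at each step: accumulating raw floating-point discounts across stacked coupons would otherwise drift by fractions of a cent.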
Coupon Type Decision Flowchart
Checkout Saga with Coupon Rollback
A single database transaction cannot span Valkey, a payment gateway, and PostgreSQL. These are separate systems with separate failure modes. This means partial failure is inevitable: inventory can be reserved in Valkey while the payment gateway is down. The system must explicitly undo completed steps when a later step fails. This is the saga pattern.
Problem: A user checks out with a reserved item and an applied coupon. The payment is declined. The inventory and coupon must be returned immediately -- otherwise they are permanently consumed by a failed order.
Simple example: A restaurant reservation with a deposit. If the customer cancels, the table is released and the deposit is refunded. If only the table is released but not the deposit, money is lost. If only the deposit is refunded but not the table, capacity is wasted.
Mental model: Each saga step is a domino. Push them forward for the happy path. If one falls sideways (fails), pick up the fallen dominoes in reverse order.
Saga Steps
| Step | Action | Compensating Action |
|---|---|---|
| 1 | Reserve inventory (Valkey DECRBY) | Release inventory (Valkey INCRBY) |
| 2 | Apply coupon (mark as 'applied') | Return coupon (mark as 'claimed', or return to pool) |
| 3 | Process payment (charge card) | Void/refund payment |
| 4 | Confirm order (PostgreSQL insert) | Cancel order (mark as 'cancelled') |
| 5 | Emit events (Kafka) | Emit compensation events |
Temporal Workflow
// checkout-workflow.ts (Temporal)
import { proxyActivities, sleep, ApplicationFailure } from '@temporalio/workflow';
interface CheckoutInput {
orderId: string;
userId: string;
saleId: string;
skuId: string;
quantity: number;
couponCode: string | null;
paymentToken: string;
paymentMethod: string;
}
interface CheckoutResult {
orderId: string;
status: 'confirmed' | 'failed';
failureReason?: string;
}
const inventory = proxyActivities<typeof import('./activities/inventory')>({
startToCloseTimeout: '10s',
retry: { maximumAttempts: 3 }
});
const coupons = proxyActivities<typeof import('./activities/coupons')>({
startToCloseTimeout: '10s',
retry: { maximumAttempts: 3 }
});
const payments = proxyActivities<typeof import('./activities/payments')>({
startToCloseTimeout: '30s', // payment gateways can be slow
retry: { maximumAttempts: 2 }
});
const orders = proxyActivities<typeof import('./activities/orders')>({
startToCloseTimeout: '10s',
retry: { maximumAttempts: 3 }
});
export async function checkoutWorkflow(input: CheckoutInput): Promise<CheckoutResult> {
let inventoryReserved = false;
let couponApplied = false;
let paymentId: string | null = null;
try {
// Step 1: Reserve Inventory
const reserveResult = await inventory.reserveInventory(
input.saleId, input.skuId, input.quantity
);
if (reserveResult.status !== 'success') {
return {
orderId: input.orderId,
status: 'failed',
failureReason: `Inventory unavailable: ${reserveResult.status}`
};
}
inventoryReserved = true;
// Step 2: Apply Coupon (if present)
if (input.couponCode) {
const couponResult = await coupons.applyCoupon(
input.userId, input.couponCode, input.orderId
);
if (couponResult.status !== 'success') {
throw ApplicationFailure.nonRetryable(
`Coupon application failed: ${couponResult.reason}`
);
}
couponApplied = true;
}
// Step 3: Process Payment
const paymentResult = await payments.processPayment({
orderId: input.orderId,
userId: input.userId,
amount: reserveResult.totalAmount,
couponDiscount: couponApplied ? reserveResult.couponDiscount : 0,
paymentToken: input.paymentToken,
paymentMethod: input.paymentMethod
});
if (paymentResult.status !== 'captured') {
throw ApplicationFailure.nonRetryable(
`Payment failed: ${paymentResult.reason}`
);
}
paymentId = paymentResult.paymentId;
// Step 4: Confirm Order
await orders.confirmOrder(input.orderId, paymentId);
// Step 5: Emit Success Events
await orders.emitOrderConfirmed(input.orderId);
return { orderId: input.orderId, status: 'confirmed' };
} catch (error: any) {
// COMPENSATING TRANSACTIONS (reverse order)
// Compensate Step 3: Void payment if charged
if (paymentId) {
try {
await payments.voidPayment(paymentId);
} catch (voidErr) {
// Log for manual intervention; do not throw
await orders.flagForManualReview(input.orderId, 'payment_void_failed');
}
}
// Compensate Step 2: Return coupon
if (couponApplied && input.couponCode) {
try {
await coupons.returnCoupon(input.userId, input.couponCode);
} catch (couponErr) {
await orders.flagForManualReview(input.orderId, 'coupon_return_failed');
}
}
// Compensate Step 1: Release inventory
if (inventoryReserved) {
try {
await inventory.releaseInventory(input.saleId, input.skuId, input.quantity);
} catch (invErr) {
await orders.flagForManualReview(input.orderId, 'inventory_release_failed');
}
}
// Mark order as failed
await orders.failOrder(input.orderId, error.message);
// Emit failure event
await orders.emitOrderFailed(input.orderId, error.message);
return {
orderId: input.orderId,
status: 'failed',
failureReason: error.message
};
}
}
Coupon Rollback Activity
// activities/coupons.ts
export async function returnCoupon(userId: string, couponCode: string): Promise<void> {
// Step 1: Get campaign info from the coupon code
const couponInfo = await db.query(
`SELECT cc.campaign_id, cc.id as code_id
FROM coupon_codes cc
WHERE cc.code = $1 AND cc.claimed_by = $2`,
[couponCode, userId]
);
if (couponInfo.rows.length === 0) {
throw new Error(`Coupon ${couponCode} not found for user ${userId}`);
}
const { campaign_id: campaignId, code_id: codeId } = couponInfo.rows[0];
// Step 2: Return code to Valkey pool
await valkey.evalsha(couponReturnSha, 3,
`coupon:pool:${campaignId}`,
`coupon:claimed:${campaignId}`,
`coupon:user:${userId}:${campaignId}`,
userId,
couponCode
);
// Step 3: Update PostgreSQL
await db.query('BEGIN');
try {
// Mark the claim as rolled back
await db.query(
`UPDATE coupon_claims
SET status = 'rolled_back', rolled_back_at = now()
WHERE user_id = $1 AND campaign_id = $2`,
[userId, campaignId]
);
// Mark the code as available again
await db.query(
`UPDATE coupon_codes
SET status = 'returned', claimed_by = NULL, claimed_at = NULL
WHERE id = $1`,
[codeId]
);
// Decrement campaign claimed count
await db.query(
`UPDATE coupon_campaigns
SET claimed_count = claimed_count - 1,
status = CASE WHEN status = 'exhausted' THEN 'active' ELSE status END
WHERE id = $1`,
[campaignId]
);
await db.query('COMMIT');
} catch (err) {
await db.query('ROLLBACK');
throw err;
}
console.log(`Coupon ${couponCode} returned to pool for campaign ${campaignId}`);
}
Why Temporal Over Manual Saga
| Aspect | Manual Saga (Event-Driven) | Temporal Workflow |
|---|---|---|
| Compensation logic | Scattered across event handlers | Co-located in try/catch |
| Retry policy | Custom per-service | Declarative per-activity |
| Visibility | Grep logs across services | Temporal UI shows workflow state |
| Stuck workflows | Manual detection | Automatic timeout + alerting |
| Replay/debug | Near impossible | Temporal replay from event history |
| Idempotency | Must implement manually | Built-in deduplication |
Bottlenecks and Mitigations
Bottleneck 1: Valkey Hot Key for Popular SKUs
Problem: A doorbuster item (e.g., $99 laptop, 50 units) will concentrate all reads and writes on a single Valkey key. In cluster mode, that key lives on one shard, creating asymmetric load.
Mitigation:
- Shard inventory across 16 virtual slots per hot SKU (see Atomic Inventory Control section)
- Use Valkey client-side caching for read-heavy inventory checks
- Pre-sort users by SKU interest during queue admission to spread load
Impact if unmitigated: Single shard saturates at 200K ops/sec. With 10M users all checking the same doorbuster, the system needs 50K ops/sec on one key. Manageable with a single shard, but combined with other operations, sharding provides safety margin.
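The slot-selection side of the sharding mitigation can be sketched briefly. Stock is pre-split across 16 keys, and each request decrements the slot chosen by hashing the user ID, so load spreads across cluster shards; the hash function and key format here are illustrative:

```typescript
// Pick one of 16 virtual inventory slots for a hot SKU.
// Total stock is pre-split across inv:{saleId}:{skuId}:{0..15}.
const SLOT_COUNT = 16;

function slotFor(userId: string): number {
  // Simple stable string hash; any well-distributed hash works
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SLOT_COUNT;
}

function slotKey(saleId: string, skuId: string, userId: string): string {
  return `inv:${saleId}:${skuId}:${slotFor(userId)}`;
}
```

If a request's slot is empty while others still have stock, the client retries against a neighboring slot before reporting sold-out; that rebalancing logic lives in the Atomic Inventory Control section.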
Bottleneck 2: Payment Gateway Latency
Problem: Payment gateways (Stripe, Adyen) have p99 latencies of 2-5 seconds. During a flash sale with 833 checkouts/sec, long-running payment calls consume thread pool capacity.
Mitigation:
- Async payment processing via Temporal (non-blocking)
- Payment timeout of 30 seconds with automatic retry (once)
- Pre-authorize cards before sale starts (for registered users)
- Circuit breaker on payment gateway calls (trip at 20% error rate)
Impact if unmitigated: 833 concurrent payment calls x 3 seconds average = 2,500 in-flight requests. Without async processing, this exhausts connection pools and cascades failures to inventory and coupon services.
Bottleneck 3: Coupon Pool Exhaustion Race
Problem: When the coupon pool is nearly empty (last 100 codes), thousands of users race to claim simultaneously. The LPOP is atomic, but the losers still consume a round-trip to Valkey.
Mitigation:
- Track pool size with LLEN and expose a "low inventory" indicator in the UI
- When the pool drops below 1% remaining, add a client-side "lottery" step: only 1 in 10 requests actually calls the claim endpoint
- Pre-announce "coupons limited" messaging to set expectations
- Return a waitlist option when pool is exhausted
Impact if unmitigated: 100K users hitting the claim endpoint for the last 50 codes generates 100K unnecessary Valkey round-trips, adding latency to other operations on the same shard.
Component-Level Failure Modeling
Each bottleneck above addresses performance limits. Failure modeling addresses what happens when components stop working entirely. Every component in this system can fail, and each failure has a different blast radius.
Scenario 1: Payment Timeout with Inventory Reserved
Situation: A payment call takes 45 seconds (exceeding the 30-second timeout). Inventory is reserved but not confirmed. The Temporal workflow times out the payment activity.
Handling:
- Temporal retries the payment check using the idempotency key
- If payment was actually captured, the workflow proceeds to confirm
- If payment was not captured, full compensation runs
- Inventory hold expires after 10 minutes regardless (safety net)
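The payment-status recheck in step 1 can be sketched as follows. The retry never re-charges; it looks up the original attempt by its idempotency key and branches on the result. The `gateway` client here is a hypothetical in-memory stub; real gateways (Stripe-style) expose an equivalent lookup:

```typescript
// Post-timeout resolution: query by idempotency key, never re-charge.
type PaymentStatus = 'captured' | 'failed' | 'not_found';

// Hypothetical gateway client, stubbed in-memory for illustration
const gateway = {
  records: new Map<string, PaymentStatus>(),
  lookup(idempotencyKey: string): PaymentStatus {
    return this.records.get(idempotencyKey) ?? 'not_found';
  }
};

function resolveTimedOutPayment(idempotencyKey: string): 'confirm' | 'compensate' {
  // Captured despite the timeout: proceed to order confirmation.
  // Failed or never received: run the full compensation path.
  return gateway.lookup(idempotencyKey) === 'captured' ? 'confirm' : 'compensate';
}
```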
Scenario 2: Valkey Shard Failure During Sale
Situation: The Valkey shard holding inventory keys for 30% of SKUs goes down. The replica promotes, but there is a 5-second gap.
Handling:
| Time | Event | System Response |
|---|---|---|
| T+0s | Primary shard fails | Valkey Sentinel detects failure |
| T+1s | Sentinel starts election | Checkout requests for affected SKUs get connection errors |
| T+2s | Replica promoted | Write availability restored |
| T+3s | Clients reconnect | Inventory ops resume |
| T+0 to T+3s | Checkout requests fail | Checkout service returns 503 with "Retry-After: 5" header |
| T+3s+ | Queue admission paused | Admission controller detects error spike, pauses admission |
| T+5s | Health check passes | Admission resumes at 50% rate, ramps back to 100% |
Data consistency check:
- Valkey replication is asynchronous. The promoted replica might be 1-2 ops behind.
- Post-failover reconciliation job compares Valkey inventory counts with PostgreSQL ledger
- Any mismatch triggers an alert and manual review before resuming sales
Scenario 3: Coupon Service Down Mid-Checkout
Situation: The Coupon Service crashes after inventory is reserved but before the coupon is applied. Temporal retries the coupon application activity.
Handling:
Key point: The user does not lose their coupon. If the coupon was already applied in Valkey before the crash, the retry will see "already applied" and proceed. If the coupon was never applied, the retry will apply it fresh. The combination of idempotent operations and Temporal's retry logic handles this gracefully.
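The safety of that retry rests on the claimed-to-applied transition being idempotent per order. A minimal sketch, modeling Valkey state with an in-memory map (names illustrative):

```typescript
// Idempotent coupon application: retrying for the same order is a no-op.
type ClaimRecord = { state: 'claimed' | 'applied'; orderId: string | null };
const claims = new Map<string, ClaimRecord>();

function applyCoupon(
  userId: string,
  code: string,
  orderId: string
): 'applied' | 'already_applied' | 'not_claimed' | 'conflict' {
  const claim = claims.get(`${userId}:${code}`);
  if (!claim) return 'not_claimed';
  if (claim.state === 'applied') {
    // A crash-retry lands here: re-applying for the same order succeeds
    // as a no-op; a different order asking for the coupon is a conflict
    return claim.orderId === orderId ? 'already_applied' : 'conflict';
  }
  claim.state = 'applied';
  claim.orderId = orderId;
  return 'applied';
}
```

Temporal treats both 'applied' and 'already_applied' as activity success, so the workflow proceeds identically whether or not the first attempt reached Valkey before the crash.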
Scenario 4: PostgreSQL Primary Failure
Situation: The PostgreSQL primary fails during an active sale. Failover to a read replica takes approximately 30 seconds.
Impact: Checkout writes fail for 30 seconds. Order creation and coupon claim persistence cannot proceed. However, Valkey continues serving inventory atomics and coupon claims at the Valkey layer -- the fast path remains operational.
Fallback: Orders queue in Kafka. The Temporal workflows pause at the "Confirm Order" step, holding inventory and coupon reservations. Once the replica is promoted and writes resume, Temporal retries the order confirmation activity.
Recovery: Promote the read replica to primary. Replay any Kafka events that were produced during the outage window but not yet consumed by the inventory-sync consumer.
Tradeoff: Up to 30 seconds of order creation failures. Users see "Order processing" status rather than immediate confirmation. No data is lost because Temporal persists workflow state and Kafka retains events.
Scenario 5: Kafka Broker Failure
Situation: One of three Kafka brokers goes down during the sale.
Impact: Partition leadership transfers in ~5 seconds. During the transfer, events targeting partitions led by the failed broker experience a latency spike. No data is lost because the replication factor is 3 and the ISR minimum is 2.
Recovery: Automatic leader election completes in ~5 seconds. Consumer groups rebalance, briefly pausing consumption. The inventory-sync consumer resumes from its committed offset.
Tradeoff: Brief consumer rebalance causes a ~5 second latency spike for inventory sync and analytics. Order processing via Temporal is unaffected because Temporal has its own persistence layer.
Scenario 6: CDN Failure
Situation: CloudFront experiences a regional outage. The origin receives the full 10M user load.
Impact: The origin servers are overloaded within seconds. The API Gateway, Valkey, and PostgreSQL are overwhelmed by traffic that should have been served from CDN.
Fallback: DNS failover to a secondary CDN provider (Fastly) triggers within 60 seconds based on health check failure.
Recovery: CDN self-heals or manual cache purge restores service. The secondary CDN serves from its own edge caches, which were warm because of synthetic traffic pre-warming.
Tradeoff: A 60-second DNS failover window where the sale is effectively down. Mitigation: use dual-CDN configuration with active-active health checks and shorter DNS TTLs (30s) during sale windows.
Scenario 7: API Gateway Failure
Situation: An API Gateway pod crashes during the sale.
Impact: Kubernetes restarts the pod in ~10 seconds. In-flight requests on that pod receive connection resets. Other gateway pods continue serving traffic via the load balancer.
Recovery: Automatic. The load balancer detects the failed health check and routes traffic to healthy pods. The replacement pod starts serving within 10 seconds.
Tradeoff: Brief connection reset for in-flight requests routed to the failed pod. Clients should retry with exponential backoff.
Scenario 8: Temporal Failure
Situation: The Temporal server goes down during an active sale with in-flight checkout workflows.
Impact: Saga workflows pause. Inventory and coupons remain reserved but unconfirmed. No new checkouts can start.
Recovery: Temporal has persistent storage (PostgreSQL or Cassandra). On restart, all in-flight workflows resume from their last checkpoint. No workflow step is repeated because Temporal's event history tracks which activities completed.
Tradeoff: Checkout latency spike during the outage. Users see "Order processing" for an extended period. No data loss or double-processing occurs.
Scenario 9: Payment Gateway Timeout
Situation: The payment gateway starts timing out consistently.
Impact: Circuit breaker opens after 5 consecutive timeouts. The checkout service stops sending new payment requests.
Fallback: Return "payment pending" status to users. Enqueue payment retries via a background job that probes the gateway every 30 seconds.
Recovery: Close the circuit breaker after 3 successful probe responses. Resume normal payment processing.
Tradeoff: Delayed order confirmation. Users experience anxiety during the "payment pending" state. Inventory remains reserved (10-minute timeout protects against permanent lock).
Deployment and Operations
Pre-Sale Preparation (T-24 Hours)
| Task | Details |
|---|---|
| Load test | Simulate 10M users with k6/Gatling hitting queue join, coupon claim, and checkout |
| CDN cache warming | Deploy sale page, waiting room, and all static assets to CDN edge nodes |
| Valkey pre-loading | Load inventory counters and coupon pools into Valkey |
| Database vacuuming | Run VACUUM ANALYZE on all tables involved in the sale |
| Connection pool warming | Pre-establish DB and Valkey connection pools |
| Runbook review | Ensure on-call team has step-by-step for all failure scenarios |
Deployment Architecture
Auto-Scaling Configuration
# HPA for checkout service
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: checkout-service-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: checkout-service
minReplicas: 10 # Pre-scaled for sale
maxReplicas: 50
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100" # Scale up when > 100 req/sec per pod
- type: Pods
pods:
metric:
name: temporal_workflow_queue_depth
target:
type: AverageValue
averageValue: "50" # Scale up when workflow queue backs up
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Pods
value: 10
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
Rollback Plan
| Trigger | Action |
|---|---|
| Checkout error rate > 5% for 2 min | Pause queue admission, investigate |
| Checkout error rate > 20% for 1 min | Pause sale, show maintenance page |
| Valkey cluster unhealthy | Failover to replica, pause admission during promotion |
| Payment gateway down | Queue orders for retry, show "order processing" status |
| Database CPU > 90% | Enable read replicas for all read queries, reject new checkouts |
Feature Flags
Feature flags allow runtime configuration changes without redeployment. During an active sale, toggling a flag takes effect within seconds via the configuration service.
| Flag | Description | Default |
|---|---|---|
| sale_queue_enabled | Disable the virtual queue for small sales with < 10K expected users | true |
| coupon_stacking_enabled | Toggle stacking rules; when disabled, only one coupon per order | true |
| sharded_inventory | Enable/disable 16-slot inventory sharding for doorbuster SKUs | true |
| payment_circuit_breaker_threshold | Number of consecutive timeouts before the circuit breaker opens | 5 |
Database Migrations
Pre-sale (T-48h): All schema migrations run 48 hours before the sale. Migrations are tested against a production-snapshot database to verify execution time and lock duration.
Zero-downtime approach: Nullable columns are added first, code is deployed to write to both old and new columns, backfill runs as a background job, then the NOT NULL constraint is added after all rows have values.
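The four phases can be sketched in SQL; table and column names here are illustrative, not the actual schema:

```sql
-- Phase 1 (T-48h): add the column nullable -- metadata-only, no table rewrite
ALTER TABLE orders ADD COLUMN fulfillment_region text;

-- Phase 2: deploy application code that writes both old and new columns
-- (no SQL; this is a code release)

-- Phase 3: backfill in small batches from a background job,
-- keeping each transaction short to avoid long row locks
UPDATE orders SET fulfillment_region = 'default'
WHERE id IN (
  SELECT id FROM orders WHERE fulfillment_region IS NULL LIMIT 10000
);

-- Phase 4: enforce the constraint only after every row has a value
ALTER TABLE orders ALTER COLUMN fulfillment_region SET NOT NULL;
```

The ordering matters: adding NOT NULL before the backfill completes would block on a full-table validation scan.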
coupon_claims table: Uses online DDL via pg_repack for structural changes. The UNIQUE(user_id, campaign_id) constraint is never modified during a sale.
Post-sale: Completed sale partitions in the orders table are detached from the partitioned table (sub-second exclusive lock) and moved to cold storage. This keeps the hot partition set small for the next sale.
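The detach itself is a single statement; the partition name is illustrative. On PostgreSQL 14+, the CONCURRENTLY variant avoids even the brief exclusive lock:

```sql
-- Detach a completed sale partition before archiving it to cold storage
ALTER TABLE orders DETACH PARTITION orders_sale_2026_07 CONCURRENTLY;
```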
Rollback
Code rollback: kubectl rollout undo deployment/checkout-service completes in under 30 seconds. Kubernetes performs a rolling update, draining connections from old pods before terminating them.
During active sale: Rollback requires a sale pause. The admission controller stops new checkouts, in-flight Temporal sagas complete (or compensate), then the rollback proceeds. This prevents half-old/half-new code from processing the same orders.
Valkey state is not rolled back. Inventory counts, coupon pool state, and queue positions are forward-only. If a code change corrupted Valkey state, the fix is forward -- a reconciliation job compares Valkey with PostgreSQL and adjusts counts. Rolling back Valkey to a previous point-in-time risks overselling or double-claiming.
Observability
The deployment strategy gets code into production. Observability determines whether the running system is behaving correctly under sale load.
Sale-Specific Dashboards
| Dashboard | Key Metrics |
|---|---|
| Sale Overview | Orders/min, revenue/min, unique buyers, conversion rate |
| Inventory Tracker | Units remaining per SKU, sold-out SKUs, reserve-to-confirm ratio, depletion rate per SKU (%/min) |
| Coupon Dashboard | Claims/min, pool remaining per campaign, claim success %, redemption rate, rollback count |
| Queue Monitor | Queue depth, admission rate, queue drain rate, avg wait time, dropout rate |
| Payment Health | Success rate, avg latency, timeout count, gateway error breakdown |
| Infrastructure | Valkey ops/sec, DB connections, Kafka consumer lag, pod count, GC pause duration (p99) |
Critical Alerts
# Prometheus alerting rules
groups:
  - name: flash-sale-alerts
    rules:
      - alert: InventoryOversold
        expr: |
          sale_inventory_sold_total > sale_inventory_total_quantity
        for: 0s  # Immediate alert
        labels:
          severity: critical
        annotations:
          summary: "CRITICAL: Inventory oversold for SKU {{ $labels.sku_id }}"
      - alert: CouponDoubleClaimDetected
        expr: |
          rate(coupon_claim_duplicate_total[1m]) > 0
        for: 0s
        labels:
          severity: critical
        annotations:
          summary: "Duplicate coupon claim detected for campaign {{ $labels.campaign_id }}"
      - alert: CheckoutErrorRateHigh
        expr: |
          rate(checkout_errors_total[2m]) / rate(checkout_requests_total[2m]) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Checkout error rate above 5%"
      - alert: ValkeyLatencyHigh
        expr: |
          histogram_quantile(0.99, rate(valkey_command_duration_seconds_bucket[1m])) > 0.01
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Valkey p99 latency above 10ms"
      - alert: CouponPoolNearlyExhausted
        expr: |
          coupon_pool_remaining / coupon_pool_total < 0.05
        for: 0s
        labels:
          severity: info
        annotations:
          summary: "Coupon pool {{ $labels.campaign_id }} below 5% remaining"
      - alert: QueueWaitTimeExcessive
        expr: |
          queue_estimated_wait_seconds_p99 > 300
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Queue wait time p99 exceeds 5 minutes"
      - alert: PaymentSagaStuck
        expr: |
          temporal_workflow_running_duration_seconds > 120
        for: 0s
        labels:
          severity: critical
        annotations:
          summary: "Checkout workflow running for over 2 minutes"
      - alert: KafkaConsumerLagHigh
        expr: |
          kafka_consumer_group_lag > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Kafka consumer lag above 10K for group {{ $labels.consumer_group }}"
Structured Logging
Every service emits structured JSON logs with correlation IDs:
{
"timestamp": "2026-07-01T12:05:30.123Z",
"level": "INFO",
"service": "checkout-service",
"trace_id": "abc123def456",
"user_id": "user_789",
"order_id": "ord_def456",
"sale_id": "sale_abc123",
"sku_id": "SKU-LAPTOP-001",
"action": "inventory_reserved",
"duration_ms": 3,
"valkey_ops": 1,
"inventory_remaining": 42
}
Post-Sale Reconciliation
After the sale ends, a reconciliation job runs:
// reconciliation.ts
async function reconcileInventory(saleId: string): Promise<ReconciliationReport> {
const skus = await db.query(
`SELECT sku_id, total_quantity, sold_quantity, reserved_quantity
FROM sale_inventory WHERE sale_id = $1`,
[saleId]
);
const report: ReconciliationReport = { mismatches: [], total: 0 };
for (const sku of skus.rows) {
const valkeyAvail = parseInt(
await valkey.get(`inv:${saleId}:${sku.sku_id}`) || '0'
);
const valkeyReserved = parseInt(
await valkey.get(`inv:reserved:${saleId}:${sku.sku_id}`) || '0'
);
const expectedAvail = sku.total_quantity - sku.sold_quantity - sku.reserved_quantity;
if (valkeyAvail !== expectedAvail) {
report.mismatches.push({
skuId: sku.sku_id,
valkeyAvailable: valkeyAvail,
dbExpectedAvailable: expectedAvail,
valkeyReserved,
dbReserved: sku.reserved_quantity,
delta: valkeyAvail - expectedAvail
});
}
report.total++;
}
if (report.mismatches.length > 0) {
await alertOncall('INVENTORY_MISMATCH', report);
}
return report;
}
Distributed Tracing
OpenTelemetry traces span the entire checkout path: queue-join, admission, checkout, inventory decrement, coupon claim, payment, and confirmation. Each trace captures the full saga lifecycle.
Key spans and latency budgets:
| Span | p99 Target | Notes |
|---|---|---|
| lua_inventory_decrement | < 2ms | Valkey Lua eval; any spike indicates shard contention |
| coupon_claim | < 5ms | Valkey Lua eval + SET NX; higher than inventory due to two-step check |
| payment_call | < 3s | Payment gateway round-trip; high variance expected |
| saga_total | < 5s | End-to-end Temporal workflow; dominated by payment latency |
Trace sampling: 100% during the sale window. Flash sales are short-duration, high-value events -- every request matters for debugging and post-sale analysis. Post-sale, sampling drops to 1% for background reconciliation and analytics traffic.
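The window-based sampling policy can be expressed as a pure function. A minimal sketch: the 100%/1% rates come from the text above, while the hash-based per-trace decision and all names (`traceSampleRate`, `shouldSample`) are illustrative stand-ins for what an OpenTelemetry custom sampler would do, not the production implementation.

```typescript
// sampling.ts -- illustrative sketch, not the production sampler
interface SaleWindow {
  startMs: number;
  endMs: number;
}

// 100% inside the sale window, 1% for post-sale background traffic
function traceSampleRate(nowMs: number, sale: SaleWindow): number {
  const inSale = nowMs >= sale.startMs && nowMs <= sale.endMs;
  return inSale ? 1.0 : 0.01;
}

// Deterministic per-trace decision so every service in the request
// path makes the same choice: hash the trace ID into [0, 1) and
// compare against the rate.
function shouldSample(traceId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash / 0xffffffff < rate;
}
```

Keying the decision on the trace ID rather than a per-span coin flip is what keeps traces whole: either every span of a checkout is recorded or none is.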
SLI/SLO
| SLI | SLO | Measurement |
|---|---|---|
| Checkout p99 latency | < 5s | Temporal workflow duration histogram |
| Inventory accuracy | 0 oversells | Valkey vs PostgreSQL reconciliation delta |
| Coupon uniqueness | 0 double-claims | PostgreSQL UNIQUE constraint violation count |
| Queue fairness | FIFO within 1% deviation | Queue position audit: compare actual admission order with join timestamp order |
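The queue-fairness measurement can be sketched as a small audit over (join timestamp, admission order) pairs. This is a hypothetical illustration -- the field names (`joinTs`, `admissionSeq`) and the adjacent-pair deviation metric are assumptions, not the production audit:

```typescript
// fairness-audit.ts -- illustrative FIFO audit sketch
interface Admission {
  userId: string;
  joinTs: number;       // when the user joined the queue
  admissionSeq: number; // order in which the user was actually admitted
}

// Fraction of adjacent admissions that violate FIFO: sort by actual
// admission order, then count pairs where a later-joining user was
// admitted before an earlier one. The SLO allows at most 1% deviation.
function fifoDeviation(admissions: Admission[]): number {
  const byAdmission = [...admissions].sort(
    (a, b) => a.admissionSeq - b.admissionSeq
  );
  let inversions = 0;
  for (let i = 1; i < byAdmission.length; i++) {
    if (byAdmission[i].joinTs < byAdmission[i - 1].joinTs) inversions++;
  }
  return byAdmission.length > 1 ? inversions / (byAdmission.length - 1) : 0;
}
```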
Runbooks
inventory_count_mismatch: Pause admission. Run the reconciliation job manually. Compare the Valkey DECR log (from structured logging) with the PostgreSQL inventory ledger. Identify the divergence point. Manually adjust Valkey counts to match PostgreSQL if PostgreSQL is correct (it has the durable record). Resume admission at 50% rate, monitoring for further divergence.
coupon_pool_exhausted: Verify the pool size matches the expected campaign configuration. Check for leaked codes -- codes that were claimed in Valkey (SET NX succeeded) but never written to PostgreSQL (the INSERT failed or was dropped). Leaked codes reduce the effective pool size. Alert the product team to decide whether to generate additional codes.
payment_gateway_circuit_open: Check the payment gateway's status page. Verify the circuit breaker configuration (threshold, probe interval). If the gateway reports healthy, the issue may be network-related -- check DNS resolution and TLS handshake latency. If sustained for more than 5 minutes, escalate to the payment team and consider enabling a backup payment processor.
Security
Bot Prevention
Flash sales attract bots. Scalpers use automated tools to buy inventory before humans can react. A multi-layered defense is required.
| Layer | Technique | Implementation |
|---|---|---|
| CDN Edge | Rate limiting | 10 requests/sec per IP; burst of 20 |
| CDN Edge | Geo-blocking | Block traffic from non-serviceable regions |
| WAF | Known bot signatures | Block Selenium, Puppeteer, PhantomJS UA strings |
| Queue Join | CAPTCHA challenge | hCaptcha or Cloudflare Turnstile before queue entry |
| Queue Join | Proof of Work | Client-side computation challenge (100ms solve time) |
| Checkout | Device fingerprint | FingerprintJS to identify same device across sessions |
| Checkout | Behavioral analysis | Mouse movement, scroll patterns, typing cadence |
| Post-Purchase | Velocity checks | Flag users who checkout in < 3 seconds after admission |
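The proof-of-work row can be sketched server-side. A minimal illustration, assuming a SHA-256 leading-zero-bits scheme -- the challenge format, difficulty, and function names are assumptions chosen so that a modest difficulty costs the client roughly the ~100ms solve time mentioned above:

```typescript
// pow-verify.ts -- illustrative proof-of-work check (assumed scheme)
import { createHash } from 'node:crypto';

// The client must find a nonce such that SHA-256(challenge + nonce)
// starts with `difficultyBits` zero bits.
function verifyPow(challenge: string, nonce: string, difficultyBits: number): boolean {
  const digest = createHash('sha256').update(challenge + nonce).digest();
  let remaining = difficultyBits;
  for (const byte of digest) {
    if (remaining >= 8) {
      if (byte !== 0) return false; // full byte must be zero
      remaining -= 8;
    } else if (remaining > 0) {
      // only the top `remaining` bits of this byte must be zero
      if (byte >> (8 - remaining) !== 0) return false;
      remaining = 0;
    } else break;
  }
  return true;
}

// What the browser would run: brute-force nonces until one verifies.
// Expected work doubles with each added difficulty bit.
function solvePow(challenge: string, difficultyBits: number): string {
  for (let nonce = 0; ; nonce++) {
    if (verifyPow(challenge, String(nonce), difficultyBits)) return String(nonce);
  }
}
```

The asymmetry is the point: verification is one hash for the server, while solving costs the client hundreds to thousands of hashes, which throttles scripted mass joins without harming a single human user.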
CAPTCHA Integration
// queue-join-handler.ts
async function handleQueueJoin(req: Request): Promise<Response> {
const { saleId, captchaToken } = req.body;
const userId = req.user.id;
// Step 1: Verify CAPTCHA
const captchaValid = await verifyCaptcha(captchaToken);
if (!captchaValid) {
return new Response(JSON.stringify({
error: 'CAPTCHA_FAILED',
message: 'Please complete the CAPTCHA challenge'
}), { status: 403 });
}
  // Step 2: Claim device fingerprint atomically.
  // SET NX closes the race where two tabs both pass a GET check
  // before either one writes the marker.
  const fingerprint = req.headers.get('x-device-fingerprint');
  const claimed = await valkey.set(
    `queue:device:${fingerprint}:${saleId}`, userId, 'EX', 7200, 'NX'
  );
  if (!claimed) {
    return new Response(JSON.stringify({
      error: 'DUPLICATE_DEVICE',
      message: 'This device is already in the queue'
    }), { status: 409 });
  }
  // Step 3: Rate limit by IP (fixed 60-second window)
  const ip = req.headers.get('x-forwarded-for');
  const ipCount = await valkey.incr(`queue:ip:${ip}:${saleId}`);
  if (ipCount === 1) {
    // Set the TTL only on the first hit; resetting it on every request
    // would keep a busy IP's window open indefinitely
    await valkey.expire(`queue:ip:${ip}:${saleId}`, 60);
  }
  if (ipCount > 5) {
    return new Response(JSON.stringify({
      error: 'RATE_LIMITED',
      message: 'Too many queue join attempts from this IP'
    }), { status: 429 });
  }
  // Step 4: Join queue (device marker was set atomically in Step 2)
  const result = await joinQueue(saleId, userId);
return new Response(JSON.stringify(result), { status: 200 });
}
Coupon Abuse Detection
| Pattern | Detection | Response |
|---|---|---|
| Same user, multiple accounts | Device fingerprint + IP correlation | Flag accounts, require manual verification |
| Coupon code sharing | Track claim-to-redemption time (< 5s = suspicious) | Invalidate shared codes |
| Automated claiming | Request timing analysis (consistent sub-100ms intervals) | Temporary ban + CAPTCHA |
| Coupon farming | Multiple claims across campaigns from same device | Limit to 3 campaigns per device |
| Resale detection | Same shipping address across multiple user accounts | Flag for review |
Abuse Scoring Engine
// abuse-scorer.ts
interface AbuseSignals {
captchaScore: number; // 0.0 to 1.0 (Turnstile risk score)
deviceAge: number; // seconds since first seen
accountAge: number; // seconds since registration
requestInterval: number; // ms between consecutive requests
ipSharedAccounts: number; // number of accounts from same IP
claimVelocity: number; // coupons claimed per minute
fingerprintSharedAccounts: number;
}
function calculateAbuseScore(signals: AbuseSignals): number {
let score = 0;
if (signals.captchaScore < 0.3) score += 30;
if (signals.deviceAge < 300) score += 15; // device seen < 5 min ago
if (signals.accountAge < 86400) score += 20; // account < 1 day old
if (signals.requestInterval < 100) score += 25; // sub-100ms requests
if (signals.ipSharedAccounts > 3) score += 20; // many accounts from same IP
if (signals.claimVelocity > 2) score += 25; // > 2 coupons/min
if (signals.fingerprintSharedAccounts > 2) score += 30;
return Math.min(100, score);
}
// Score thresholds
// 0-30: Normal user, proceed
// 31-60: Suspicious, require additional CAPTCHA
// 61-80: High risk, delay queue admission by 30 seconds
// 81-100: Likely bot, block and log for review
Security Headers and API Protection
| Protection | Implementation |
|---|---|
| API authentication | JWT with short expiry (15 min during sale) |
| Request signing | HMAC signature on checkout requests to prevent tampering |
| CORS | Strict origin whitelist (sale domain only) |
| Content Security Policy | Prevent XSS that could steal coupons |
| Rate limiting | Per-user, per-IP, and per-endpoint limits |
| Input validation | Strict schema validation on all API inputs |
| Idempotency keys | Required on checkout and coupon claim endpoints |
// Idempotency middleware
async function idempotencyMiddleware(req: Request, res: Response, next: Function) {
  const idempotencyKey = req.headers['idempotency-key'] as string | undefined;
if (!idempotencyKey) {
return res.status(400).json({ error: 'Idempotency-Key header required' });
}
const cacheKey = `idempotency:${req.user.id}:${idempotencyKey}`;
const cached = await valkey.get(cacheKey);
  if (cached === 'processing') {
    // Another request with the same key is still in flight
    return res.status(409).json({ error: 'Request already in progress' });
  }
  if (cached) {
    // Replay the cached response (JSON.parse on the raw 'processing'
    // sentinel would throw, so that case is handled above)
    const cachedResponse = JSON.parse(cached);
    return res.status(cachedResponse.status).json(cachedResponse.body);
  }
// Store a lock while processing
const locked = await valkey.set(cacheKey, 'processing', 'NX', 'EX', 300);
if (!locked) {
// Another request with the same key is in progress
return res.status(409).json({ error: 'Request already in progress' });
}
// Override res.json to cache the response
const originalJson = res.json.bind(res);
res.json = (body: any) => {
valkey.set(cacheKey, JSON.stringify({ status: res.statusCode, body }), 'EX', 300);
return originalJson(body);
};
next();
}
Encryption
All client-to-server and inter-service traffic uses TLS 1.3. PostgreSQL data is encrypted at rest with AES-256-GCM. Valkey requires AUTH and TLS for all connections -- unencrypted Valkey traffic on the internal network is disabled to prevent sniffing of inventory state or coupon codes.
Payment data is encrypted with PCI-DSS-compliant key management via AWS KMS. Payment tokens are never stored in the application database -- only the payment gateway's tokenized reference (payment_id) is persisted. Coupon codes are hashed in logs using SHA-256 to prevent leakage through log aggregation systems.
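The "hash coupon codes in logs" rule can be sketched as a small redaction helper. A minimal illustration -- the function names and truncation length are assumptions, not the production logging layer:

```typescript
// log-redaction.ts -- illustrative sketch of coupon hashing in logs
import { createHash } from 'node:crypto';

// Replace the raw coupon code with a truncated SHA-256 digest so log
// lines stay correlatable (same code -> same hash) without leaking a
// redeemable value into the log aggregation pipeline.
function redactCouponCode(code: string): string {
  const digest = createHash('sha256').update(code).digest('hex');
  return `coupon_sha256:${digest.slice(0, 16)}`;
}

function logCouponClaim(userId: string, code: string): Record<string, string> {
  return {
    level: 'INFO',
    action: 'coupon_claimed',
    user_id: userId,
    coupon_hash: redactCouponCode(code), // never the raw code
  };
}
```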
PII Handling
User email and payment details are personally identifiable information (PII). These fields are stored only in PostgreSQL with column-level encryption. PII is never cached in Valkey (user identifiers in Valkey are UUIDs, not emails) and never included in log entries.
GDPR right-to-erasure: A cascade delete on user_id removes all associated records across the orders and coupon_claims tables. The deletion is logged for audit but the PII content is irrecoverable after deletion.
Data retention: Order and claim records are retained for 2 years to satisfy financial compliance requirements. After 2 years, a batch job purges records and associated PII. Anonymized aggregate data (sale totals, conversion rates) is retained indefinitely for analytics.
Testing and Validation
Security protects the system from external threats. Testing validates that the system behaves correctly under the specific conditions of a flash sale -- extreme concurrency, partial failures, and edge cases that only appear at scale.
Load Testing
k6 scripts simulate the full sale lifecycle: 10M virtual queue joins, 500K coupon claims, and 50K checkouts/min. Pass criteria: p99 checkout latency < 5s, 0 oversells, 0 double-claims. Load tests run T-48h before every sale using production-scale Valkey and PostgreSQL instances. The test environment mirrors production topology -- same number of Valkey shards, same PostgreSQL instance type, same Kafka partition count.
Load test scenarios:
| Scenario | Target | Pass Criteria |
|---|---|---|
| Queue flood | 10M joins in 60s | All joins acknowledged, queue depth = 10M |
| Sustained checkout | 50K orders/min for 30 min | p99 < 5s, 0 oversells |
| Coupon claim burst | 100K claims in 10s | 0 double-claims, pool count accurate |
| Doorbuster contention | 50K users targeting 1 SKU (50 units) | Exactly 50 sold, 0 oversells |
| Graceful degradation | Kill payment gateway mid-test | Queue pauses, in-flight sagas compensate |
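The doorbuster pass criterion ("exactly 50 sold") can be asserted in a unit-style harness. This is an in-memory stand-in for the Valkey Lua check-and-decrement -- trivially atomic on a single JS thread -- not the production script; the class and function names are illustrative:

```typescript
// doorbuster-sim.ts -- in-memory model of the atomic decrement,
// used to assert the "exactly N sold, 0 oversells" invariant
class AtomicInventory {
  constructor(private available: number) {}

  // Mirrors the Lua script's semantics: the availability check and
  // the decrement happen as one indivisible step.
  tryReserve(): boolean {
    if (this.available <= 0) return false;
    this.available -= 1;
    return true;
  }

  remaining(): number {
    return this.available;
  }
}

// All users attempt a reservation; only `units` can succeed.
async function simulateDoorbuster(users: number, units: number): Promise<number> {
  const inventory = new AtomicInventory(units);
  const attempts = Array.from({ length: users }, async () => inventory.tryReserve());
  const results = await Promise.all(attempts);
  return results.filter(Boolean).length; // count of successful reservations
}
```

The real load test exercises the same invariant across many processes against Valkey; this sketch only demonstrates the shape of the assertion.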
Chaos Engineering
LitmusChaos scenarios run in the staging environment T-72h before the sale:
Kill Valkey primary during sale: Verify replica promotion completes in < 5 seconds. Admission controller pauses during the failover window. Post-promotion reconciliation confirms 0 oversells. The test fails if any inventory count diverges between Valkey and PostgreSQL.
Kill Temporal worker: Verify in-flight saga workflows pause (not fail). On worker restart, workflows resume from the last completed activity. No inventory or coupon leaks.
Partition Kafka broker: Simulate a network partition isolating one of three brokers. Verify no event loss after the partition heals. Consumer groups rebalance and resume from committed offsets.
Inject 5s payment latency: All payment gateway calls take 5 seconds. Verify the circuit breaker activates at the configured threshold. Users receive "payment pending" status. After latency returns to normal, the circuit breaker closes and checkout resumes.
Integration Tests
End-to-end sale simulation runs on every deploy to staging: create sale, join queue, get admitted, add to cart, apply coupon, checkout, then verify: inventory decremented in both Valkey and PostgreSQL, coupon claimed and marked as redeemed, order created with correct amounts, payment record exists. The test covers the happy path and two failure paths (payment decline, coupon already claimed).
Contract Tests
API schemas are validated against an OpenAPI spec on every pull request. Breaking changes (removing fields, changing types) fail CI. Kafka event schemas are validated via an Avro schema registry with backward compatibility mode -- consumers written against schema v1 must be able to read events produced with schema v2. Temporal workflow schemas are validated via replay tests -- workflow code is replayed against historical event logs to verify determinism.
Data Validation
Post-sale reconciliation (detailed in Observability) validates five invariants:
- Sum of sold inventory + remaining inventory = original total for every SKU
- Every claimed coupon has a corresponding coupon_claims record
- Every confirmed order has a payment record
- No user has more than one coupon claim per campaign (UNIQUE constraint violation count = 0)
- Kafka event count matches PostgreSQL order count (within the consumer lag window)
Any mismatch triggers a P1 alert and blocks the next sale until resolved.
Cost and Capacity
Testing validates correctness. The cost model below ensures the architecture is economically viable -- flash sale infrastructure is burst-oriented, and most cost accrues during the 1-4 hour sale window, not 24/7.
Per-Sale Cost Breakdown
| Component | Per-Sale (4h) | Monthly (daily sales) | 10x Scale |
|---|---|---|---|
| Valkey (6+6 nodes) | $50 | $1,500 | $8,000 (30+30 nodes) |
| PostgreSQL (db.r6g.2xl) | $30 | $900 | $5,000 (read replicas) |
| Kafka (3 brokers) | $20 | $600 | $3,000 (15 brokers) |
| Compute (API + workers) | $100 | $3,000 | $18,000 |
| CDN (CloudFront) | $200 | $6,000 | $30,000 |
| Temporal (3 nodes) | $40 | $1,200 | $6,000 |
| Total | ~$440/sale | ~$13,200/mo | ~$70,000/mo |
Cost Cliff
CDN is the largest cost driver because 10M concurrent users generate massive egress serving sale pages, images, and static assets. At 10x scale (100M users), CDN egress reaches $30K/month. Mitigation: aggressive caching (60s TTL), WebP images (30-50% smaller than JPEG), and edge-side includes for dynamic inventory count badges so the full page is not re-fetched on inventory changes.
Optimization
Pre-provision Valkey and compute for the sale duration only. Kubernetes HPA with scheduled scaling rules: scale up T-1h before the sale, maintain high replica count during the sale, scale down T+2h after the sale ends. Analytics consumers run on spot instances (can tolerate interruption without data loss because Kafka retains events for 7 days).
Valkey nodes can use reserved instances for predictable daily-sale patterns. For infrequent sales (weekly or monthly), on-demand pricing is more cost-effective despite the higher per-hour rate.
Hidden Cost
Payment gateway fees (2.9% + $0.30 per transaction) dwarf infrastructure costs. At 50K orders per sale with an average order value of $50, the gross merchandise value is $2.5M. Payment fees: $2.5M x 2.9% + 50K x $0.30 = $72,500 + $15,000 = $87,500 per sale. This is 200x the infrastructure cost. Optimizing infrastructure spend is important, but negotiating payment gateway rates has a far larger impact on unit economics.
Multi-Region Considerations
The cost model assumes single-region deployment. This section addresses what changes for global flash sales and why single-region is the deliberate default.
Single-Region by Design
The current design is single-region. This is deliberate: flash sales are time-bounded events (1-4 hours) where inventory must be globally consistent. Multi-region replication introduces the risk of overselling during network partition events. For a 2-hour sale with 100K inventory, a 5-second replication lag across regions could result in hundreds of oversold items.
Global Flash Sale Options
Option A: Regional inventory partitioning. Allocate inventory by region: 200 units to US, 200 to EU, 100 to APAC. Each region operates independently with a local Valkey cluster. Tradeoff: unsold inventory in one region cannot be dynamically reallocated to another without cross-region coordination, which adds 100-200ms latency and reintroduces the consistency risk. A doorbuster that sells out in US while EU has remaining units creates a poor customer experience.
Option B: Single-region inventory with global CDN. Keep all inventory atomics in one region (US-East). CDN serves sale pages globally from edge locations. Queue join and checkout requests route to US-East regardless of user location. Tradeoff: EU and APAC users experience 100-200ms additional latency on checkout -- acceptable for a 5s p99 SLO that is dominated by payment gateway latency.
Recommended: Option B for most flash sales (simpler architecture, no split-brain risk). Option A only for mega-sales exceeding 1M inventory items and 100M concurrent users, where single-region write throughput becomes the bottleneck.
Data Distribution
| Data | Strategy | Rationale |
|---|---|---|
| Sale config, product catalog | Replicate globally via CDN | Read-only during sale; staleness is acceptable |
| Queue | Global (single sorted set in primary region) | FIFO ordering requires single authority |
| Inventory | Global (single Valkey cluster in primary region) | Consistency requires single authority |
| Orders | Written in primary region, replicated async to regional read replicas | Read-after-write consistency in primary region; eventual consistency elsewhere |
Regional Failover
Active-passive architecture. If the primary region fails during a sale, the sale is paused -- not failed over. Rationale: inventory consistency is more important than availability for a flash sale. Users see a "Sale paused, please wait" message rather than risk overselling. The sale resumes when the primary region recovers.
DNS failover TTL is 60 seconds, but during an active sale the queue WebSocket connection detects the disconnect immediately and displays the pause message client-side without waiting for DNS propagation.
Real-World Evolution: Three Things That Break at 10x
The multi-region analysis reveals that scale changes the problem, not just the numbers. Three specific break points emerge as the system grows beyond the initial design target.
1. Single Valkey Cluster Hits Throughput Ceiling (100M Users)
At 10x scale (100M concurrent users), the 16-slot inventory sharding strategy is insufficient: those users generate 250K ops/sec on inventory keys alone. A single Valkey cluster handles this on paper (200K+ ops/sec per node), but combined with coupon pool operations, queue management, and session state, the cluster is saturated.
The path forward: client-side caching with server-assisted invalidation, available in Valkey 7.2+. The server tracks which keys each client has cached and pushes invalidation messages when values change. For inventory counts, this means 99% of "is this item still available?" reads are served from the client process without a Valkey round-trip. Only the atomic decrement operations hit Valkey. Alternatively, a multi-cluster topology with regional inventory pools (Option A from Multi-Region) becomes necessary at this scale, accepting the complexity of cross-region reallocation.
2. Coupon Stacking Rule Evaluation Becomes a Bottleneck (50+ Coupon Types)
The current stacking validation iterates all coupon pairs to check compatibility -- O(n^2) where n is the number of applied coupons. At 2-3 coupons per order, this is trivial. But as the marketing team introduces new coupon types (referral codes, loyalty multipliers, category-specific discounts, seasonal stacks), the compatibility matrix grows quadratically. At 50+ coupon types, the stacking validation adds measurable latency to every checkout.
The path forward: a pre-computed compatibility matrix. At coupon campaign creation time, the system evaluates all pairwise compatibility rules and stores the results in a lookup table. At checkout, each pairwise check becomes an O(1) table lookup instead of a full rule evaluation. For more complex rules (e.g., "maximum 3 discounts where total does not exceed 60% and no two are from the same category"), a constraint solver replaces the brute-force iteration.
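The matrix approach can be sketched in a few lines. A hypothetical illustration -- the pair-key encoding and the `compatible` rule callback are assumptions standing in for the real rule engine:

```typescript
// stacking-matrix.ts -- illustrative pre-computed compatibility matrix
type CouponType = string;

// Built once at campaign creation: run the expensive pairwise rule
// evaluation for every type pair and store compatible pairs in a set
// keyed by a canonical (sorted) pair key.
function buildCompatibilityMatrix(
  types: CouponType[],
  compatible: (a: CouponType, b: CouponType) => boolean // expensive rule eval
): Set<string> {
  const matrix = new Set<string>();
  for (const a of types) {
    for (const b of types) {
      if (a < b && compatible(a, b)) matrix.add(`${a}|${b}`);
    }
  }
  return matrix;
}

// At checkout: each pair check is an O(1) set lookup, so validating a
// cart with a handful of coupons costs microseconds regardless of how
// many coupon types exist platform-wide.
function canStack(applied: CouponType[], matrix: Set<string>): boolean {
  for (let i = 0; i < applied.length; i++) {
    for (let j = i + 1; j < applied.length; j++) {
      const [a, b] = [applied[i], applied[j]].sort();
      if (!matrix.has(`${a}|${b}`)) return false;
    }
  }
  return true;
}
```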
3. Saga Orchestration Latency Grows with Payment Provider Diversity
Adding new payment methods (cryptocurrency, buy-now-pay-later, regional payment networks) adds new saga steps and new compensation paths. Each payment method has different timeout characteristics, different idempotency guarantees, and different failure modes. The Temporal workflow accumulates conditional branches, making it harder to test and reason about.
The path forward: workflow versioning and shadow-mode testing. New payment methods are added as versioned workflow variants. Shadow mode runs the new workflow in parallel with the existing one (using a test payment sandbox) on 1% of traffic. Compensation paths for each payment method are tested independently via Temporal replay tests against recorded event histories.
Secondary Evolution Items
Soft reservation TTL and background reclamation. Production systems add a TTL to reserved inventory (5-10 minutes). A background job scans for expired reservations and returns them to the available pool. Without this, inventory leaks on every sale at the rate of user abandonment.
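The reclamation loop can be sketched with an in-memory model. In production the reservations would live in a Valkey sorted set scored by expiry time; here a plain map stands in, and all names (`ReservationStore`, `reclaimExpired`) are illustrative assumptions:

```typescript
// reservation-reclaim.ts -- illustrative model of soft reservations
// with TTL and background reclamation
interface Reservation {
  orderId: string;
  skuId: string;
  expiresAtMs: number;
}

class ReservationStore {
  private reservations = new Map<string, Reservation>();

  constructor(public available: Map<string, number>) {}

  // Move one unit from the available pool into a timed reservation
  reserve(orderId: string, skuId: string, ttlMs: number, nowMs: number): boolean {
    const avail = this.available.get(skuId) ?? 0;
    if (avail <= 0) return false;
    this.available.set(skuId, avail - 1);
    this.reservations.set(orderId, { orderId, skuId, expiresAtMs: nowMs + ttlMs });
    return true;
  }

  // Payment confirmed: the unit is sold, not returned
  confirm(orderId: string): void {
    this.reservations.delete(orderId);
  }

  // Background job: return expired, unconfirmed reservations to the
  // pool. Without this sweep, abandoned checkouts leak inventory.
  reclaimExpired(nowMs: number): number {
    let reclaimed = 0;
    for (const [id, r] of this.reservations) {
      if (r.expiresAtMs <= nowMs) {
        this.available.set(r.skuId, (this.available.get(r.skuId) ?? 0) + 1);
        this.reservations.delete(id);
        reclaimed++;
      }
    }
    return reclaimed;
  }
}
```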
Token-bound access control. The access token is bound to userId + deviceFingerprint + expiry with a cryptographic signature, validated on every subsequent request. Without binding, tokens can be shared or replayed.
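The token binding can be sketched with an HMAC over the bound claims. A minimal illustration under assumptions -- the payload layout, base64url encoding, and function names are not from the original design:

```typescript
// bound-token.ts -- illustrative HMAC-bound access token sketch
import { createHmac, timingSafeEqual } from 'node:crypto';

// Token = base64url(payload) + '.' + HMAC(payload). Signing userId,
// device fingerprint, and expiry together means a copied token is
// useless from another device or after it expires.
function issueToken(
  userId: string, fingerprint: string, expiresAtMs: number, secret: string
): string {
  const payload = Buffer.from(
    JSON.stringify({ userId, fingerprint, expiresAtMs })
  ).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

function verifyToken(
  token: string, fingerprint: string, nowMs: number, secret: string
): { userId: string } | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;
  const expected = createHmac('sha256', secret).update(payload).digest('base64url');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // Constant-time comparison prevents timing-based signature guessing
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString());
  if (claims.fingerprint !== fingerprint) return null; // wrong device
  if (claims.expiresAtMs <= nowMs) return null;        // expired
  return { userId: claims.userId };
}
```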
Pre-warming and dry runs. Production systems never start a flash sale cold. At T-1 hour: warm Valkey keys, load coupon pools, warm CDN edge caches, pre-establish database connection pools, and run synthetic traffic through the full checkout path.
Kill switches and circuit breakers. When the payment gateway returns errors at 20%+ rate, the system stops admitting new users to checkout, allows browsing only, and queues pending orders for retry.
Runtime memory pressure under burst traffic. Garbage-collected runtimes (JVM, Go, Node.js) face sharp allocation rate spikes during flash sales. Production systems tune GC parameters for burst workloads and monitor GC pause duration alongside p99 latency.
VIP and priority queues. Real flash sales sometimes offer early access to premium subscribers. The virtual queue becomes a priority queue with multiple tiers, adding business value at the cost of fairness complexity.
Explore the Technologies
Core Technologies
| Technology | Role in This Design | Deep Dive |
|---|---|---|
| Valkey/Redis | Inventory atomics, coupon pools, queue management | Redis |
| PostgreSQL | Durable ledger for orders, coupon claims, reconciliation | PostgreSQL |
| Kafka | Event streaming for order events, analytics, async processing | Kafka |
| Prometheus | Metrics collection for sale-specific dashboards and alerting | Prometheus |
| Grafana | Dashboard visualization for real-time sale monitoring | Grafana |
| Kong | API gateway for per-route rate limiting, JWT auth, request routing | Kong |
Infrastructure Patterns
| Pattern | Relevance to This Design | Deep Dive |
|---|---|---|
| Message Queues and Event Streaming | Kafka for order events and async inventory sync | Event Streaming |
| Caching Strategies | CDN + Valkey multi-layer with inverted consistency model | Caching |
| Rate Limiting and Throttling | Per-user, per-IP, per-endpoint rate limits | Rate Limiting |
| Circuit Breaker and Resilience | Payment gateway circuit breaker, admission controller | Circuit Breaker |
| CDN and Edge Computing | CloudFront for sale page, waiting room, DDoS absorption | CDN |
| API Gateway | Kong for routing, auth, per-route rate limiting | API Gateway |
| Database Sharding | Valkey 16-slot inventory sharding, PG partition by sale_id | Sharding |
| Auto-Scaling Patterns | Kubernetes HPA for queue-driven and CPU-driven scaling | Auto-Scaling |
| WebSocket and Real-Time | Queue position updates via WebSocket push | WebSocket |
| Deployment Strategies | Canary 5% to full rollout with rollback triggers | Deployment |
| Alerting and On-Call | Sale-specific alert rules and escalation | Alerting |
| Metrics and Monitoring | 6 sale dashboards, 40+ specific alerts | Metrics |
External References
- Temporal -- Saga orchestration for checkout workflows
- Socket.io -- WebSocket library for real-time queue updates
- FingerprintJS -- Device fingerprinting for bot detection
- hCaptcha -- CAPTCHA for queue entry bot prevention
- k6 -- Load testing for simulating 10M concurrent users
A flash sale system is not an inventory system. It is a system that controls contention. Every component exists for one reason: to decide who gets access, and who does not, under extreme pressure. The queue decides who enters. Valkey decides who succeeds. Temporal decides who gets compensated. The rest is implementation detail.