System Design: Ad Exchange (Real-Time Bidding, Sub-100ms Auctions, DSP/SSP, Impression Serving)
Goal: An ad exchange that runs real-time bidding auctions for one million ad requests per second, picks the top 5 of 50+ registered DSPs per request, finishes a first-price auction inside 100ms, enforces publisher floor prices and supply-chain rules (ads.txt, sellers.json, schain), serves winning ad creatives through a CDN, tracks impressions and clicks on its own (rather than trusting DSP-reported numbers), and reconciles billions of dollars of ad spend each month between DSPs and publishers. At 300K impressions/sec, that works out to roughly 25 billion impressions a day, carrying about $4B/month of spend across three regional clusters in us-east, eu-west, and ap-south.
Reading guide: §1 walks through one ad serving end-to-end, with all the actors involved. §2–§4 cover the problem and requirements. §5 introduces the ecosystem and the components inside the exchange. §6 and §7 cover architecture and sizing. §8 and §9 cover the data model and APIs. §10 is the deep-dive section: auction flow, DSP selection, bid optimization, ad serving, spend tracking, fraud. §11–§15 cover bottlenecks, failures, deployment, observability, and security.
TL;DR: A user in Texas abandons a $130 pair of Nike running shoes in their cart on Saturday night. Sunday morning they open ESPN on their phone, and inside about a tenth of a second a Nike ad for those exact shoes shows up in the page. Pulling that off involves a CDP, an identity graph, a campaign manager, a DSP that ran an ML bidder, a header-bidding wrapper running in the browser, four SSPs, an exchange, a publisher ad server, a CDN, a verification vendor, and an attribution stack. The exchange is the marketplace that runs the auction. It owns no campaigns, no budgets, no creatives; those all live in the DSPs. Its job is taking supply from SSPs, picking the right DSPs to ask, running a fair first-price auction, returning markup, tracking impressions on its own books, and reconciling settlement at the end of the day. At a million requests per second the things that actually matter are: don't fan out to all 50 DSPs (pick the top 5), don't put a network hop in front of every enrichment lookup (cache in-process), don't log every losing bid to Kafka (sample), and never trust a DSP's own impression count for billing.
1. How One Ad Actually Gets Served
A worked example is easier than another diagram. Imagine someone in Austin who looked at running shoes on nike.com on Saturday night. They added a $130 Nike Air Zoom Pegasus 41 to their cart, got distracted, closed the tab, and didn't think about it again. On Sunday morning they're scrolling ESPN on their phone, tap into a Cowboys-Giants game recap, and a banner ad in the middle of the article shows the exact shoes from last night. They tap, the page goes back to nike.com, and they buy.
Fourteen different companies are involved in the roughly one tenth of a second between the page starting to load and that banner becoming visible. The rest of this post designs one of them, the ad exchange. The walkthrough below exists so it's clear what the other thirteen are doing.
The companies involved
On the advertiser side, before the auction even happens: Nike is the advertiser, and Nike's media agency (WPP) actually runs the day-to-day buying. The agency uses Google Campaign Manager 360 to store the creatives, the budget, the flight dates, and the Floodlight tracking pixels for conversion attribution. Nike's website fires events into Segment, a CDP, which captures things like "added to cart" and builds an audience from them. Segment forwards those audiences to The Trade Desk, which is the DSP that will actually bid in the auction. LiveRamp sits alongside as the identity graph: it ties the user's hashed email from a previous nike.com login to a stable RampID that links their laptop cookie to their iPhone IDFA, which is how the cart-abandon signal from the laptop ends up matched to the iPhone session on ESPN.
In the runtime path on the publisher side: ESPN owns the page and the 300×250 ad slot. ESPN's page has Prebid.js (a header-bidding wrapper) configured against four SSPs (Magnite, PubMatic, Index Exchange, OpenX) and a Google Publisher Tag pointing at Google Ad Manager, which is ESPN's ad server. When the page loads, Prebid.js runs in the browser and asks all four SSPs in parallel. The SSPs each forward into one or more exchanges, including Google AdX, which is the system this post is designing. AdX picks the top 5 of its registered DSPs and runs the auction.
After the impression renders: CloudFront serves the banner image from an Austin edge POP. Integral Ad Science fires a viewability beacon from inside the creative to confirm the ad was actually visible for at least a second. When the click happens, Floodlight (part of CM360) and Google Analytics 4 fire on the Nike order-confirmation page to attribute the purchase back to the Trade Desk click on ESPN.
That's the cast: Nike, WPP, CM360, Segment, LiveRamp, The Trade Desk, Magnite (plus PubMatic, Index, OpenX), Prebid.js, Google Ad Manager, Google AdX, ESPN, CloudFront, IAS, GA4 + Floodlight. Some of these are the same company in different roles (Google in particular shows up four times), but the products are distinct.
How the request actually flows
The night before
- Segment's JavaScript on nike.com sees the cart-add and the eventual session timeout, and writes the user into the `cart_abandoners_shoes_7d` audience.
- Every fifteen minutes Segment reverse-ETLs new audience members into The Trade Desk. By the time the user goes to bed, TTD already knows about them.
- LiveRamp attaches a RampID to the user's hashed email so the laptop cookie and the iPhone IDFA resolve to the same identity the next morning.
Sunday morning, the page loads
- The browser requests `espn.com/nfl/cowboys-giants-recap`. ESPN returns HTML and JavaScript.
- Prebid.js initializes and identifies the 300×250 slot in the article body:

```html
<article>Cowboys quarterback Dak Prescott threw for 301 yards...</article>
<div id="div-gpt-ad-midarticle"></div>
<article>In the second half, the Giants...</article>
<script>
  pbjs.addAdUnits([{
    code: 'div-gpt-ad-midarticle',
    mediaTypes: { banner: { sizes: [[300, 250]] } },
    bids: [
      { bidder: 'magnite', params: { accountId: 'espn-001' } },
      { bidder: 'pubmatic', params: { publisherId: 'espn-002' } },
      { bidder: 'ix', params: { siteId: 'espn-003' } },
      { bidder: 'openx', params: { unit: 'espn-004' } }
    ]
  }]);
</script>
```

- Prebid fans out to all four SSPs in parallel.
- Each SSP enforces ESPN's $3 sports-vertical floor, attaches an `schain` object, and forwards into one or more exchanges. Magnite forwards into AdX.
Inside AdX, the auction
- Enrichment: an in-process LRU lookup returns consent state, a cookie-sync map (`{ttd: ttd_abc, dv360: dv_xyz, amzn: amzn_123}`), and the user's RampID. Cache hit, no network.
- Pre-bid fraud filter: IP reputation, ASN check, user-agent sanity. All clean.
- Smart DSP selection: 50 DSPs are registered but only five get asked. AdX scores each one on geo, format, vertical, win rate, and capacity, and picks TTD, DV360, Amazon DSP, Criteo, and Xandr.
- Fan-out: bid requests go out in parallel over HTTP/2 with a 60ms timeout.
TTD's bid (the winner)
- Audience match: user is in `cart_abandoners_shoes_7d`.
- Campaign match: Pegasus 41 Retargeting is eligible.
- Frequency cap: 0/3 today, allowed.
- ML model: pCTR 8% (very high — fresh cart abandoner looking at the same product), pCVR 12%, expected value ~$1.25 per impression, maximum CPM well over $1,000.
- Shading model: ESPN sports impressions usually clear around $8, so TTD bids $8.50.
- Response carries the price plus ready-to-render HTML for the Pegasus banner with a Floodlight pixel embedded.
The other four DSPs
- DV360: $5.00 (generic shoe retargeting).
- Criteo: $6.50 (also tracks nike.com).
- Amazon DSP: $4.00 (Nike sells on Amazon).
- Xandr: HTTP 204, no bid.
Resolution and render
- AdX picks TTD at $8.50, substitutes the `${AUCTION_PRICE}` macro with an encrypted price token, validates the creative against ESPN's blocked-categories list, and returns the markup to Magnite.
- Prebid compares all four SSPs (PubMatic $7.00, Index $5.00, OpenX $3.50) and picks Magnite as the overall winner.
- GAM checks direct deals: Ford has a homepage sponsorship that doesn't apply to NFL articles, Progressive's guaranteed deal is desktop-only. Prebid's $8.50 beats the line-item stack.
- The browser fetches the banner from CloudFront's Austin POP (~12ms cache hit). IAS's `IntersectionObserver` beacon starts watching the slot.
- Ad becomes visible roughly 110ms after the page started rendering.
After the impression
- Exchange impression pixel, Floodlight pixel, and IAS viewability beacon all fire. IAS confirms 60% of pixels in view for over a second, counts it as viewable per MRC.
- The user taps the banner two seconds later. The click hits the exchange's `/t/click` endpoint, gets logged to Kafka, and 302-redirects to `nike.com/pegasus-41?utm_source=ttd&utm_medium=retargeting`.
- The cart is still there. The user buys.
- The order-confirmation page fires the Floodlight conversion pixel. CM360 attributes the $130 purchase back to the TTD click on ESPN.
Where the $8.50 actually goes
Of the $8.50 the advertiser pays for that impression, only about $5.30 makes it to ESPN. The rest is split among the intermediaries: roughly $1.00 to TTD as the DSP fee, $0.15 to LiveRamp for the identity match, $0.80 to AdX as the exchange take rate, $1.15 to Magnite as the SSP fee, and around $0.10 to IAS for the viewability measurement. CloudFront's bandwidth cost is rounded into ESPN's hosting bill and is essentially free per impression.
| Actor | Cut |
|---|---|
| Nike (advertiser, gross) | $8.50 |
| The Trade Desk (DSP) | $1.00 |
| LiveRamp (identity match) | $0.15 |
| Google AdX (exchange) | $0.80 |
| Magnite (SSP) | $1.15 |
| IAS (verification) | $0.10 |
| ESPN (publisher net) | $5.30 |
Roughly 37% of the gross goes to ad-tech middlemen. That's the number publishers point at when they argue for Supply Path Optimization (collapsing redundant SSP and exchange hops to recover more of the dollar). Not counted here: WPP's agency commission (separate, around 10–15% of media spend), the Segment SaaS subscription, the CM360 license, and any flat fees baked into the relationships. Those are negotiated outside the auction.
A few things people get wrong
Most confusion in this ecosystem comes from product names that sound similar. Google Ads is the small-business-facing DSP interface, while DV360 is the enterprise DSP, AdX is the exchange, GAM is the publisher ad server, and CM360 is the campaign manager. Five different products under the same brand. Inside that pile, AdX is the one running auctions and the one this post is about.
People sometimes think the SSP runs the auction. It doesn't. The SSP packages publisher inventory and forwards to the exchange. To make matters worse, GAM is both an SSP and an ad server, and some SSPs (Magnite is the obvious one) run their own internal auctions before forwarding to a downstream exchange.
The exchange is also not where campaigns live. Campaigns, budgets, creatives, frequency caps, pacing, ML bid optimization: all of that lives inside DSPs. The exchange only sees bids and no-bids.
Two more. Prebid.js is not an SSP, it's a header-bidding wrapper that runs in the browser and calls multiple SSPs in parallel. And a CDP (Segment, mParticle) is not the same thing as a DMP or identity graph (LiveRamp). The CDP captures first-party events with known identifiers; the identity graph resolves identity across devices and historically held third-party segments.
With that out of the way, the rest of the post designs the exchange itself.
2. Problem Statement
Online advertising is what pays for most of the open web. Every time a page or app loads, an auction has to happen in the background and resolve before the content finishes rendering. The wall-clock budget is around 100ms from when the ad request leaves the publisher to when winning markup comes back. Miss it and the slot stays empty: the publisher loses revenue, the advertiser loses reach.
The exchange sits in the middle of that auction. It receives supply from SSPs, picks DSPs to ask, runs a first-price auction, returns the winner, tracks the impression independently, and settles money at the end of the day. None of that is novel to describe. What makes it hard is the combination of latency, fan-out, and the fact that every auction is settling real money.
The latency constraint is the dominant one. A naive design would fan every bid request out to every registered DSP. With 50 DSPs and 1KB requests, that's 50KB of outbound traffic per auction multiplied by a million auctions per second, or 50 GB/sec leaving the exchange, and you're waiting on the slowest DSP every single time. The realistic answer is to score each DSP per request on geo, format, vertical, win rate, and capacity, and only fan out to the five most likely to bid competitively. Section 10.2 covers the scoring in detail; the point here is that the right baseline is "top 5 of 50," not "all 50."
Scale matters even after that fan-out trim. A serious exchange runs around 1M ad requests per second sustained, peaking near 2M during US evening prime time. With the top-5 selection that's 5M outbound bid requests per second. Around 100 auction-server pods spread across three regions, every component has to scale horizontally, and every hot-path lookup has to be sub-millisecond.
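The top-5 selection described above can be sketched as a hard-filter pass followed by a weighted blend of soft signals. The weights, field names, and types below are illustrative stand-ins, not the production scorer (§10.2 covers that in detail):

```go
package main

import (
	"fmt"
	"sort"
)

// DSPStats is the per-DSP state the scorer reads. Field names and
// weights are illustrative, not the exchange's real schema.
type DSPStats struct {
	ID         string
	GeoMatch   bool    // DSP buys in this request's country
	FormatOK   bool    // DSP accepts this creative format
	WinRate    float64 // rolling win rate (fed by the Flink job), 0..1
	BidRate    float64 // fraction of requests the DSP bids on, 0..1
	CapacityOK bool    // per-DSP QPS quota not exhausted
}

// score returns 0 for hard-ineligible DSPs, otherwise a weighted
// estimate of how likely the DSP is to bid competitively.
func score(d DSPStats) float64 {
	if !d.GeoMatch || !d.FormatOK || !d.CapacityOK {
		return 0 // hard filters: never fan out to these
	}
	return 0.6*d.WinRate + 0.4*d.BidRate // illustrative weights
}

// topN sorts eligible DSPs by score and keeps the best n.
func topN(stats []DSPStats, n int) []string {
	sort.Slice(stats, func(i, j int) bool { return score(stats[i]) > score(stats[j]) })
	out := []string{}
	for _, d := range stats {
		if len(out) == n || score(d) == 0 {
			break
		}
		out = append(out, d.ID)
	}
	return out
}

func main() {
	dsps := []DSPStats{
		{ID: "ttd", GeoMatch: true, FormatOK: true, CapacityOK: true, WinRate: 0.30, BidRate: 0.80},
		{ID: "dv360", GeoMatch: true, FormatOK: true, CapacityOK: true, WinRate: 0.25, BidRate: 0.70},
		{ID: "overloaded", GeoMatch: true, FormatOK: true, CapacityOK: false, WinRate: 0.90, BidRate: 0.90},
	}
	fmt.Println(topN(dsps, 5)) // the capacity-exhausted DSP is skipped
}
```

The hard filters matter as much as the blend: a DSP that can't serve the geo or format would never bid, so asking it is pure wasted bandwidth.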
Every auction is also money. An auction bug that lets a $0.10 bid beat a $2.00 bid leaks $1.90 CPM, about $0.002 per impression; at 300K impressions per second that's $570/sec, or $50M a day. Auction integrity is the core product guarantee. Winning bids and impressions are logged at 100%, losing bids are sampled, and DSP-reported impressions are reconciled against the exchange's own tracking nightly so any drift gets caught.
Underneath all of that sit the long-running concerns: bot traffic, domain spoofing, headless browsers, click farms, datacenter-hosted "users." Ads.txt and sellers.json exist specifically to stop domain spoofing. GDPR and CCPA limit what can cross a wire without consent. Every request has to be fraud-filtered and PII-scrubbed in under 2ms combined. None of these are theoretical. A single GDPR violation can run 4% of global revenue.
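The pre-bid fraud filter stays inside its share of that 2ms budget because it is nothing but in-memory set lookups. A minimal sketch, with illustrative list contents and reason codes:

```go
package main

import "fmt"

// fraudLists would be refreshed in the background from the fraud store;
// these sets and their contents are illustrative stand-ins.
type fraudLists struct {
	badIPs        map[string]bool
	datacenterASN map[int]bool
}

// preBidFilter is the hot-path check: pure in-memory lookups, no
// network. A rejected request never reaches DSP selection.
func preBidFilter(f fraudLists, ip string, asn int, ua string) (ok bool, reason string) {
	switch {
	case f.badIPs[ip]:
		return false, "ip_reputation"
	case f.datacenterASN[asn]:
		return false, "datacenter_asn" // real users rarely browse from a datacenter
	case ua == "":
		return false, "missing_ua"
	}
	return true, ""
}

func main() {
	f := fraudLists{
		badIPs:        map[string]bool{"203.0.113.7": true},
		datacenterASN: map[int]bool{16509: true}, // an AWS ASN, as an example
	}
	ok, why := preBidFilter(f, "203.0.113.7", 7018, "Mozilla/5.0")
	fmt.Println(ok, why) // false ip_reputation
}
```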
Quick numbers to anchor the rest of the post:
| Metric | Target |
|---|---|
| Ad requests per second (sustained) | 1,000,000 |
| Ad requests per second (peak) | 2,000,000 |
| Impressions per second (30% fill) | ~300,000 |
| Impressions per day | ~25 billion |
| DSPs registered | 50+ |
| DSPs per auction (top-N) | 5 |
| Auction latency p99 | < 100 ms |
| DSP response timeout | 60 ms |
| Monthly spend through exchange | ~$4 billion |
| Exchange take rate | 10–15% |
A few things to avoid: don't fan out to every DSP, don't put Valkey in the hot path for every enrichment lookup (front it with an in-process cache), don't log every losing bid to Kafka (sample them), don't call DSPs sequentially, and don't trust DSP-reported impression counts for anything that touches money.
3. Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-01 | Accept bid requests from SSPs via OpenRTB 2.6 and run first-price sealed-bid auctions in < 100 ms p99 | P0 |
| FR-02 | Smart-select top 5 eligible DSPs per auction and fan out in parallel with 60 ms timeout | P0 |
| FR-03 | Enforce publisher floor prices and block-list (categories/advertisers) per publisher config | P0 |
| FR-04 | Return winning ad markup (HTML for banner, VAST XML for video) to the SSP within budget | P0 |
| FR-05 | Serve ad creatives through CDN edge nodes with cache headers for efficient delivery | P0 |
| FR-06 | Track impressions via server-side pixel (1×1 GIF) with deduplication | P0 |
| FR-07 | Track clicks via redirect URL with destination validation | P0 |
| FR-08 | Enforce exchange-level creative dedup (max N impressions of same creative per user per hour) as an ad-quality measure. Per-campaign frequency capping is a DSP responsibility. | P1 |
| FR-09 | Track per-DSP spend in near-real-time via Flink streaming for credit-limit enforcement and settlement | P0 |
| FR-10 | Validate supply chain: ads.txt, sellers.json, and schain object on every request | P0 |
| FR-11 | Check consent (TCF / US Privacy string) and strip PII from bid requests when required | P0 |
| FR-12 | Publish tracking events (impressions, clicks, viewability, auction results) to Kafka for billing and analytics | P0 |
| FR-13 | Reconcile exchange-tracked impressions with DSP-reported impressions daily; flag discrepancies > 0.01% | P1 |
| FR-14 | Provide publisher and DSP management APIs (floor prices, DSP onboarding, settlement reports) | P1 |
| FR-15 | Support banner, video (VAST 4.2), and native ad formats | P0 |
| FR-16 | Pre-bid fraud filtering: IP reputation, user-agent signature, datacenter detection, ASN reputation | P0 |
4. Non-Functional Requirements
| Dimension | Target |
|---|---|
| Auction latency (p50) | < 50 ms |
| Auction latency (p99) | < 100 ms |
| Fill rate | > 30% (varies by publisher and market) |
| Availability | 99.95% (4.4 hours/year planned + unplanned downtime) |
| Tracking pipeline loss | < 0.01% event loss end-to-end |
| Billing accuracy (reconciled) | ±0.01% of DSP-reported impressions |
| CDN cache hit rate | > 95% |
| DSP connection pool warm starts | All DSPs kept warm via periodic health pings |
| Multi-region failover | < 60 seconds (DNS-based geo failover) |
| Deployment rollback | < 5 minutes for any component |
5. High-Level Approach & Technology Selection
5.1 The full ecosystem
The walkthrough in §1 named most of the actors. The map below is the same cast in table form, useful as a reference when later sections refer to a specific role.
| Layer | Role | Examples |
|---|---|---|
| Advertiser | Pays for ads. Sets goals, budgets, targeting. | Nike, P&G, a local dentist |
| Agency | Runs media buying on behalf of advertisers. Contracts with DSPs. | WPP/GroupM, Publicis, Omnicom |
| Campaign Manager | Stores creatives, flight dates, budget rules, attribution tags. Publishes campaigns to DSPs. | Google Campaign Manager 360 (CM360), Adobe Advertising |
| CDP | Captures first-party events from advertiser sites. Builds audiences. Syncs to DSPs. | Segment, mParticle, Treasure Data |
| Identity graph / DMP | Resolves user identity across devices and cookies. Provides stable cross-device IDs. | LiveRamp (RampID), Neustar Fabrick, ID5 |
| DSP | Receives bid requests from exchanges. Runs bid optimization ML. Decides to bid and at what price. Owns campaign budgets and frequency caps. | The Trade Desk, DV360, Amazon DSP, Criteo, Xandr |
| Ad Exchange | This system. Runs the auction. Receives supply from SSPs, selects and fans out to DSPs, picks a winner. | Google AdX, OpenX, Index Exchange, PubMatic, Magnite |
| SSP | Packages publisher inventory. Enforces floor prices, brand safety rules. Forwards bid requests to exchanges and direct DSPs. | Magnite, PubMatic, Index Exchange, OpenX, Xandr Monetize |
| Header bidder | Client-side JavaScript that calls multiple SSPs in parallel from the browser, then picks the highest bid. | Prebid.js, Amazon TAM |
| Publisher ad server | Owns the ad slot. Decides between direct deals, guaranteed deals, and programmatic (Prebid) bids. | Google Ad Manager (GAM), Kevel, FreeWheel |
| Publisher | Owns the website or app. Gets paid per impression. | ESPN, CNN, NYT, mobile game developers |
| Verification | Measures viewability, brand safety, invalid traffic. Runs JavaScript beacons. | Integral Ad Science (IAS), DoubleVerify, MOAT |
| Attribution & analytics | Tracks conversions. Attributes them to impressions and clicks. | Google Analytics 4, Floodlight (CM360), Adjust (mobile), AppsFlyer (mobile) |
| CDN | Serves creative assets from edge POPs. | CloudFront, Fastly, Akamai, Cloudflare |
At runtime the request path runs publisher page → header bidder → SSP → exchange → DSP, with the winning bid and markup flowing back the same way.
The exchange makes its money by charging a take rate (usually 10–15%) on each cleared auction. If a DSP bids $8.50 CPM and wins, the exchange keeps about $0.80 and the rest goes to the SSP and publisher. Billing happens monthly against exchange-tracked impressions, reconciled with DSP-reported numbers; anything more than 0.01% off gets investigated.
Boundary the rest of this post uses: campaigns, budgets, creatives, bid optimization, frequency capping, and conversion attribution all live inside DSPs. The exchange is a stateless marketplace that sees only bid requests, bids, wins, impressions, and clicks.
5.2 First-price auctions
The industry shifted from second-price to first-price auctions around 2017–2019. In a first-price auction the winner pays exactly what they bid. That's simpler for the exchange and more transparent for the DSP, with one tradeoff: DSPs now have to bid below their true valuation (bid shading) to avoid systematically overpaying, since they no longer get the safety of paying only the second-highest price. Bid shading lives entirely inside the DSP, so the exchange doesn't need to know about it.
The trigger for the move was a trust problem. In second-price exchanges that handled both supply and demand, some operators were caught using their knowledge of all bids to give favored buyers a "last look" at winning prices. First-price ends that ambiguity because there's nothing to manipulate; the winner pays what the winner said.
| | Second-price (legacy) | First-price (current) |
|---|---|---|
| Winner pays | Second-highest + $0.01 | Their own bid |
| DSP strategy | Bid truthfully | Bid shade (0.5–0.85 × true value) |
| Exchange complexity | Higher (track top-2 bids) | Lower (track max bid) |
| Transparency | Low (exchanges could manipulate) | High |
| 2024+ adoption | Declining | Dominant |
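The "lower complexity" row is literal: on the exchange side, first-price resolution is just a max over bids at or above the floor. A minimal sketch with illustrative types:

```go
package main

import "fmt"

// Bid is a simplified bid; the real object is OpenRTB's Bid inside a seatbid.
type Bid struct {
	DSP   string
	Price float64 // CPM in USD
}

// resolveFirstPrice returns the highest bid at or above the publisher
// floor. In a first-price auction the winner pays exactly this price;
// a second-price exchange would additionally have to track the runner-up.
func resolveFirstPrice(bids []Bid, floor float64) (Bid, bool) {
	var best Bid
	found := false
	for _, b := range bids {
		if b.Price >= floor && (!found || b.Price > best.Price) {
			best, found = b, true
		}
	}
	return best, found // found == false means a no-fill back to the SSP
}

func main() {
	bids := []Bid{{"ttd", 8.50}, {"criteo", 6.50}, {"dv360", 5.00}, {"amzn", 4.00}}
	win, ok := resolveFirstPrice(bids, 3.00) // ESPN's $3 sports floor
	fmt.Println(ok, win.DSP, win.Price)      // true ttd 8.5
}
```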
5.3 OpenRTB 2.6
Every DSP and SSP speaks OpenRTB, currently version 2.6, and every exchange has to as well. The protocol defines the wire format for bid requests, bid responses, win notices, and the supporting objects (Site, App, User, Device, Imp, Bid). Sections 9.1 and 9.2 show real payloads. The one detail worth flagging here: the adm field in a bid response carries actual render-ready markup (HTML for banner ads, VAST XML for video), not just a URL. The DSP is responsible for providing markup the browser can execute; the exchange substitutes a few macros (auction price, click URL, impression URL) before forwarding.
| Object | Purpose | Key Fields |
|---|---|---|
| BidRequest | Top-level request from exchange to DSP | id, imp[], site/app, user, device, regs, tmax |
| Imp | One ad slot | id, banner/video/native, bidfloor, pmp |
| Site/App | Publisher context | domain, page, cat[], publisher |
| User | User targeting | id, buyeruid, geo, data[], consent |
| Device | Device info | ua, ip, geo, devicetype, os |
| BidResponse | DSP's response | id, seatbid[], cur |
| Bid | Individual bid | id, impid, price, adm (creative markup), crid, adomain[] |
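`${AUCTION_PRICE}` is a real OpenRTB substitution macro; the exchange's substitution step can be sketched as a string replace over the winning `adm`. The price token and the other two macro names below are illustrative, as is the choice to pass tracking URLs as macros rather than wrapping the markup:

```go
package main

import (
	"fmt"
	"strings"
)

// substituteMacros expands macros in winning markup before it goes back
// to the SSP. ${AUCTION_PRICE} is OpenRTB's standard macro; the exchange
// substitutes an encrypted token so intermediaries can't read the
// clearing price. ${IMP_URL} and ${CLICK_URL} are illustrative names.
func substituteMacros(adm, priceToken, impURL, clickURL string) string {
	r := strings.NewReplacer(
		"${AUCTION_PRICE}", priceToken,
		"${IMP_URL}", impURL,
		"${CLICK_URL}", clickURL,
	)
	return r.Replace(adm)
}

func main() {
	adm := `<img src="https://cdn.example/pegasus.jpg">` +
		`<img src="${IMP_URL}?p=${AUCTION_PRICE}" width="1" height="1">`
	fmt.Println(substituteMacros(adm, "enc_tok_8p50", "https://x.example/t/imp", "https://x.example/t/click"))
}
```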
5.4 The components inside the exchange
Eight services, most of them stateless. The auction server is the hot path: it accepts OpenRTB requests from SSPs, enriches them, runs DSP selection, fans out, runs the auction, and returns the winning markup. Written in Go, with an in-process LRU cache fronting everything that would otherwise need a network lookup. The DSP selection logic isn't a separate service, it's a library inside the auction server.
The ad server handles macro substitution on the winning markup and assembles VAST XML for video. The tracking endpoint is a separate pod group that takes impression pixels, click redirects, and viewability beacons; it deduplicates against Valkey, produces to Kafka asynchronously, and returns the pixel as fast as possible. The creative dedup service is a small Valkey-backed counter that prevents the same creative ID from showing to the same user more than N times an hour. That's an ad-quality measure, not per-campaign frequency capping; per-campaign caps live in DSPs.
For billing, a Flink job consumes the impressions topic keyed by DSP ID and writes per-DSP running spend totals to Valkey, where the auction servers can read them for credit-limit enforcement. The same Flink job also computes rolling win rates that feed DSP selection. A daily batch job aggregates ClickHouse data per DSP and per publisher, reconciles against DSP-reported numbers, and writes settlement records. The management API is the boring part: CRUD for publisher configs, DSP onboarding, settlement queries.
5.5 Storage
| Store | Technology | Rationale |
|---|---|---|
| Enrichment hot-path cache | In-process LRU (ristretto) | 5-second TTL. Serves > 95% of enrichment reads with zero network hops. |
| Enrichment cold-path | Valkey Cluster | Sub-ms reads on cache miss. Sharded by user ID. Background-synced from PostgreSQL + event streams. |
| Auction event log (sampled) | Kafka | Durable event stream. 100% of impressions and winning bids; 1% sample of losing bids. |
| Real-time analytics & billing | ClickHouse | Columnar analytics on billions of rows. Sub-second aggregation for dashboards. |
| Exchange configuration | PostgreSQL | DSP configs, publisher settings, SSP registrations, settlement records. Read replicas for auction servers. |
| Bid-level archive | S3 / Iceberg (Parquet) | Long-term storage of winning bids and sampled losses. For billing disputes and ML training. |
| Creative assets | S3 + CloudFront | S3 origin fronted by CDN; > 95% cache hit rate keeps origin traffic small. |
5.6 Why Go
Go is a defensible choice for this rather than a load-bearing one. Goroutines make the parallel fan-out pattern trivial: each DSP call is a goroutine, the timeout is a context, and the auction starts as soon as the last bid comes in or the deadline fires. GC pauses with modern Go (1.22+) are well under a millisecond, which matters for tail latency. The standard library has a good HTTP/2 client with built-in connection pooling and multiplexing, so there's no third-party dependency for the most performance-sensitive piece.
Rust would give marginally better tail latency and zero GC, but at a real cost in development speed and the ability to staff the team. Java with Netty and virtual threads is the other reasonable answer; it's slightly harder to keep G1GC pauses under the auction budget but plenty of large exchanges run on the JVM. The team's existing Go skills are usually the deciding factor.
5.7 Why ClickHouse
At 25 billion impressions a day, the dashboard queries are aggregations over 100+ billion rows ("spend by DSP by publisher by hour for the last 7 days," that kind of thing). ClickHouse handles that in single-digit seconds. Druid is the other serious option but adds operational complexity. BigQuery works but the per-query costs add up fast at this scale, and Postgres simply can't move enough rows. ClickHouse also has a native Kafka table engine, so ingestion is essentially zero-code: a MergeTree table materializes from a Kafka topic automatically.
6. High-Level Architecture
6.1 Multi-region bird's eye
The exchange runs in three regions: us-east-1, eu-west-1, and ap-south-1. Geo-DNS or Anycast routes each SSP request to the nearest region. Inside a region everything is stateless or regionally-sharded. Cross-region state (DSP configs, publisher settings, billing records) lives in PostgreSQL with logical replication out from a single primary in us-east-1.
6.2 Decisions worth flagging
Auction servers are stateless. They cache config (DSP endpoints, publisher settings) refreshed every 30 seconds, plus an in-process LRU of enrichment data refreshed asynchronously. No durable local state. Any pod can serve any request, so horizontal scaling is just adding pods.
Load balancing is L4. TLS termination at the load balancer is too expensive at a million QPS, so the L4 balancer hands TCP connections to auction servers via consistent hashing and TLS terminates inside the auction server, parallelized across cores.
The in-process LRU is the cache layer that does most of the work. Hit rate is well above 95% during peak traffic because the same users show up in many simultaneous auctions, and a 5-second TTL is short enough that staleness isn't a real concern. Cache misses fall through to Valkey. A background worker keeps hot keys warm. Without this layer, Valkey would need to handle 5M ops/sec just for enrichment and the wire latency would dominate the auction budget.
Fan-out happens inside the auction server itself, not in a separate service. Every pod maintains persistent HTTP/2 connection pools to every registered DSP, which saves a network hop versus a separate fan-out tier and gives tighter control over per-DSP timeouts and circuit breakers.
Kafka is the universal event bus. Winning bids, impressions, clicks, viewability beacons, DSP config updates: they all flow through Kafka topics. The auction server produces asynchronously and never waits for ack on the hot path.
Sampled bid logging keeps Kafka manageable. Winning bids and impressions go in at 100%. Losing bids go in at 1%. The full bid stream tees out separately to S3/Iceberg through Kafka Connect, which is cheap durable storage for billing disputes and ML training without burning hot Kafka capacity.
CDN-first creative delivery: creatives live on S3 and get pushed to CloudFront. The auction server never touches creative bytes; it returns markup pointing at a CDN URL.
6.3 Auction flow, happy path
6.4 Auction flow, timeout
When the 60ms DSP timeout fires, the auction server runs the auction with whatever bids have arrived. Zero bids above floor means a no-fill response back to the SSP. Slow DSPs feed into a per-DSP circuit breaker, covered in §12.2.
7. Back-of-the-Envelope Sizing
Every number here is rounded so you can redo the math in your head if you want.
7.1 Request volume
Sustained: 1,000,000 QPS
Peak: 2,000,000 QPS (US evening prime time)
Design for: 1,500,000 QPS with headroom
Per day: 1M × 86,400 ≈ 86 billion bid requests/day
Fill rate: 30%
Impressions: 86B × 0.30 ≈ 26 billion/day
≈ 300,000 impressions/sec
7.2 DSP fan-out
Naive (fan out to all 50 DSPs): 1M × 50 = 50M bid requests/sec
Top-5 smart selection: 1M × 5 = 5M bid requests/sec
Bid request: ~1 KB (OpenRTB JSON, gzipped on the wire)
Bid response: ~0.5 KB
Outbound: 5M × 1 KB = 5 GB/sec
Inbound: 5M × 0.5 KB = 2.5 GB/sec
Total: ~7.5 GB/sec across all auction servers
The top-5 selection is what makes the bandwidth (and the per-DSP cost) tractable. Without it the exchange is wasting an order of magnitude on requests no DSP would have bid on anyway. §10.2 has the scoring algorithm.
7.3 Auction server sizing
Per-auction latency budget:
LRU cache hit (95%): 0.1 ms
Valkey cold (5%): 1.0 ms (amortized 0.05 ms)
Pre-bid filter: 1.0 ms
DSP selection: 1.0 ms
DSP fan-out (parallel, 60 ms timeout): 40 ms avg
Auction logic: 0.5 ms
Macro sub + response: 1.0 ms
Total p50: ~45 ms
Total p99: ~80 ms
Per pod (c6g.4xlarge, 16 vCPU, 32 GB RAM):
Concurrent in-flight auctions: ~2,000
QPS per pod: ~25,000
Pods needed:
Sustained 1M / 25K = 40 pods
Peak 2M / 25K = 80 pods
Deploy 100 pods across 3 regions (40 US + 30 EU + 30 APAC) with HPA to 2x
The latency budget breaks down something like this: an LRU hit takes a fraction of a millisecond, the rare Valkey cold path adds maybe another, pre-bid filtering and DSP selection together cost 2ms, the parallel DSP fan-out is 40ms on average and 60ms in the worst case, and the auction logic plus serialization is the rest. p50 lands around 45ms; p99 under 80ms.
7.4 Cache and Valkey
Enrichment lookups per auction: 3 logical keys
- user consent + cookie-sync (1 hash)
- IP/UA fraud flags (1 set membership)
- DSP credit-limit flags (1 hash)
At 1M QPS:
LRU hits (95%): 2.85M logical lookups/sec in-process, zero network
Valkey cold (5%): 150K ops/sec, trivial for a Valkey cluster
Valkey working set:
Active users (30-day): ~200 million
Per-user entry: ~150 bytes (consent + cookie sync)
Total users: 200M × 150 = 30 GB
Fraud lists (IPs + UAs): ~1 GB
DSP credit state: negligible
Creative dedup counters: ~10 GB
Total: ~42 GB
Valkey cluster: 3 primaries (16 GB each) + 3 replicas = 6 nodes per region.
7.5 Kafka (with sampling)
Event topics:
impressions: 300K/sec × 500 bytes = 150 MB/sec
clicks: 3K/sec × 300 bytes = 1 MB/sec
viewability: 300K/sec × 200 bytes = 60 MB/sec
winning_bids: 300K/sec × 800 bytes = 240 MB/sec
losing_bids (1%): 50K/sec × 800 bytes = 40 MB/sec
Total: ~490 MB/sec
× replication 3 = 1.5 GB/sec write throughput
Per day (global):
490 MB/sec × 86,400 ≈ 42 TB/day ingested
× zstd 4x compression ≈ 10 TB/day on disk
Hot retention 3 days ≈ 30 TB globally (~10 TB per region)
Kafka cluster (per region): 10 brokers × 4 TB NVMe = 40 TB
Each region carries roughly a third of the traffic, so replicated writes run ~500 MB/sec per region (~50 MB/sec per broker), comfortably under 30% of NVMe throughput.
The unsampled bid stream gets teed directly to S3/Iceberg through Kafka Connect, so durable long-term storage doesn't sit on hot Kafka brokers.
7.6 ClickHouse
Ingest rate:
impressions: 300K rows/sec
clicks: 3K rows/sec
winning_bids: 300K rows/sec (separate table)
Total: ~600K rows/sec
Row sizes (after compression):
impression row: ~60 bytes compressed
Per day: 26B × 60 = 1.5 TB/day compressed
90-day retention: ~135 TB total; only the most recent 30 days (~45 TB) stay on local NVMe
Cluster (per region):
4 shards × 3 replicas = 12 nodes
Each: r6g.4xlarge, 4 TB NVMe
Total: 48 TB raw per region (~16 TB usable after 3× replication, enough for the ~15 TB per-region hot set)
TTL moves > 30-day data to S3 tiered storage.
7.7 CDN
300K impressions/sec × 200 KB avg creative = 60 GB/sec egress
CDN cache hit rate > 95% → origin pulls < 3 GB/sec
Daily egress: 60 GB/sec × 86,400 ≈ 5 PB/day
Unique creatives: ~500K (top 1% serve 60% of requests, heavy head and long tail)
Creative total storage on origin: 500K × 200 KB = 100 GB
CDN POP cache: ~10 GB hot working set per POP
7.8 Summary
| Resource | Number |
|---|---|
| Auction server pods (global) | 100 |
| Valkey nodes (global, 3 × 6) | 18 |
| Kafka brokers (global, 3 × 10) | 30 |
| ClickHouse nodes (global, 3 × 12) | 36 |
| Outbound DSP bandwidth | ~7.5 GB/sec |
| CDN egress | ~60 GB/sec |
| Monthly AWS + CDN bill (rough) | $8–12M |
| Monthly revenue at 10% take rate | ~$400M |
A reasonable check on the economics: at $4B/month gross spend through the exchange and a 10% take rate, that's about $400M/month in revenue against $8–12M/month in infrastructure. Around 3% of revenue going to compute and bandwidth is what makes the business work. Smart DSP selection, in-process caching, and Kafka sampling are the three things that keep the cost line that low.
8. Data Model
8.1 Auction state machine
A single auction moves through a short linear lifecycle: RECEIVED → ENRICHED (consent, fraud, credit lookups) → DSPS_SELECTED → BIDS_COLLECTED → DECIDED (filled or no-fill) → SERVED → TRACKED. Everything before DECIDED lives entirely in memory on the auction server; only auctions that reach TRACKED produce billable impression rows.
8.2 Core tables (PostgreSQL)
The exchange stores publisher settings, DSP configs, SSP registrations, and billing records. Campaigns, budgets, creatives, and frequency caps don't appear here; those live in DSPs.
CREATE TABLE dsp_configurations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
dsp_name VARCHAR(100) NOT NULL UNIQUE,
bid_endpoint TEXT NOT NULL,
win_notice_endpoint TEXT,
max_qps INT NOT NULL DEFAULT 100000,
timeout_ms INT NOT NULL DEFAULT 60,
allowed_categories TEXT[],
allowed_geos TEXT[],
allowed_formats TEXT[],
seat_id VARCHAR(50),
circuit_breaker JSONB NOT NULL DEFAULT '{"err_threshold": 0.5, "timeout_threshold": 0.3, "window_sec": 60, "cooldown_sec": 30}',
historical_win_rate DECIMAL(5,4) DEFAULT 0,
enabled BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE publishers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
domain VARCHAR(255) NOT NULL UNIQUE,
ssp_id UUID REFERENCES ssp_configurations(id),
floor_price_cents INT NOT NULL DEFAULT 50,
blocked_categories TEXT[],
blocked_advertisers TEXT[],
ads_txt_verified BOOLEAN NOT NULL DEFAULT false,
revenue_share_pct DECIMAL(5,2) NOT NULL DEFAULT 85.00,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE billing_settlements (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
settlement_date DATE NOT NULL,
dsp_id UUID NOT NULL REFERENCES dsp_configurations(id),
publisher_id UUID REFERENCES publishers(id),
impressions BIGINT NOT NULL,
clicks BIGINT NOT NULL,
gross_spend_cents BIGINT NOT NULL,
exchange_fee_cents BIGINT NOT NULL,
publisher_payout_cents BIGINT NOT NULL,
dsp_reported_impressions BIGINT,
discrepancy_pct DECIMAL(5,4),
status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_settlements_dsp ON billing_settlements(dsp_id, settlement_date);
CREATE INDEX idx_settlements_pub ON billing_settlements(publisher_id, settlement_date);
ssp_configurations and creative_audit_log follow the same pattern.
8.3 Event schemas (Kafka → ClickHouse)
The impressions table is the billing source of truth.
CREATE TABLE impressions (
impression_id String,
auction_id String,
timestamp DateTime64(3),
dsp_id String,
publisher_id String,
publisher_domain String,
creative_id String,
advertiser_domain String,
price_cpm Float64,
user_id Nullable(String),
device_type Enum8('desktop'=1, 'mobile'=2, 'tablet'=3, 'ctv'=4),
geo_country LowCardinality(String),
geo_region LowCardinality(String),
viewable Nullable(UInt8),
viewability_pct Nullable(Float32),
time_in_view_ms Nullable(UInt32),
is_click UInt8 DEFAULT 0,
click_timestamp Nullable(DateTime64(3))
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (dsp_id, publisher_id, timestamp)
TTL timestamp + INTERVAL 90 DAY;
The auction_results table follows the same shape with winning_price_cpm, num_bids_received, num_dsps_selected, and num_dsps_timeout. Writes are 100% for winners and 1% for losers.
8.4 Valkey keyspace
| Key | Structure | TTL | Purpose |
|---|---|---|---|
| user:{uid}:consent | Hash | 30d | TCF string + consent status |
| user:{uid}:cookiesync | Hash | 30d | Exchange UID ↔ DSP buyer UIDs |
| user:{uid}:creative:{crid} | Counter | 1h | Exchange-level creative dedup |
| dsp:{dsp_id}:spend_today | Hash | until midnight | Spend, credit limit, credit remaining |
| dsp:{dsp_id}:circuit | Hash | 5m | Circuit breaker state |
| dsp:{dsp_id}:winrate:{vertical} | Float | 1h | Rolling win rate feeding smart selection |
| ivt:ip_blocklist | Set | 1h | Known bot IPs |
| ivt:asn_reputation | Hash | 1h | ASN reputation scores (datacenter, residential) |
| ivt:ua_patterns | Set | 1h | Suspicious UA regex hits |
| pub:{pub_id}:config | Hash | 5m | Publisher config cache |
| pub:{domain}:adstxt | Hash | 24h | Cached ads.txt entries |
Per-campaign frequency caps, campaign budgets, and advertiser targeting rules don't appear here. Those live in DSPs.
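As an illustration of how the creative-dedup counter is used — INCR with a TTL set on first touch, then compare against a cap. The in-memory FakeValkey and the cap of 3 are illustrative stand-ins, not the production client or policy:

```python
import time

class FakeValkey:
    """Minimal in-memory stand-in for INCR + EXPIRE, for illustration only."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def incr(self, key, ttl_sec=None):
        now = time.monotonic()
        val, exp = self.store.get(key, (0, None))
        if exp is not None and now >= exp:
            val, exp = 0, None        # key expired: start over
        val += 1
        if exp is None and ttl_sec is not None:
            exp = now + ttl_sec       # TTL is set on the first increment only
        self.store[key] = (val, exp)
        return val

def seen_too_often(vk, uid: str, crid: str, cap: int = 3) -> bool:
    """Exchange-level creative dedup: cap renders of one creative per user per hour."""
    count = vk.incr(f"user:{uid}:creative:{crid}", ttl_sec=3600)
    return count > cap

vk = FakeValkey()
hits = [seen_too_often(vk, "u1", "cr_nike") for _ in range(5)]
print(hits)  # [False, False, False, True, True]
```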
9. API Design
9.1 Bid request (exchange to DSP)
POST /openrtb/2.6/bid
Content-Type: application/json
X-OpenRTB-Version: 2.6
{
"id": "auc_01HXYZ123",
"imp": [{
"id": "imp_001",
"banner": {"w": 300, "h": 250, "pos": 1},
"bidfloor": 0.50,
"bidfloorcur": "USD"
}],
"site": {
"domain": "espn.com",
"page": "https://espn.com/nfl/story/cowboys-giants-recap",
"cat": ["IAB17"],
"publisher": {"id": "pub_espn", "domain": "espn.com"}
},
"user": {
"id": "uid_user_abc",
"buyeruid": "ttd_user_xyz",
"geo": {"country": "USA", "region": "TX", "city": "Austin"},
"consent": "CPXxRfAPXxRfAAfKAB..."
},
"device": {
"ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)...",
"ip": "198.51.100.42",
"devicetype": 4,
"os": "iOS"
},
"regs": {"coppa": 0, "gdpr": 0},
"tmax": 60,
"at": 1,
"cur": ["USD"],
"source": {
"ext": {
"schain": {
"ver": "1.0",
"complete": 1,
"nodes": [{"asi": "adx.google.com", "sid": "pub_espn", "hp": 1}]
}
}
}
}
DSP bid response (200 OK):
{
"id": "auc_01HXYZ123",
"seatbid": [{
"bid": [{
"id": "bid_ttd_001",
"impid": "imp_001",
"price": 8.50,
"adm": "<div id='ad-${AUCTION_ID}'><a href='${CLICK_URL}https://nike.com/pegasus-41'><img src='https://cdn.nike.com/cr/pegasus_300x250.jpg' width='300' height='250'/></a><img src='${IMPRESSION_URL}' style='display:none'/></div>",
"crid": "cr_nike_pegasus_01",
"w": 300,
"h": 250,
"adomain": ["nike.com"],
"cat": ["IAB18"]
}],
"seat": "seat_nike"
}],
"cur": "USD"
}
DSP no-bid: 204 No Content.
9.2 Win notice (exchange to DSP)
POST /win
{
"auction_id": "auc_01HXYZ123",
"bid_id": "bid_ttd_001",
"imp_id": "imp_001",
"price": 8.50,
"currency": "USD",
"timestamp": "2026-05-29T14:32:00.045Z"
}
9.3 Impression tracking
GET /t/imp?auc=auc_01HXYZ123&imp=imp_001&price=enc_xyz&dsp=ttd&pub=pub_espn&crid=cr_nike_pegasus_01
→ 200 OK, Content-Type: image/gif, 43-byte transparent GIF
The endpoint parses query params, decrypts the price token, writes to Valkey for dedup, produces to Kafka asynchronously, and returns the pixel. Target p99 under 5ms.
9.4 Click redirect
GET /t/click?auc=auc_01HXYZ123&imp=imp_001&dest=https%3A%2F%2Fnike.com%2Fpegasus-41
→ 302 Found
Location: https://nike.com/pegasus-41
Destination URLs are validated against a whitelist pattern (allowed schemes, no open-redirect loops) before the 302 is emitted.
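A minimal sketch of that validation (the scheme allowlist and the nested-URL heuristic are illustrative assumptions, not the production rule set):

```python
from urllib.parse import urlparse, parse_qs

ALLOWED_SCHEMES = {"http", "https"}

def safe_destination(dest: str) -> bool:
    """Reject non-http(s) schemes and URLs that smuggle a second redirect target."""
    parsed = urlparse(dest)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        return False
    # Crude nested-redirect check: any query param that is itself a full URL.
    for values in parse_qs(parsed.query).values():
        if any(v.startswith(("http://", "https://")) for v in values):
            return False
    return True

print(safe_destination("https://nike.com/pegasus-41"))                   # True
print(safe_destination("javascript:alert(1)"))                           # False
print(safe_destination("https://evil.com/?next=https://phish.example"))  # False
```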
9.5 Publisher config API
PUT /v1/publishers/{publisher_id}/config
{
"floor_price_cents": 75,
"blocked_categories": ["IAB25", "IAB26"],
"blocked_advertiser_domains": ["competitor.com"],
"revenue_share_pct": 85.00
}
9.6 DSP onboarding
POST /v1/dsps
{
"dsp_name": "Example DSP",
"bid_endpoint": "https://dsp.example.com/bid",
"win_notice_endpoint": "https://dsp.example.com/win",
"max_qps": 100000,
"timeout_ms": 60,
"allowed_categories": ["IAB1", "IAB17"],
"allowed_geos": ["US", "CA"],
"seat_id": "seat_example_001"
}
→ 201 Created { "id": "...", "status": "SANDBOX", "mtls_cert_url": "..." }
9.7 Settlement report
GET /v1/settlements?dsp_id=ttd&start=2026-06-01&end=2026-06-07
{
"dsp_id": "ttd",
"period": {"start": "2026-06-01", "end": "2026-06-07"},
"totals": {
"impressions": 1750000000,
"clicks": 17500000,
"gross_spend_cents": 14000000000,
"exchange_fee_cents": 1400000000,
"publisher_payout_cents": 12600000000,
"dsp_reported_impressions": 1749640000,
"discrepancy_pct": 0.0002
}
}
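The response's totals are straightforward arithmetic over the tracked counts; a sketch with this report's numbers (the 10% take rate is from §7.8, the discrepancy is computed as a fraction of exchange-tracked impressions):

```python
def settle(gross_spend_cents: int, exchange_take: float,
           tracked_imps: int, dsp_imps: int):
    """Split gross spend into exchange fee and publisher payout, and compute
    the impression discrepancy against the DSP's own count."""
    fee = round(gross_spend_cents * exchange_take)
    payout = gross_spend_cents - fee
    discrepancy = abs(tracked_imps - dsp_imps) / tracked_imps
    return fee, payout, discrepancy

fee, payout, disc = settle(14_000_000_000, 0.10, 1_750_000_000, 1_749_640_000)
print(fee, payout)     # 1400000000 12600000000
print(round(disc, 4))  # 0.0002
```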
10. Deep Dives
10.1 RTB auction flow, end to end
The example from §1 maps onto the timing breakdown below. The exchange's part is steps 4–8, which is the 43ms of actual work it does between receiving the request from Magnite and returning the winning markup.
| Step | Component | Time | Running |
|---|---|---|---|
| 1 | ESPN page HTML loads, Prebid.js executes | 20 ms | 20 ms |
| 2 | Prebid → Magnite SSP bid request | 3 ms | 23 ms |
| 3 | Magnite → AdX network hop | 3 ms | 26 ms |
| 4 | LRU cache hit: consent, cookie-sync, DSP flags | 0.1 ms | 26.1 ms |
| 5 | Pre-bid fraud filter + ads.txt validate | 1 ms | 27.1 ms |
| 6 | Smart DSP selection (top 5 of 50) | 1 ms | 28.1 ms |
| 7 | DSP fan-out parallel (60 ms timeout, arrives ~40 ms) | 40 ms | 68.1 ms |
| 8 | First-price auction + floor + macro sub | 1 ms | 69.1 ms |
| 9 | AdX → Magnite network hop | 3 ms | 72.1 ms |
| 10 | Magnite → Prebid (selected as winner across SSPs) | 3 ms | 75.1 ms |
| 11 | GAM direct-deal check + render decision | 5 ms | 80.1 ms |
| 12 | CloudFront creative fetch (Austin POP cache hit) | 12 ms | 92.1 ms |
| 13 | Browser renders 300×250 | 15 ms | 107.1 ms |
| 14 | Impression pixel fires (async) | 2 ms | 109.1 ms |
The entire page-load-to-ad-visible budget is around 110ms in this example. The exchange's portion is 43ms; the rest is network transit, SSP coordination, GAM decisioning, and browser rendering.
The Go fan-out code:
func (e *Exchange) runAuction(ctx context.Context, req *openrtb.BidRequest) (*AuctionResult, error) {
ctx, cancel := context.WithTimeout(ctx, 60*time.Millisecond)
defer cancel()
// Top-5 smart selection (see §10.2)
dsps := e.dspSelector.SelectTopN(req, 5)
bidChan := make(chan *DSPBid, len(dsps))
for _, dsp := range dsps {
go func(d *DSPConfig) {
bid, err := e.sendBidRequest(ctx, d, req)
if err != nil {
e.metrics.DSPError(d.ID, err)
bidChan <- nil
return
}
bidChan <- bid
}(dsp)
}
var bids []*DSPBid
received := 0
Loop:
for received < len(dsps) {
select {
case bid := <-bidChan:
received++
if bid != nil && bid.Price > 0 {
bids = append(bids, bid)
}
case <-ctx.Done():
e.metrics.AuctionTimeout(len(bids), len(dsps)-received)
break Loop
}
}
if len(bids) == 0 {
return &AuctionResult{Filled: false}, nil
}
return e.firstPriceAuction(bids, req.Imp[0].BidFloor), nil
}
func (e *Exchange) firstPriceAuction(bids []*DSPBid, floor float64) *AuctionResult {
var winner *DSPBid
for _, b := range bids {
if b.Price < floor {
continue
}
if winner == nil || b.Price > winner.Price {
winner = b
}
}
if winner == nil {
return &AuctionResult{Filled: false}
}
return &AuctionResult{Filled: true, Winner: winner, Price: winner.Price}
}
10.2 Smart DSP selection
The default fan-out instinct is to ask every DSP every time. At 50 DSPs and a million requests per second that means 50 million outbound bid requests per second, and the auction is bottlenecked on the slowest of fifty different bidders. Picking the five most likely to bid competitively cuts the outbound traffic by 10x, drops the per-DSP costs by the same margin, and barely moves the fill rate.
The scoring function combines hard filters (geo, format, category, capacity, circuit-breaker state) with two continuous signals: historical win rate for this segment, and a pacing factor based on how aggressively the DSP is currently spending. The hard filters drop any DSP that obviously can't or shouldn't bid. The continuous score ranks the survivors.
type DSPScore struct {
DSPID string
Score float64
}
func (s *DSPSelector) SelectTopN(req *BidRequest, n int) []*DSPConfig {
geo := req.User.Geo.Country
format := req.Imp[0].Format()
vertical := req.Site.Cat[0]
var candidates []DSPScore
for _, dsp := range s.registry.All() {
// Hard filters
if !dsp.AcceptsGeo(geo) { continue }
if !dsp.AcceptsFormat(format) { continue }
if !dsp.AcceptsCategory(vertical) { continue }
if !dsp.CapacityAvailable() { continue }
if dsp.CircuitBreakerOpen() { continue }
// Continuous score
winRate := dsp.HistoricalWinRate(geo, vertical, format)
pacing := dsp.PacingFactor()
candidates = append(candidates, DSPScore{
DSPID: dsp.ID,
Score: winRate * pacing,
})
}
sort.Slice(candidates, func(i, j int) bool {
return candidates[i].Score > candidates[j].Score
})
if len(candidates) > n {
candidates = candidates[:n]
}
// 10% exploration: occasionally include a non-top DSP to discover new demand
if len(candidates) == n && rand.Float64() < 0.10 && len(s.registry.All()) > n {
explorer := s.registry.RandomExploration(candidates)
if explorer != nil {
candidates[n-1] = DSPScore{DSPID: explorer.ID, Score: 0}
}
}
return s.registry.Resolve(candidates)
}
A small exploration bonus matters more than it looks. Without it, any DSP that starts with a zero win rate stays at zero forever. The fix is to replace the lowest-ranked top-5 slot with a randomly chosen non-top DSP about 10% of the time. New DSPs get a fair shot, win rates update, and the feedback loop promotes them into the regular top-N when they earn it.
The win-rate input itself comes from a Flink job (or a simpler Kafka consumer if Flink feels heavy) that aggregates the last 60 minutes of auction outcomes keyed by (dsp_id, geo, vertical, format) and writes the result to Valkey. The auction servers cache it in-process with a one-minute TTL.
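That hourly aggregation can be sketched as a plain counter job (a toy stand-in for the Flink job; the key shape and the Valkey destination follow the text):

```python
from collections import defaultdict

class WinRateAggregator:
    """Toy version of the rolling win-rate job: count bids and wins per
    (dsp_id, geo, vertical, format) and emit win rates for Valkey."""
    def __init__(self):
        self.bids = defaultdict(int)
        self.wins = defaultdict(int)

    def record(self, dsp_id, geo, vertical, fmt, won: bool):
        key = (dsp_id, geo, vertical, fmt)
        self.bids[key] += 1
        if won:
            self.wins[key] += 1

    def snapshot(self):
        # In production each entry is written to dsp:{dsp_id}:winrate:{vertical}.
        return {k: self.wins[k] / self.bids[k] for k in self.bids}

agg = WinRateAggregator()
for won in (True, False, True, True):
    agg.record("ttd", "US", "IAB17", "300x250", won)
print(agg.snapshot())  # {('ttd', 'US', 'IAB17', '300x250'): 0.75}
```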
The bandwidth saving from this single change:
| Strategy | DSP requests/sec | Outbound bandwidth | DSPs touched |
|---|---|---|---|
| All 50 (naive) | 50M | 50 GB/sec | 50 per auction |
| Top 5 (smart) | 5M | 5 GB/sec | 5 per auction |
| Fill-rate delta | — | — | < 2% drop |
A 10× cut in DSP-side cost and bandwidth, against a fill-rate hit small enough to be in the noise of normal day-to-day variation.
10.3 Auction types
First-price auctions are what the industry runs now. The winner pays their bid, the math is trivial, and there's nothing for the exchange to manipulate.
The earlier model (second-price, where the winner pays one cent more than the second-highest bid) encouraged truthful bidding in theory but invited "last look" abuse in practice, where exchanges with knowledge of all bids could give favored buyers a chance to bid one cent above the clearing price. First-price killed that ambiguity by making the clearing price equal to the bid.
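To make the difference concrete, here's a toy clearing function for both rules (the one-cent increment on the second-price path is the classic textbook formulation, not this exchange's code):

```python
def clear(bids, floor, rule):
    """Return the clearing price, or None if no bid meets the floor.
    'first' pays the winner's own bid; 'second' pays one cent above the
    runner-up (or the floor if there is no runner-up)."""
    live = sorted((b for b in bids if b >= floor), reverse=True)
    if not live:
        return None
    if rule == "first":
        return live[0]
    runner_up = live[1] if len(live) > 1 else floor
    return round(max(runner_up, floor) + 0.01, 2)

bids = [8.50, 6.20, 3.10]
print(clear(bids, 0.50, "first"))   # 8.5  -- winner pays their bid
print(clear(bids, 0.50, "second"))  # 6.21 -- one cent above the runner-up
```

The gap between 8.50 and 6.21 is exactly the ambiguity "last look" exploited: an exchange that knew both numbers could let a favored buyer win at 6.22.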
Header bidding versus waterfall is a separate question. The waterfall model called exchanges sequentially (try exchange A first, then B, then C) which was slow and left money on the table because a higher bid in a later exchange would never see daylight. Header bidding (Prebid.js in the browser, or its server-side equivalent) calls all exchanges in parallel and picks the highest bid across all of them. Server-side header bidding is what mature publishers run today; it removes the browser-side latency cost while keeping the parallel competition.
10.4 DSP bid optimization
This is opaque to the exchange but worth understanding because DSP behavior is what determines fill rate and per-auction latency. A typical DSP bidder runs through something like:
from typing import Optional

class BidOptimizer:
def compute_bid(self, req: BidRequest, campaign: Campaign) -> Optional[float]:
features = self.extract_features(req, campaign)
# ML: LightGBM or deep learning trained on historical data
pctr = self.ctr_model.predict(features)
pcvr = self.cvr_model.predict(features)
if campaign.bid_strategy == "CPA":
expected_value = pctr * pcvr * campaign.target_cpa
elif campaign.bid_strategy == "CPC":
expected_value = pctr * campaign.max_cpc
else: # CPM
expected_value = campaign.max_cpm / 1000
# Bid shading for first-price auctions
shading = self.shading_model.predict(features) # 0.5-0.85
bid = expected_value * shading
# Internal checks invisible to the exchange
if not self.budget_allows(campaign, bid): return None
if not self.frequency_allows(req.user_id, campaign.id): return None
if bid < req.imp[0].bidfloor: return None
        return bid
For the example from §1, TTD's pCTR was about 8% (very high; the user was a fresh cart abandoner looking at the same product they'd left in their cart the night before), pCVR was around 12%, and the expected value worked out to $1.25 per impression, which is a maximum CPM well over $1,000. The shading model knew that ESPN sports impressions usually clear around $8 and shaded down to $8.50, comfortably above the predicted clearing price and far below the maximum.
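The arithmetic behind that worked example can be checked directly (pCTR, pCVR, and the $130 target CPA are the numbers from the text):

```python
# Numbers from the worked example: fresh cart abandoner, $130 shoes.
pctr, pcvr, target_cpa = 0.08, 0.12, 130.0

ev_per_impression = pctr * pcvr * target_cpa   # expected value of one impression
max_cpm = ev_per_impression * 1000             # ceiling bid, expressed as CPM
print(round(ev_per_impression, 2), round(max_cpm))  # 1.25 1248
```

That matches the text's "$1.25 per impression" and a ceiling well over $1,000; the shading model then ignores the ceiling and bids just above the predicted ~$8 clearing price.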
10.5 Ad server and creative delivery
Creatives are uploaded by advertisers into their DSP, not into the exchange. The DSP pushes them to a creative CDN of its own choosing. The exchange runs an asynchronous malware and policy scan on first-seen creative_id values from DSP responses and caches the verdict; first-time creatives are allowed through optimistically and flagged for backfill scanning, with a fast block path for anything that fails.
At runtime the DSP returns either HTML (for banner) or VAST XML (for video) in the adm field of the bid response. The exchange substitutes a small set of macros before forwarding the markup to the SSP:
| Macro | Replaced With |
|---|---|
| ${AUCTION_ID} | Auction identifier |
| ${AUCTION_PRICE} | Encrypted price token (AES-256-GCM) |
| ${CLICK_URL} | Exchange click-tracking URL |
| ${IMPRESSION_URL} | Exchange impression-pixel URL |
| ${CACHE_BUSTER} | Random number to defeat caching on tracking pixels |
Substitution is a single pass over the markup string, so the cost is negligible compared to the auction itself. The encrypted price token is the part that matters for billing. It's an AES-256-GCM ciphertext containing the clearing price and a timestamp. Encrypting it stops the publisher, the SSP, or any browser extension on the user's machine from reading or forging the price, which is what would otherwise let them reverse-engineer bid patterns or tamper with billing.
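The single pass can be done with one compiled regex over the markup (macro names from the table above; the price-token encryption is elided, and the URLs are hypothetical):

```python
import re

MACRO_RE = re.compile(
    r"\$\{(AUCTION_ID|AUCTION_PRICE|CLICK_URL|IMPRESSION_URL|CACHE_BUSTER)\}"
)

def substitute_macros(adm: str, values: dict) -> str:
    """Replace every known macro in one pass; other text is left untouched."""
    return MACRO_RE.sub(lambda m: values[m.group(1)], adm)

adm = "<img src='${IMPRESSION_URL}&cb=${CACHE_BUSTER}'/>"
out = substitute_macros(adm, {
    "IMPRESSION_URL": "https://exch.example/t/imp?auc=auc_01",
    "CACHE_BUSTER": "8675309",
})
print(out)  # <img src='https://exch.example/t/imp?auc=auc_01&cb=8675309'/>
```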
Viewability follows the MRC/IAB standard, measured with an IntersectionObserver beacon: at least 50% of pixels in the viewport for at least 1 second for display, 2 seconds for video. The beacon writes to Kafka via the tracking endpoint. IAS and DoubleVerify publish their own independent beacons too; the exchange's is a backup, not the primary measurement that bills go against.
For video the exchange returns VAST 4.2 XML instead of HTML. The DSP supplies the media-file URLs, the exchange wraps them with impression, click, and quartile tracking events pointing at its own tracking endpoint, and the video player on the publisher's page fires those events as the video plays.
10.6 Per-DSP spend tracking and settlement
The exchange is the financial middleman: money flows advertiser → DSP → exchange → publisher, with the exchange's take rate deducted at settlement time rather than per auction. What the exchange tracks in real time is per-DSP spend for the day, for two reasons. The first is credit limits: many DSPs run on prepaid balances, and a DSP exceeding its balance has to be cut off from auctions within seconds, not hours. The second is anomaly detection: a sudden 10× spike in a DSP's spend velocity is usually a compromised account or a runaway campaign.
class DSPSpendAggregator:
def process_impression(self, dsp_id: str, imp: ImpressionEvent):
state = self.get_state(dsp_id)
state.spend_today_cents += int(imp.price_cpm * 100 / 1000)
state.impressions_today += 1
self.valkey.hset(
f"dsp:{dsp_id}:spend_today",
mapping={
"spend_cents": state.spend_today_cents,
"impressions": state.impressions_today,
"credit_limit": state.credit_limit_cents,
"credit_remaining": state.credit_limit_cents - state.spend_today_cents,
"last_update": datetime.utcnow().isoformat(),
}
)
if state.spend_today_cents >= state.credit_limit_cents:
self.valkey.set(f"dsp:{dsp_id}:credit_blocked", "1", ex=3600)
self.alert(f"DSP {dsp_id} exceeded credit limit")
if state.spend_today_cents > state.expected_daily * 1.5:
            self.alert(f"DSP {dsp_id} spend anomaly")
The auction server's check is a single lookup against the in-process cache for the dsp:{dsp_id}:credit_blocked key. The lag from impression to enforcement is a Flink tumbling window (1 second), a Valkey write (1 millisecond), and an in-process cache TTL (5 seconds), so the worst case is roughly 6 seconds of unchecked spend after a DSP crosses its limit. At a $100K daily limit that's about $1.16 of overshoot per second of lag, roughly $7 over the full window, which is acceptable in exchange for not blocking the auction path on a synchronous credit check.
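The exposure bound is worth making explicit (this assumes a DSP at its limit was spending roughly uniformly over the day):

```python
daily_limit_usd = 100_000
lag_seconds = 6  # Flink window + Valkey write + in-process cache TTL

spend_per_second = daily_limit_usd / 86_400
overshoot = spend_per_second * lag_seconds
print(round(spend_per_second, 2), round(overshoot, 2))  # 1.16 6.94
```

So the worst-case unchecked spend is about $7 total, not $7 per second: tiny relative to the $100K limit.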
Daily settlement runs at 02:00 UTC. It queries ClickHouse for spend, impressions, and clicks grouped by (dsp_id, publisher_id) for the prior day, applies the take rate, and writes a row per pair to billing_settlements. Any row whose discrepancy with the DSP-reported number is over 0.01% goes into a manual review queue. Payment batches go to publishers and invoices to DSPs once the rows clear review.
10.7 Impression and click tracking
The tracking pipeline is the source of truth for billing. Lose events and the exchange under-bills (revenue gone) or over-bills (trust gone). The whole pipeline is built with that in mind: dedupe in Valkey, produce to Kafka asynchronously, return the pixel quickly, fail open rather than fail closed.
Deduplication exists because pixels fire more than once in the wild. Browsers retry on flaky connections. Ad slots refresh and re-fire the pixel. Buggy ad tags double-fire. Without dedupe, an impression gets billed twice and the advertiser quite reasonably gets upset. The dedupe key is impression:{auction_id}:{imp_id} with a one-hour TTL, written via SETNX. On Valkey errors the code intentionally fails open and treats the impression as new, because over-counting by a fraction of a percent is much less harmful than under-counting and losing real revenue.
func (t *TrackingService) handleImpression(w http.ResponseWriter, r *http.Request) {
auctionID := r.URL.Query().Get("auc")
impID := r.URL.Query().Get("imp")
encPrice := r.URL.Query().Get("price")
price, err := t.decryptPrice(encPrice)
if err != nil {
t.metrics.InvalidPrice.Inc()
t.servePixel(w); return
}
dedupKey := fmt.Sprintf("impression:%s:%s", auctionID, impID)
isNew, err := t.valkey.SetNX(r.Context(), dedupKey, "1", time.Hour).Result()
if err != nil {
// Fail open: better to slightly over-count than lose revenue
isNew = true
}
if isNew {
t.kafka.ProduceAsync("impressions", auctionID, &ImpressionEvent{
AuctionID: auctionID,
ImpID: impID,
Price: price,
DSPID: r.URL.Query().Get("dsp"),
PublisherID: r.URL.Query().Get("pub"),
CreativeID: r.URL.Query().Get("crid"),
Timestamp: time.Now(),
UserAgent: r.UserAgent(),
IP: extractIP(r),
})
}
t.servePixel(w)
}
Click handling is structurally identical, with a 302 Location header instead of a transparent GIF and a destination-URL whitelist check to stop open-redirect abuse.
10.8 Fraud detection beyond IP blocklists
Basic fraud detection (blocking known bot IPs and obvious user-agent patterns) catches maybe 30% of invalid traffic on a good day. The rest needs a deeper stack. Some signals run pre-bid in the auction path because they have to be sub-millisecond; others run post-bid in a Flink job that updates reputation scores, which then feed back into the next round of pre-bid filters.
IP reputation is the cheapest signal: a Valkey set refreshed hourly from threat-intel feeds. ASN and datacenter detection use a MaxMind lookup; AWS, GCP, Azure, and Digital Ocean ASNs get flagged as datacenter traffic, which catches server-rented bot pools. User-agent entropy checks for signatures that are statistically too common for real browsers, which catches botnets running the same UA across thousands of requests. Headless browser fingerprints look for missing WebGL, missing canvas, and the canary flags Chrome sets in headless mode (when the SSP forwards those client-side signals in the bid request's device object).
After the impression renders, the JavaScript beacon adds another layer: cursor entropy, scroll velocity, time-in-view duration, and the time between impression and click. A click that fires 200ms after the impression is almost certainly automated. Time-in-view measurements catch ad stacking, where multiple ads are layered on top of each other so only the top one is actually visible. A per-publisher rolling viewability rate catches inventory that's quietly degrading.
Supply-chain validation (ads.txt, sellers.json, the schain object) closes the domain-spoofing loop and is covered in §15.4. Daily DSP reconciliation catches anything that slipped through both pre-bid and post-bid by comparing exchange-tracked impressions with DSP-reported impressions; any large discrepancy goes into the same manual review queue as billing disputes.
Pre-bid catches roughly 70% of known fraud in a typical month, post-bid another 20%. The remaining 10% is what drives the daily reconciliation work and the per-publisher viewability monitoring.
One more category that's easy to forget: malicious creative markup. A DSP can embed JavaScript in its adm field that does things the exchange didn't sign up for. The fix is the asynchronous creative scan from §10.5: first-seen creatives are allowed through optimistically but scanned in the background, anything that fails is blocked, and the DSP's circuit breaker counter increments.
11. Bottlenecks
There are eight things in this design that could become bottlenecks under load. A few of them only matter in theory; a couple actually bite in practice.
The one that gets the most theoretical attention is DSP fan-out. At a million QPS with 50 registered DSPs, naive fan-out would be 50 million outbound HTTP/2 requests per second, which would saturate both bandwidth and the CPU spent on serialization. This is the bottleneck that smart DSP selection (§10.2) exists to remove: top-5 cuts it to 5 million per second, which fits comfortably in the budget.
Valkey hot keys are the next obvious worry, with a viral page or a celebrity-page event causing millions of lookups for the same user ID. In practice the in-process LRU absorbs this almost completely. The 5-second TTL is short enough that staleness isn't an issue, and the same user appears in many simultaneous auctions during traffic spikes, which is exactly when LRU hit rate goes up. Cache hit rate stays above 95% even during the worst spikes seen in production.
Kafka ingestion is the bottleneck that does require care. At a million QPS, naively logging every bid (winning and losing) would be roughly 5 GB/sec, tripled by replication to 15 GB/sec written. The fix is sampling: 100% of impressions and winning bids, 1% of losing bids. The full bid stream gets teed directly to S3/Iceberg through Kafka Connect, which is much cheaper durable storage than hot Kafka and handles billing-dispute lookups without burning hot capacity.
ClickHouse query and ingestion compete for resources during peak dashboard usage. The fix is to run two clusters: one ingestion-only (Kafka consumer) and one query-only (dashboards). Materialized views handle the most common queries so analyst sessions don't reach into the raw tables more than they need to.
The tracking endpoint sees spikes of around 300K impression pixels per second and has to return inside 5ms or it starts blocking page rendering. It runs as a separate pod group with a deliberately tiny code path: parse query string, decrypt price, dedupe in Valkey, produce to Kafka asynchronously, return the GIF. No database writes on the hot path, no synchronous downstream calls.
CDN origin pulls hurt during creative rotation. When a new campaign launches with brand-new creatives, the CDN cache is cold and origin gets hit hard. The fix is a combination of DSPs pre-warming creatives to all POPs before campaign start, and the auction server quietly deprioritizing bids that point at uncached creative URLs for the first 60 seconds after a creative first appears.
DSP spend tracking lag (the 6-second worst case from Flink window plus Valkey write plus LRU TTL) is a minor source of credit-limit overshoot. Worth monitoring (the alert fires if the window grows past 10 seconds), but the dollar exposure is tiny relative to daily spend.
The slowest DSP in the top-5 is the bottleneck that bounds per-auction latency once everything else is healthy. Adaptive timeouts help: DSPs that consistently respond fast get a generous 55ms timeout, ones that consistently respond slow get a strict 35ms timeout, which has the side effect of pushing them out of the top-5 entirely once their win rate decays. The circuit breaker (§12.2) handles the harder failure modes.
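One plausible implementation of the adaptive timeout keeps a per-DSP EWMA of observed response times and buckets DSPs into the generous or strict budget (the EWMA weight and the 30 ms cut line are assumptions; the 55 ms/35 ms budgets are from the text):

```python
class AdaptiveTimeout:
    """Per-DSP timeout derived from an exponentially weighted moving
    average of observed response times."""
    def __init__(self, alpha=0.1, fast_ms=55, slow_ms=35, threshold_ms=30):
        self.alpha = alpha
        self.ewma = {}  # dsp_id -> average response time in ms
        self.fast_ms, self.slow_ms = fast_ms, slow_ms
        self.threshold_ms = threshold_ms

    def observe(self, dsp_id, response_ms):
        prev = self.ewma.get(dsp_id, response_ms)
        self.ewma[dsp_id] = prev + self.alpha * (response_ms - prev)

    def timeout_for(self, dsp_id):
        # Consistently fast DSPs get the generous budget; slow ones get cut short.
        avg = self.ewma.get(dsp_id, self.threshold_ms)
        return self.fast_ms if avg <= self.threshold_ms else self.slow_ms

at = AdaptiveTimeout()
for ms in (12, 15, 11):
    at.observe("fast_dsp", ms)
for ms in (48, 52, 58):
    at.observe("slow_dsp", ms)
print(at.timeout_for("fast_dsp"), at.timeout_for("slow_dsp"))  # 55 35
```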
12. Failure Scenarios
12.1 Valkey cluster failure
Valkey going down is unpleasant but not fatal. The in-process LRU keeps serving for its 5-second TTL, which buys a bit of breathing room. Cache misses fall through to nothing: the auction server runs in degraded mode where it strips PII (since consent state is unknown), skips fraud lookups (since the blocklists are unreachable), and skips DSP credit checks. Fill rate drops because DSPs see less data and tend to bid lower, but the exchange keeps earning revenue and the failure isn't a customer-visible outage.
The detection path uses health checks with a 3-second window. When the circuit breaker opens, every auction server flips into degraded mode within seconds. The on-call gets paged and the strict-PII flag is enabled to make sure no consent-needed data accidentally goes out.
12.2 DSP unresponsive
A DSP going slow or returning errors is much more common than total Valkey failure. The fix is a per-DSP circuit breaker with three states: closed (normal), open (all requests rejected immediately, DSP excluded from selection), and half-open (1% probe requests to detect recovery). Transition rules: closed flips to open when the error rate exceeds 50% or the timeout rate exceeds 30% in a 60-second sliding window. Open flips to half-open after 30 seconds of cooldown. Half-open flips back to closed if 10 consecutive probes succeed, or back to open if any probe fails.
Circuit breaker state lives in Valkey under dsp:{dsp_id}:circuit and gets refreshed into the in-process cache once a second. When a DSP is in the open state, smart selection skips it and the next-best DSP slides in to take its place in the top-5, so the auction barely notices. The one alert that matters here is the revenue impact alert: if excluding a top-5-by-revenue DSP drops total revenue by more than 5%, the on-call gets paged for a manual look.
12.3 Kafka degradation
Slow or partially unavailable Kafka means the auction server can't produce events at the normal rate. Each pod buffers up to 100K events in an in-memory ring (about 150 MB) and drains it once Kafka recovers. If the ring fills, events spill to local disk as WAL files, replayed on recovery. DSP spend tracking falls back to the last-known Valkey values during the outage, which means some DSPs may slightly exceed their credit limits; it's all caught and corrected on the daily reconciliation.
12.4 CDN origin failure
S3 origin going down means the CDN edges can't pull cache misses. stale-while-revalidate headers let already-cached creatives keep serving past their TTL, which covers most of the existing demand. The auction server checks a per-creative health flag before returning a bid that points at an uncached creative URL. If origin has been down more than 5 minutes, that creative is excluded and the auction picks the next-best bid. New campaigns launching during an outage are delayed.
12.5 Flink spend aggregator crash
Flink restarts lose in-memory per-DSP spend state. Checkpoints to S3 every 30 seconds keep this from being a real problem on most restarts: the latest checkpoint comes back almost immediately. If the checkpoint is more than 5 minutes stale, the bootstrap path queries ClickHouse to rebuild the day's spend totals:
SELECT dsp_id, sum(price_cpm)/1000 AS spend_usd
FROM impressions
WHERE timestamp >= today()
GROUP BY dsp_id
During the ~30-second bootstrap window, auction servers rely on the last DSP credit flags Valkey had. Some DSPs may overspend by a few dollars; the daily reconciliation catches and bills it.
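The restart decision reduces to a staleness check. A minimal sketch, with the restore and query paths stubbed out as callables (names are illustrative):

```python
STALE_AFTER_S = 300  # checkpoint older than 5 minutes -> rebuild from ClickHouse

def bootstrap_spend(checkpoint_age_s, restore_checkpoint, query_clickhouse):
    """Return today's per-DSP spend totals on Flink restart: prefer the
    S3 checkpoint when fresh, otherwise rebuild from the impressions table."""
    if checkpoint_age_s <= STALE_AFTER_S:
        return restore_checkpoint()
    return query_clickhouse(
        "SELECT dsp_id, sum(price_cpm)/1000 AS spend_usd "
        "FROM impressions WHERE timestamp >= today() GROUP BY dsp_id"
    )
```

The 30-second checkpoint interval means the fresh-checkpoint path loses at most half a minute of spend, which the credit-limit margins absorb.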
12.6 Auction server OOM
A traffic spike (breaking news, a major sports event going viral) can drive QPS past pod capacity. Each pod enforces a hard max_concurrent_requests = 5,000 limit and returns 503 with Retry-After: 1 past that. HPA scales on CPU (target 60%) and on a custom concurrent_auctions_per_pod metric. SSPs retry with exponential backoff and route to other exchanges if the 503s persist. The pre-provisioned headroom (100 pods at 60% utilization) absorbs about a 67% spike without scaling at all.
12.7 Shedding load when things back up
Traffic spikes, slow DSPs, and slow backends are all easier to handle by shedding low-value work early than by trying to serve everything and failing later. The shedding order is set up to drop the work that matters least first.
Step one is to drop auctions with floor prices under $0.50 CPM. They can't produce meaningful revenue and the latency budget is better spent elsewhere. Step two is to drop tier-3 publishers (the lowest-revenue-share contracts) before touching premium publishers. Step three is to turn off the 10% exploration bonus in DSP selection and only fan out to the top-N in pure rank order. Step four, only under sustained overload, is to reduce the fan-out from top-5 to top-3.
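The four steps above can be sketched as a single decision per auction. The `pressure` level (0 through 4) and the field names are assumptions for illustration; the thresholds come from the text:

```python
def shedding_plan(auction, pressure):
    """Return whether to run this auction and with what fan-out, given
    an overload level 0 (healthy) through 4 (sustained overload)."""
    if pressure >= 1 and auction["floor_cpm"] < 0.50:
        return {"run": False}                 # step 1: sub-$0.50 floors
    if pressure >= 2 and auction["publisher_tier"] == 3:
        return {"run": False}                 # step 2: tier-3 publishers
    return {
        "run": True,
        "explore": pressure < 3,              # step 3: drop 10% exploration
        "fanout": 3 if pressure >= 4 else 5,  # step 4: top-5 -> top-3
    }
```

Ordering matters: the cheap checks (floor price, publisher tier) reject whole auctions before any DSP selection work is spent on them.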
The DSP timeouts adapt the same way. A worker checks p95 response time per DSP every 60 seconds and rebalances: DSPs averaging under 30ms get a 55ms timeout, ones over 50ms get 35ms. Slow DSPs that can't keep up get pushed out of the top-5 naturally as their win rate decays.
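The rebalance rule is small enough to state directly. A sketch, assuming a 45ms baseline timeout for DSPs in the middle band (the text only specifies the fast and slow cases):

```python
def rebalance_timeout(p95_ms, current_ms=45):
    """Per-DSP timeout adjustment, run every 60 seconds against the
    measured p95 response time. 45ms baseline is an assumption."""
    if p95_ms < 30:
        return 55   # fast DSP: give it more of the latency budget
    if p95_ms > 50:
        return 35   # slow DSP: clamp it so it can't eat the budget
    return current_ms
```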
The shedding response is always 204 No Content rather than a 5xx. SSPs interpret 5xx as "the exchange is broken" and start routing away from it; 204 reads as "no fill, a normal outcome," and the SSP just moves on.
13. Deployment
13.1 Multi-region layout
| Region | Auction pods | Valkey | Kafka | ClickHouse | Purpose |
|---|---|---|---|---|---|
| us-east-1 | 40 | 6 | 10 | 12 | Primary, North America |
| eu-west-1 | 30 | 6 | 10 | 12 | Europe (GDPR strict mode) |
| ap-south-1 | 30 | 6 | 10 | 12 | Asia-Pacific |
| Global (S3, CDN, PG) | — | — | — | — | Shared storage, config, creative CDN |
Geo-DNS does latency-based routing per SSP request to the nearest region. Cross-region failover happens via DNS TTL of 30 seconds when the regional health check goes red. Postgres is one global primary in us-east-1 with read replicas in each region; config changes propagate via logical replication with about 200ms of lag, and auction servers always read from their local replica. Settlement and billing run only out of us-east-1, consuming all three regions' Kafka topics through MirrorMaker, so financial reports have a single source of truth.
13.2 Pipeline
Canary fail thresholds: auction p99 over 110ms, fill-rate drop over 2%, DSP timeout rate up by more than 5 percentage points, 5xx rate over 0.1%, RPM down more than 3%.
13.3 Rollback
| Component | Method | Time |
|---|---|---|
| Auction server code | k8s rolling update to previous image | < 5 min |
| DSP config | Revert in PG, publish Kafka config event | < 30 sec |
| Flink spend job | Redeploy previous JAR from S3 checkpoint | < 2 min |
| Tracking endpoint | k8s rolling update | < 3 min |
| ClickHouse schema | Forward-only, columns added backward-compatible | N/A |
| Publisher config | Revert via API | Immediate |
14. Observability
14.1 Key metrics
| Metric | Type | Alert threshold |
|---|---|---|
| auction.qps | Counter | < 700K (30% below baseline) or > 1.8M (spike) |
| auction.latency.p50 | Histogram | > 50 ms |
| auction.latency.p99 | Histogram | > 100 ms |
| auction.fill_rate | Gauge | < 25% |
| auction.revenue_per_1k | Gauge | > 10% drop from 1h MA |
| dsp_selection.top_n_time | Histogram | > 2 ms |
| dsp.{id}.response_time.p99 | Histogram | > 55 ms |
| dsp.{id}.nobid_rate | Gauge | > 95% |
| dsp.{id}.circuit_breaker | Gauge | state = OPEN |
| dsp.{id}.spend_today | Gauge | > 90% of credit limit |
| tracking.impression.qps | Counter | < 250K |
| tracking.dedup.rate | Gauge | > 5% |
| lru.hit_rate | Gauge | < 90% |
| valkey.ops_per_sec | Counter | > 1M (capacity alarm) |
| valkey.latency.p99 | Histogram | > 2 ms |
| kafka.consumer_lag.impressions | Gauge | > 100K events |
| clickhouse.query.p99 | Histogram | > 10 s |
| cdn.cache_hit_rate | Gauge | < 90% |
| settlement.discrepancy | Gauge | > 0.01% |
| ivt.blocked_rate | Gauge | > 15% |
| load_shed.rate | Gauge | > 1% (load-shedding kicking in) |
14.2 Dashboard
┌────────────────────────────────────────────────────────┐
│ Auction QPS │ Latency (p50/p99) │
│ 1.05M [live] │ 42ms / 78ms │
├────────────────────────────────────────────────────────┤
│ Fill Rate │ Revenue $/hour │
│ 31.8% [24h] │ $4.2M [24h] │
├────────────────────────────────────────────────────────┤
│ DSP Response Matrix (top 10) │
│ TTD: 32ms ok │ DV360: 38ms ok │ Amazon: 44ms ok │
│ Criteo: 41ms ok │ Xandr: OPEN │ Magnite: 28ms ok │
├────────────────────────────────────────────────────────┤
│ DSP Spend & Credit │
│ TTD: $1.2M / $5M (24%) [healthy] │
│ DV360: $4.8M / $5M (96%) [approaching limit] │
├────────────────────────────────────────────────────────┤
│ Tracking: imp 310K/s, click 3.1K/s, dedup 1.1% │
│ LRU hit rate: 96.3% │ Load shed rate: 0.0% │
└────────────────────────────────────────────────────────┘
14.3 Distributed tracing
Every auction carries a trace ID through the whole lifecycle. OTel spans:
Trace: auc_01HXYZ123 (42ms total)
├── LRU enrichment (0.1ms) [hit]
├── Pre-bid filter (1ms)
├── DSP selection (0.8ms) → [ttd, dv360, amazon, criteo, xandr]
├── DSP fan-out (38ms)
│ ├── ttd req (32ms) ok bid $8.50
│ ├── dv360 req (38ms) ok bid $5.00
│ ├── amazon req (44ms) ok bid $4.00
│ ├── criteo req (41ms) ok bid $6.50
│ └── xandr req (circuit_open)
├── Auction logic (0.5ms)
├── Macro sub + response (1ms)
├── Kafka publish (async, 2ms after response)
└── [later] Impression pixel received (t+112ms)
14.4 Alerting tiers
| Tier | Trigger | Action |
|---|---|---|
| P0 (page now) | QPS drop > 50%, fill rate drop > 50%, all DSP circuits open | Page on-call + eng lead |
| P1 (page 15m) | p99 > 150 ms for 5 m, top-5 DSP circuit open, Kafka lag > 1M | Page on-call |
| P2 (Slack) | DSP credit > 90%, IVT rate > 15%, CDN hit < 85%, discrepancy > 0.005% | #exchange-ops |
| P3 (daily) | DSP no-bid rate shift > 10%, fill drop > 5%, creative rejection > 3% | Daily ops review |
15. Security
15.1 Data classification
| Data | Class | At Rest | In Transit |
|---|---|---|---|
| User IDs (exchange) | Pseudonymous PII | AES-256 | TLS 1.3 |
| IP addresses | PII | AES-256, hashed after 30d | TLS 1.3 |
| Consent strings (TCF) | Regulated PII | AES-256 | TLS 1.3 |
| Auction bid data | Confidential | AES-256 | TLS 1.3 |
| Clearing prices (in markup) | Confidential | AES-256-GCM | TLS 1.3 |
| Creative assets | Public | S3 SSE | TLS 1.3 |
| Configurations | Internal | PG TDE | TLS 1.3 |
| Billing records | Restricted | AES-256 | TLS 1.3 + mTLS |
15.2 Authentication and authorization
| Actor | Auth | Scope |
|---|---|---|
| SSPs | mTLS + API key | Bid requests |
| DSPs | mTLS certificates | Receive bid requests, submit bids, receive win notices |
| Publishers | OAuth 2.0 + MFA | Config API, revenue dashboards |
| Internal services | mTLS | Service-to-service |
| Ops | SSO + MFA (Okta) | Dashboards, DSP config, incident response |
| Billing / Finance | SSO + MFA + role restriction | Settlement reports, payments |
15.3 Price encryption
The clearing price embedded in the impression pixel is encrypted with AES-256-GCM. Without that, intermediaries (the publisher, the SSP, a browser extension) could read or forge the clearing price, which would let them reverse-engineer bid patterns or tamper with billing reports. The plaintext is the price plus a Unix timestamp; the timestamp lets the tracking endpoint reject any token older than 24 hours as a replay. Each token gets a fresh nonce, base64-encodes to about 40 characters, and is decrypted inside the impression-tracking endpoint before the impression event is written to Kafka.
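A minimal sketch of the plaintext layout and the 24-hour replay check, with the AES-256-GCM encrypt/decrypt itself elided; the `>dI` struct layout and the function names are assumptions, not the real token format:

```python
import struct
import time

MAX_AGE_S = 24 * 3600  # tokens older than 24h are treated as replays

def pack_plaintext(price_cpm, now=None):
    """Build the plaintext that gets AES-256-GCM encrypted: a float64
    price followed by a uint32 Unix timestamp (assumed layout)."""
    ts = int(now if now is not None else time.time())
    return struct.pack(">dI", price_cpm, ts)

def check_plaintext(blob, now=None):
    """Replay check run by the tracking endpoint after decryption:
    reject stale tokens, otherwise return the clearing price."""
    price, ts = struct.unpack(">dI", blob)
    age = (now if now is not None else time.time()) - ts
    if age > MAX_AGE_S:
        raise ValueError("stale price token (possible replay)")
    return price
```

The GCM auth tag already prevents tampering with the price; the timestamp only closes the remaining hole of replaying a genuine old token.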
15.4 Supply chain: ads.txt, sellers.json, schain
Domain spoofing is one of the older ad-tech frauds: a fraudster claims to sell nytimes.com inventory while actually owning a parked low-quality domain. Three standards plug the loop. ads.txt is published at the root of every legitimate publisher domain and lists which sellers (SSPs and exchanges) are authorized to sell that publisher's inventory; the exchange crawls these daily and refuses any request from a seller that isn't in the matching ads.txt. sellers.json is the exchange's own published list of every seller it accepts, which DSPs use to verify the exchange's claims. The schain object is attached to every bid request and lists every hop in the supply chain from publisher to exchange; any unauthorized node in the chain causes a reject.
The validation path is straightforward: cached ads.txt entries (refreshed daily) get keyed by publisher domain, and each incoming request is checked against the entries for the seller ID and a DIRECT or RESELLER relationship. A miss in the cache falls back to the configured policy, which is either strict (reject) or permissive (allow with a flag) depending on the publisher tier.
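A sketch of that lookup, assuming the daily crawl materializes a mapping from publisher domain to a set of `(seller_id, relationship)` tuples; the function and parameter names are illustrative:

```python
def authorized(ads_txt_cache, publisher_domain, seller_id, strict=True):
    """Check an incoming request's seller against the cached ads.txt
    entries for the publisher's domain. A cache miss falls back to the
    publisher-tier policy: strict rejects, permissive allows with a flag."""
    entries = ads_txt_cache.get(publisher_domain)
    if entries is None:
        return not strict  # no crawled ads.txt: policy decides
    return any(
        sid == seller_id and rel in ("DIRECT", "RESELLER")
        for sid, rel in entries
    )
```

In the permissive path the real system would also tag the request so downstream reporting can surface unverified inventory to the buyer.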
15.5 Network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: auction-server-policy
spec:
  podSelector:
    matchLabels: { app: auction-server }
  policyTypes: [Ingress, Egress]
  ingress:
    - from: [{ namespaceSelector: { matchLabels: { name: load-balancer } } }]
      ports: [{ port: 8080 }]
  egress:
    - to: [{ namespaceSelector: { matchLabels: { name: data-plane } } }]
      ports: [{ port: 6379 }, { port: 9092 }]  # Valkey + Kafka
    - to: [{ ipBlock: { cidr: 0.0.0.0/0 } }]   # DSPs (external)
      ports: [{ port: 443 }]
15.6 Audit logging
Every auction result, every DSP bid, every impression, and every billing event writes to Kafka with append-only semantics. Audit topics use min.insync.replicas=3 and acks=all to make sure writes survive broker failures. The audit log archives to a separate AWS account with S3 Object Lock enabled, which means the data is write-once and can't be altered or deleted by anyone in the main account, including admins. Retention is 7 years for billing records and 2 years for bid-level logs.
Explore the Technologies
Dive deeper into the technologies and infrastructure patterns used in this design:
Core Technologies
| Technology | Role in This Design |
|---|---|
| Valkey | Cold-path enrichment, DSP credit state, fraud lists, creative dedup, circuit breakers |
| Kafka | Impressions, clicks, winning bids, sampled losing bids, DSP config distribution |
| ClickHouse | Real-time spend dashboards, billing aggregation, analytics |
| PostgreSQL | DSP config, publisher settings, SSP registrations, settlement records |
| Flink | Per-DSP spend aggregation, rolling win-rate computation for smart selection |
Infrastructure Patterns
| Pattern | Relevance |
|---|---|
| CDN and edge caching | Creative delivery via CloudFront with > 95% hit rate |
| Circuit breaker | Per-DSP isolation of slow or failing demand partners |
| Message queues | Kafka as universal event bus, sampled writes |
| Load balancing | L4 LB across auction pods at 1M QPS |
| Load shedding | Drop low-value auctions to preserve core path |
Further Reading
- OpenRTB 2.6 Specification (IAB Tech Lab) — Industry-standard protocol for programmatic ad bidding
- VAST 4.2 Specification — Video Ad Serving Template
- MRC Viewability Standards — Viewability measurement
- ads.txt and sellers.json (IAB) — Supply chain transparency
- Google Ad Manager Architecture — Reference architecture for large-scale ad serving