Data & Storage | Problem 1 of 8
Data Processing | Medium | Deep Dive available
Design an Ad Click Aggregator
Design a click aggregation pipeline that ingests 10 billion clicks/day, deduplicates across multiple layers, detects fraud with ML running in parallel, produces sub-minute dashboards for advertisers, and reconciles to penny-accurate billing. The hard parts: preventing a 0.1% counting error from becoming $2.5M/day, running stream and batch paths in parallel without divergence, and making fraud detection fast enough that it doesn't gate ingestion.
Key Topics
Lambda Architecture (Batch Wins)
Three Dedup Layers
Flink Exactly-Once + Watermarks
XGBoost Fraud Detection
Integer Micros for Money
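
To make the "Integer Micros for Money" topic and the $2.5M/day figure concrete, here is a minimal sketch of the arithmetic. It assumes one micro is one millionth of a dollar; the helper name `dollars_to_micros` and the constants are illustrative, not part of any particular billing system. The implied average cost per click is derived from the numbers in the problem statement, not stated in it.

```python
from fractions import Fraction

MICROS_PER_DOLLAR = 1_000_000  # assumption: 1 micro = one millionth of a dollar


def dollars_to_micros(amount: str) -> int:
    """Convert a decimal dollar string such as '0.25' to integer micros exactly."""
    micros = Fraction(amount) * MICROS_PER_DOLLAR
    if micros.denominator != 1:
        raise ValueError(f"{amount} is not a whole number of micros")
    return micros.numerator


# Figures taken from the problem statement.
CLICKS_PER_DAY = 10_000_000_000
ERROR_RATE = Fraction(1, 1000)  # 0.1%

# A 0.1% counting error on 10 billion clicks is 10 million miscounted clicks.
miscounted_clicks = int(CLICKS_PER_DAY * ERROR_RATE)

# $2.5M/day spread over those clicks implies an average cost per click.
daily_error_micros = dollars_to_micros("2500000")
implied_cpc_micros = daily_error_micros // miscounted_clicks  # 250_000 micros = $0.25

# Why integers: binary floats cannot represent most decimal amounts exactly,
# so repeated addition drifts. Integer micros add exactly.
float_total = sum([0.1] * 10)                          # not exactly 1.0
micros_total = sum([dollars_to_micros("0.10")] * 10)   # exactly 1_000_000

print(miscounted_clicks, implied_cpc_micros, float_total == 1.0, micros_total)
```

Keeping every amount as an integer count of micros means stream and batch totals can be compared with plain equality during reconciliation, rather than with a floating-point tolerance.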