Design a Price Tracking Service
Design a price tracking service like CamelCamelCamel or Honey. The system monitors product prices across multiple retailers, stores price history, and alerts users when prices drop below their target.
Interview Cheat Sheet
Scraping platform married to a streaming alert engine. An Adaptive Scheduler spends a 50M/day scrape budget by priority; a three-tier router (API, HTTP, Playwright) with residential proxy rotation does the fetching; TimescaleDB stores 18B rows/year with continuous aggregates; Flink CEP fires alerts in under 5s. The load-bearing trade-off is false alerts versus missed alerts, and we prefer missing real drops over firing iPhone-for-$20 nonsense.
- Scrape · 580/sec sustained · adaptive priority
Not every product gets scraped at the same rate. An Adaptive Scheduler scores each product by alert count, volatility, and how many people watch it, and maps that to a poll interval from 30 minutes to 7 days. Workers pull the next due product and pick the right scraping strategy (API, HTTP, or browser).
Adaptive Scheduler scores products (alert_count + volatility + watcher_count + recency, 0-100) → maps to interval (30min/2hr/6hr/24hr/7d) → Valkey ZADD scrape_queue {next_poll} {product_id} → workers ZPOPMIN → router picks tier (~20% API, ~30% HTTP, ~50% Playwright) → residential proxy pool (10K IPs, per-retailer session affinity 5-15 min) → fetch + parse → Kafka price.events
- Validate + store · anomaly check before TimescaleDB
Before a new price is committed, it gets compared to the last known price. Drops greater than 50% go into a human review queue instead of straight into the database. False alerts (iPhone for $20) erode trust faster than missed alerts.
Kafka price.events → anomaly validator → compare to last known price for (product_id, retailer) → drops >50% routed to manual review queue (currency parse errors, decimal glitches usually) → validated events → TimescaleDB insert (partitioned by time, 15x compression on old chunks) → continuous aggregates pre-compute daily/weekly/monthly buckets for chart rendering
- Alert · Flink CEP · sub-5s end-to-end
Flink reads the validated price stream, keeps each product's recent price and all-time low in memory, and checks every alert rule per event. When a target is hit, it fires within a few milliseconds. No DB polling, no per-event scan over 100M alerts.
Kafka price.events (validated) → Flink CEP keyed by product_id → RocksDB state holds last_price, all_time_low, percentage_drop_window per product → match against active alert rules (Postgres-loaded into Flink state) → fire alert event → Notification Service → per-user rate limit (10/hr, aggregate into '5 deals from your watchlist' push) → APNs/FCM
- 10M products, 50M scrapes/day budget, ~580/sec sustained, ~1,500/sec with headroom
- Storage: ~50 GB/day raw, ~18 TB/year raw, 18B rows after a year, ~200 GB B-tree index
- Tier mix: ~20% API, ~30% HTTP, ~50% Playwright; 10K residential IP pool
- Alert SLO: sub-5s end-to-end from scrape commit to push notification
- Flink CEP: <10ms per event vs 50-200ms DB poll, RocksDB state, exactly-once on failover
- Priority score 0-100 maps to intervals 30min/2hr/6hr/24hr/7d
- Per-retailer session affinity: 5-15 min per IP, ~5-15 requests per session
- TimescaleDB with continuous aggregates, not plain PostgreSQL, not InfluxDB
- Flink CEP for alert evaluation, not DB polling, not naive per-event scan
- Three-tier scrape router (API, HTTP, Playwright), not one-size-fits-all HTTP
- Residential IPs with per-retailer session affinity, not datacenter IPs
- Per-retailer parser modules with multiple extraction strategies, not one global parser
- Anomaly validator with review queue for >50% drops, not raw insert to TimescaleDB
- False alerts erode trust faster than missed alerts, which sets validator strictness
- Parser-health dashboard is the on-call's first stop, not an afterthought metric
- Marginal cost of freshness is Tier 3 Playwright plus proxies, dominant infra line item
- Major retailer goes dark is a 6-hour pause with stale badge, not a minute-level recovery
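The priority score and its interval mapping can be sketched in a few lines. This is a minimal illustration, not the definitive scoring function: the component weights (40/30/15/15) and the 30-minute and 7-day bands come from this page, while the saturation points (100 alerts, 1,000 watchers, 30-day recency decay), the normalized volatility scale, and the three middle score bands are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    alert_count: int          # active alerts on this product
    volatility: float         # assumed to be pre-normalized to 0.0-1.0
    watcher_count: int        # users tracking the product
    days_since_change: float  # days since the last observed price change


def priority_score(p: ProductStats) -> float:
    """Combine four factors into a 0-100 score (weights 40/30/15/15)."""
    alerts = min(p.alert_count / 100, 1.0) * 40        # saturates at 100 alerts (assumed)
    vol = min(max(p.volatility, 0.0), 1.0) * 30
    watchers = min(p.watcher_count / 1000, 1.0) * 15   # saturates at 1,000 watchers (assumed)
    recency = max(0.0, 1.0 - p.days_since_change / 30) * 15  # 30-day decay (assumed)
    return alerts + vol + watchers + recency


# (minimum score, poll interval in minutes); the middle three bands are assumed.
INTERVAL_BANDS = [(80, 30), (60, 120), (40, 360), (20, 1440), (0, 10080)]


def poll_interval_minutes(score: float) -> int:
    """Map a score to one of the five coarse intervals."""
    for floor, minutes in INTERVAL_BANDS:
        if score >= floor:
            return minutes
    return INTERVAL_BANDS[-1][1]


hot = ProductStats(alert_count=200, volatility=0.9, watcher_count=5000, days_since_change=0)
print(poll_interval_minutes(priority_score(hot)))  # hot product lands in the 30-minute band
```

The next poll time computed from this interval is what would be written as the sorted-set score in the scrape queue.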
- Flink CEP over polling for alerts
At 580 events/sec across 100M active alerts, polling the DB on each event would be 580 queries/sec at 50-200ms each, blowing the sub-5s SLO. Flink keeps last_price, all_time_low, and percentage-drop-window state per product in local RocksDB, evaluates each event in <10ms, supports complex window patterns natively, and gives exactly-once on failover. Alerts are tier-1 and chart freshness is tier-2: alerts keep firing off Kafka even if TimescaleDB has a write outage, and the TimescaleDB consumer catches up on recovery.
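The keyed-state idea is easier to defend with a toy version in hand. The sketch below is plain Python, not the Flink API: a dict stands in for RocksDB keyed state, and `AlertRule`, `ProductState`, and `AlertEvaluator` are hypothetical names. It simplifies percentage_drop to "versus the last price" instead of a window, treats price_below as inclusive, and leaves out dedup and snooze.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AlertRule:
    alert_id: int
    product_id: str
    alert_type: str  # "price_below", "percentage_drop", "all_time_low", "any_drop"
    target_price_cents: Optional[int] = None
    percentage_drop: Optional[float] = None


@dataclass
class ProductState:
    last_price: Optional[int] = None
    all_time_low: Optional[int] = None


class AlertEvaluator:
    """Per-product state plus rules, looked up by key on every price event."""

    def __init__(self) -> None:
        self.state: Dict[str, ProductState] = {}
        self.rules: Dict[str, List[AlertRule]] = {}

    def add_rule(self, rule: AlertRule) -> None:
        self.rules.setdefault(rule.product_id, []).append(rule)

    def on_price(self, product_id: str, price_cents: int) -> List[int]:
        st = self.state.setdefault(product_id, ProductState())
        # Only this product's rules are touched: no scan over all alerts.
        fired = [r.alert_id for r in self.rules.get(product_id, [])
                 if self._matches(r, st, price_cents)]
        # Update state after evaluation so rules compare against the previous values.
        st.last_price = price_cents
        if st.all_time_low is None or price_cents < st.all_time_low:
            st.all_time_low = price_cents
        return fired

    @staticmethod
    def _matches(rule: AlertRule, st: ProductState, price: int) -> bool:
        if rule.alert_type == "price_below":
            return rule.target_price_cents is not None and price <= rule.target_price_cents
        if rule.alert_type == "all_time_low":
            return st.all_time_low is not None and price < st.all_time_low
        if not st.last_price:  # no previous price (or zero): nothing to compare against
            return False
        if rule.alert_type == "any_drop":
            return price < st.last_price
        if rule.alert_type == "percentage_drop":
            drop_pct = (st.last_price - price) / st.last_price * 100
            return rule.percentage_drop is not None and drop_pct >= rule.percentage_drop
        return False
```

Each event costs one state lookup plus a pass over that product's rules, which is the property the DB-poll comparison rests on.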
- Three-tier scrape router (API → HTTP → Playwright)
One-size-fits-all HTTP scraping breaks on JS-rendered pages and burns proxy budget on retailers that have affiliate APIs. Three-tier router: ~20% via partner APIs (cheapest, fastest, allowed), ~30% via raw HTTP (static HTML pages), ~50% via Playwright (JS-rendered, slowest, most proxy-expensive). Cost optimization: every product moved from Playwright to HTTP saves an order of magnitude on infra. Residential proxies with per-retailer session affinity (5-15 min/IP) keep datacenter-detection at bay; 24-hour quarantine on any IP that takes a 403.
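The routing rule and the 403 quarantine are simple enough to sketch. This is an illustration under stated assumptions: `RetailerProfile`, `pick_tier`, and `ProxyQuarantine` are hypothetical names, and real routing would likely be per product or per page type, not only per retailer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Tier(Enum):
    API = 1         # partner/affiliate API: cheapest, fastest, allowed
    HTTP = 2        # raw HTTP for server-rendered HTML
    PLAYWRIGHT = 3  # headless browser for JS-rendered pages: slowest, most proxy-expensive


@dataclass(frozen=True)
class RetailerProfile:
    has_partner_api: bool
    js_rendered: bool


def pick_tier(profile: RetailerProfile) -> Tier:
    """Always choose the cheapest tier that can actually produce a price."""
    if profile.has_partner_api:
        return Tier.API
    if not profile.js_rendered:
        return Tier.HTTP
    return Tier.PLAYWRIGHT


QUARANTINE_SECONDS = 24 * 3600  # 24-hour quarantine on any IP that takes a 403


class ProxyQuarantine:
    """Tracks which proxy IPs are benched after a 403."""

    def __init__(self) -> None:
        self._blocked_until: Dict[str, float] = {}

    def report_status(self, ip: str, status: int, now: float) -> None:
        if status == 403:
            self._blocked_until[ip] = now + QUARANTINE_SECONDS

    def usable(self, ip: str, now: float) -> bool:
        return now >= self._blocked_until.get(ip, 0.0)
```

Passing `now` in explicitly keeps the sketch testable; a worker would pass `time.time()`.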
- Anomaly validator (false alerts > missed alerts)
An iPhone-for-$20 false alert erodes user trust faster than missing a real flash sale. The validator compares every new price to the last known price for (product_id, retailer); drops >50% are routed to a human review queue, not into TimescaleDB. Currency parse errors and decimal-place glitches are the usual culprits. We prefer missing a real flash sale (the alert fires next scrape) over firing a false alert (the user uninstalls). Same product principle as ranking-latency fallback in news feed: the failure mode users hate is louder than the failure mode they don't notice.
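A minimal sketch of the range check, assuming prices are integer cents. The 50% threshold is from the text; accepting a first observation with no baseline and sending non-positive prices to review are assumptions, and `validate_price` is a hypothetical name.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"  # goes on to the validated stream and TimescaleDB
    REVIEW = "review"  # goes to the human review queue instead


MAX_DROP_FRACTION = 0.50  # drops greater than 50% need a human look


def validate_price(last_price_cents: Optional[int], new_price_cents: int) -> Verdict:
    if new_price_cents <= 0:
        return Verdict.REVIEW  # assumed: a zero or negative price is a parse failure
    if last_price_cents is None or last_price_cents <= 0:
        return Verdict.ACCEPT  # assumed: first observation has no baseline to compare
    drop = (last_price_cents - new_price_cents) / last_price_cents
    return Verdict.REVIEW if drop > MAX_DROP_FRACTION else Verdict.ACCEPT


# $999.00 -> $20.00 is a 98% drop, so it never reaches the alert path.
print(validate_price(99_900, 2_000))
```

Price increases always pass this check, since the failure mode being guarded against is a false drop.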
- Parser modules per retailer + parser-health dashboard
When Amazon redesigns their product page, every Amazon scraper breaks at once. Per-retailer parser modules contain blast radius to one retailer. Each parser has multiple extraction strategies (CSS selector, JSON-LD structured data, regex fallback) so a single selector change doesn't take the parser down. Parser-health dashboard pages on-call when success rate drops below 90% for any retailer; the response is to pause scraping (preserve proxy quota), ship the parser fix, then queue affected products for re-scrape. Parser-health is the on-call's first stop, not an afterthought metric.
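The strategy chain and the health check can be sketched with the standard library alone. To stay dependency-free, this example implements only the JSON-LD and regex strategies; a CSS-selector strategy would sit first in the list in practice. `RetailerParser`, `from_json_ld`, and `from_regex` are hypothetical names, and the JSON-LD shape (a single object with `offers.price`) is an assumption.

```python
import json
import re
from collections import deque
from typing import Callable, Deque, List, Optional

Strategy = Callable[[str], Optional[int]]  # HTML in, price in cents out (None on miss)


def from_json_ld(html: str) -> Optional[int]:
    m = re.search(r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
                  html, re.S | re.I)
    if not m:
        return None
    try:
        data = json.loads(m.group(1))
        return round(float(data["offers"]["price"]) * 100)
    except (ValueError, KeyError, TypeError):
        return None


def from_regex(html: str) -> Optional[int]:
    m = re.search(r"\$(\d+)\.(\d{2})", html)  # last-resort pattern like $24.50
    return int(m.group(1)) * 100 + int(m.group(2)) if m else None


class RetailerParser:
    """One parser per retailer: ordered strategies plus a rolling health window."""

    def __init__(self, strategies: List[Strategy], window: int = 100,
                 max_failure_rate: float = 0.10) -> None:
        self.strategies = strategies
        self.outcomes: Deque[bool] = deque(maxlen=window)  # last N attempts
        self.max_failure_rate = max_failure_rate

    def parse(self, html: str) -> Optional[int]:
        for strategy in self.strategies:
            price = strategy(html)
            if price is not None:
                self.outcomes.append(True)
                return price
        self.outcomes.append(False)  # every strategy missed
        return None

    def should_pause(self) -> bool:
        """True when the failure rate over the window exceeds the threshold."""
        if not self.outcomes:
            return False
        return self.outcomes.count(False) / len(self.outcomes) > self.max_failure_rate
```

When `should_pause()` turns true, the scheduler would page on-call and stop enqueueing that retailer. A production version would probably also require a minimum sample size so a handful of early failures does not trip the pause.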
40-Minute Interview Playbook
Each phase is what the interviewer expects you to do and say. Concrete steps, not topic hints. The diagrams are what you sketch on the board.
- 1 · 5 min
Clarify Requirements and Scope
Goal: Pin the scrape budget (50M/day, ~580/sec sustained), the alert latency SLO (<5s), and the fact that scraping is the bottleneck, not storage or alerting. Get the three scraping tiers on the board before drawing anything.
Do & Say
- SAY·1 This looks like a database problem but it's actually a scraping problem. ~580 scrapes/sec is trivial for storage but hostile to retailers, who will IP-ban you fast. Write SCRAPING IS THE BOTTLENECK on the board.
- SAY·2 Pin the scale: 10M products × 5 checks/day = 50M scrapes/day, ~580/sec sustained, ~1,500/sec with headroom. Storage at 1KB/event: ~50 GB/day raw, ~18 TB/year raw, ~1.2 TB/year compressed at 15x.
- SAY·3 Confirm alert latency SLO: Sub-5 second end-to-end from scrape commit to push notification on the user's phone. This is what justifies Flink CEP over database polling.
- SAY·4 Park out of scope: account management UI, the browser-extension auto-fill, A/B testing of alert message copy. Stay focused on the price pipeline.
- SAY·5 Establish the three scraping tiers up front: Tier 1 API (Amazon PA-API, Walmart Open API) for ~20% of products at near-zero cost, Tier 2 HTTP scrape for server-rendered pages, ~30%, Tier 3 Playwright headless browser for JS-rendered pages, ~50%. The tier mix determines the proxy budget.
Interviewer is grading: You reframe the problem (scraping, not storage) within the first 2 minutes. You write the three scraping tiers down before they ask. You don't promise per-minute price freshness; you negotiate it via the priority score in the deep dive.
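The scale numbers are worth being able to re-derive on the spot. A quick back-of-envelope script, using the inputs stated on this page (10M products, 5 checks/day, 1 KB/event, 15x compression) and decimal units:

```python
PRODUCTS = 10_000_000
CHECKS_PER_DAY = 5
EVENT_BYTES = 1_000       # 1 KB per price event
COMPRESSION_RATIO = 15    # TimescaleDB compression on old chunks
SECONDS_PER_DAY = 86_400

scrapes_per_day = PRODUCTS * CHECKS_PER_DAY                # 50M
scrapes_per_sec = scrapes_per_day / SECONDS_PER_DAY        # sustained rate
raw_gb_per_day = scrapes_per_day * EVENT_BYTES / 1e9
rows_per_year = scrapes_per_day * 365
raw_tb_per_year = raw_gb_per_day * 365 / 1000
compressed_tb_per_year = raw_tb_per_year / COMPRESSION_RATIO

# The naive alternative: every product, every hour.
hourly_scrapes = PRODUCTS * 24
budget_multiple = hourly_scrapes / scrapes_per_day

print(f"{scrapes_per_sec:.0f} scrapes/sec sustained")
print(f"{raw_gb_per_day:.0f} GB/day raw, {rows_per_year / 1e9:.2f}B rows/year")
print(f"{raw_tb_per_year:.2f} TB/year raw, {compressed_tb_per_year:.2f} TB/year compressed")
print(f"hourly polling = {hourly_scrapes / 1e6:.0f}M/day, {budget_multiple:.1f}x the budget")
```

Row size is the softest input here; halving it to 500 bytes halves every storage figure but leaves the scrape rate, which is the real constraint, untouched.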
- 2 · 5 min
API and Data Model
Goal: Define the user-facing tracking and alert APIs, the time-series hypertable, the alert rule shape. Justify TimescaleDB over plain PostgreSQL with the index-doesn't-fit-RAM math.
Do & Say
- WRITE·1 Write the user APIs: POST /products/track with retailer URL, GET /products/{id}/history with from/to/granularity, POST /alerts with type. Five alert types: price_below, percentage_drop, all_time_low, any_drop, back_in_stock. Mention them explicitly so the interviewer can't blindside you with 'what about back-in-stock?'
- DRAW·2 Sketch the price_history hypertable: (product_id, retailer_id, time, price_cents, currency, availability). Primary key (product_id, retailer_id, time). Hypertable on time with 7-day chunks.
- SAY·3 Justify TimescaleDB out loud: 18B rows after a year. The B-tree index is ~200 GB, doesn't fit in RAM, every query hits disk. TimescaleDB partitions by time, compresses old chunks 15x, runs continuous aggregates for daily/weekly/monthly buckets. PostgreSQL alone would die at this scale. Cross plain PostgreSQL off.
- WATCH·4 Cross InfluxDB off too: InfluxDB has weaker secondary indexing. We need to join price events with products and users for alert evaluation. TimescaleDB is still PostgreSQL underneath, so joins are first-class.
- SAY·5 Continuous aggregates: price_daily, price_weekly, price_monthly materialized views, auto-refreshed. A 1-year chart reads from price_weekly (52 points), not from raw chunks (40K+ points).
- SAY·6 Alert rule shape (PostgreSQL): (alert_id, user_id, product_id, retailer_id nullable, alert_type, target_price_cents, percentage_drop, snooze_until, is_active). Note: retailer_id is nullable, so users can alert on the best price across all retailers, not just one.
Interviewer is grading: You volunteer the index-doesn't-fit-RAM math without prompting. You name continuous aggregates instead of saying 'we'll add a cache.' You distinguish price_history (time series) from alerts (relational) and put them in the right stores.
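To make the chart-query reduction concrete, here is a plain-Python sketch of what a weekly rollup like price_weekly holds. It is not TimescaleDB code; the database would maintain this incrementally. The choice of min/max/avg per bucket and Monday-aligned weeks are assumptions for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

Point = Tuple[datetime, int]                  # (scrape time, price_cents)
Bucket = Tuple[datetime, int, int, float]     # (week_start, min, max, avg)


def weekly_buckets(points: Iterable[Point]) -> List[Bucket]:
    """Collapse raw price points into one row per Monday-aligned week."""
    groups: Dict[datetime, List[int]] = defaultdict(list)
    for ts, price in points:
        day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        groups[week_start].append(price)
    return [(week, min(prices), max(prices), sum(prices) / len(prices))
            for week, prices in sorted(groups.items())]


# 52 weeks of 6-hourly scrapes for one product at one retailer.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # a Monday
raw = [(start + timedelta(hours=6 * i), 10_000 + i) for i in range(364 * 4)]
chart = weekly_buckets(raw)
print(len(raw), "raw points ->", len(chart), "chart points")
```

The chart endpoint reads the small rollup; only a zoomed-in view of the last few days needs raw rows.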
- 3 · 10 min
High-Level Design
Goal: One picture showing the four sub-pipelines: scheduling, scraping, processing, alerting. Label arrows with throughput and label Kafka topics by name.
Draw on the board
Do & Say
- DRAW·1 Draw the four sub-pipelines left to right. Say: these are not microservices for the sake of microservices. Scheduling, scraping, and alerting all have different failure modes and scaling axes.
- SAY·2 Scheduling: Adaptive Scheduler computes a priority score from alert_count, watcher_count, volatility, last_price_change_at. Maps the score to a check interval: 30 min for hot products, 7 days for dormant ones. Enforces the 50M/day budget by scaling down low-priority intervals if the ideal sum exceeds budget.
- SAY·3 Scraping: Three tiers in one router: Tier 1 retailer APIs, near zero cost, Tier 2 HTTP for server-rendered pages, Tier 3 Playwright pool for JS pages. Per-retailer session affinity: each residential IP stays with a retailer 5-15 min, then rotates.
- SAY·4 Per-retailer parser registry: Each parser has a version, tries multiple extraction strategies in order (CSS selector, JSON-LD, regex fallback), emits health metrics. Success rate drops below 90% triggers a page and stops scraping that retailer to preserve proxy quota.
- SAY·5 Anomaly validator: Range check against the last known price (no >50% drops without corroboration), format check (currency, decimal places). Anomalous prices go to a review queue, not into TimescaleDB. This is what catches scraping bugs before they fire false alerts.
- SAY·6 Alert pipeline: Flink CEP keyed by product_id holds last_price, all_time_low, last_availability per product in RocksDB state. Alert rules are a broadcast stream. Each price event evaluates against all active rules for that product. Triggers go to alert.triggers Kafka topic, dispatcher fans out to FCM/APNs/SES.
- SAY·7 Why Flink, not DB poll: At 580 events/sec with 100M alerts, DB polling is 580 queries/sec at 50-200ms each. Flink with RocksDB state hits <10ms per event and handles patterns (percentage drop over window) without custom code. Exactly-once on failover.
Interviewer is grading: Your diagram has four sub-pipelines, not one blob. You label Kafka topics by name. You name Flink CEP and defend it with the DB-poll math, not just by saying 'streaming is better.'
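The scheduler's budget enforcement (scale down the low-priority tail until the plan fits) can be sketched as a single pass. This is one possible reading of that rule, not the definitive algorithm: `fit_to_budget` is a hypothetical name, and the floor of one check per 7 days is an assumption taken from the slowest interval.

```python
from typing import Dict, List, Tuple

MIN_CHECKS_PER_DAY = 1 / 7  # assumed floor: never poll slower than once per 7 days

# (product_id, priority score 0-100, ideal checks per day)
Product = Tuple[str, float, float]


def fit_to_budget(products: List[Product], budget: float) -> Dict[str, float]:
    """Trim checks from the lowest-priority products until the total fits the budget."""
    plan = {pid: checks for pid, _, checks in products}
    excess = sum(plan.values()) - budget
    for pid, _, checks in sorted(products, key=lambda p: p[1]):  # lowest priority first
        if excess <= 0:
            break
        cut = min(excess, max(0.0, checks - MIN_CHECKS_PER_DAY))
        plan[pid] = checks - cut
        excess -= cut
    return plan


demo = [("hot", 95, 48.0), ("warm", 50, 4.0), ("cold", 10, 1.0)]
plan = fit_to_budget(demo, budget=50.0)
print(plan)  # "hot" keeps all 48 checks; "cold" and "warm" absorb the cut
```

If every product is already at the floor and the plan still exceeds the budget, this sketch returns an over-budget plan; a real scheduler would need an explicit policy for that case.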
- 4 · 15 min
Deep Dive: Anti-Bot Defense, Adaptive Frequency, Parser Resilience
Goal: The three places this design lives or dies in production. The interviewer will push on at least one of these and you should volunteer the others.
Draw on the board
Do & Say
- SAY·1 Anti-bot defense first, this is what kills naive designs: Residential IP pool (10K IPs, no datacenter), per-retailer session affinity (5-15 min then rotate), browser fingerprint with webdriver hidden, plugins spoofed, locale set, networkidle wait + 0.5-2s random delay, 2captcha for CAPTCHA paths.
- SAY·2 Per-retailer session limits: Amazon ~5 requests per session before its risk score climbs, Walmart 8, Target 6, less aggressive retailers 15. These are tuned per retailer based on block-rate observation, not guessed.
- SAY·3 Pivot to adaptive frequency. We do not check every product every hour. That would be 240M scrapes/day, nearly 5x our budget. The Adaptive Scheduler computes a priority score from four factors. Draw the score: alert_count 0-40, volatility 0-30, watchers 0-15, recency 0-15.
- SAY·4 Score to interval: Score 80-100 → every 30 minutes (hot products with active alerts and volatility), Score 0-19 → every 7 days (dormant, no alerts, stable price). The mapping is deliberately coarse so the schedule is rehearsable and predictable.
- SAY·5 Budget enforcement: If the sum of ideal_daily_checks exceeds 50M, sort by priority descending and scale down the low-priority tail until it fits. Headroom of ~45% of the budget is intentional for retries, Black Friday bursts, and growth.
- SAY·6 Pivot to parser resilience. The most common production incident is a retailer redesigns their page. All Amazon parsers break at once. Mitigation: per-retailer parser modules (one redesign affects one parser, not all of them), multiple extraction strategies per parser (CSS selector first, JSON-LD second, regex fallback).
- SAY·7 Parser health: Each parser tracks success_count, failure_count, last_success. If failure rate exceeds 10% over the last 100 attempts, page on-call, stop scraping that retailer (to preserve proxy quota), queue affected products for re-scrape once the parser ships. The dashboard is the on-call's first stop during an incident.
- SAY·8 Alert-storm question: Black Friday, 100 watchlist products all drop 30%, user gets 100 pushes and uninstalls. Mitigation: per-user rate limit (10 alerts/hour default), aggregate into a single '5 deals from your watchlist' push, snooze controls per rule.
- SAY·9 Anomaly detection: Validator compares new price to last known for that product-retailer pair. Drops >50% go to a review queue (usually wrong-currency parse or clearance glitch). Don't want iPhone-for-$20 alerts because someone parsed cents as dollars.
Interviewer is grading: You volunteer the proxy session-affinity detail without being asked (this is the giveaway that you've actually built a scraper). You don't promise uniform check frequencies for all products. You name parser health as the recurring production incident and have a runbook (page, stop, fix, backfill).
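The alert-storm mitigation from this phase can be sketched as a sliding-window throttle with a digest. This is an illustration under assumptions: `AlertThrottle` is a hypothetical name, the digest is flushed by some external timer, and the digest push itself is not counted against the limit.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Optional


class AlertThrottle:
    """Per-user limit of N pushes per window; overflow is held for one digest push."""

    def __init__(self, limit: int = 10, window_seconds: float = 3600.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent: DefaultDict[str, Deque[float]] = defaultdict(deque)
        self._held: DefaultDict[str, List[str]] = defaultdict(list)

    def offer(self, user_id: str, product_name: str, now: float) -> Optional[str]:
        """Return the push text to send now, or None if the alert was held back."""
        sent = self._sent[user_id]
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()  # forget pushes that fell out of the window
        if len(sent) < self.limit:
            sent.append(now)
            return f"Price drop: {product_name}"
        self._held[user_id].append(product_name)
        return None

    def flush_digest(self, user_id: str) -> Optional[str]:
        """Collapse everything held for this user into one aggregate push."""
        held = self._held.pop(user_id, [])
        if not held:
            return None
        return f"{len(held)} deals from your watchlist"


throttle = AlertThrottle()
pushes = [throttle.offer("u1", f"item{i}", now=float(i)) for i in range(100)]
print(sum(p is not None for p in pushes), "individual pushes;", throttle.flush_digest("u1"))
```

On the Black Friday example, 100 simultaneous drops become 10 individual pushes and one digest instead of 100 notifications.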
- 5 · 5 min
Failure Modes, Trade-offs, and Wrap-up
Goal: Defend the freshness vs scrape-budget trade-off, name the cascading failure (retailer goes dark), close in one sentence.
Do & Say
- SAY·1 Major retailer goes dark: Amazon Bot Manager flags us, success drops to ~10%. Parser health dashboard pages on-call. Mitigation: pause Amazon scraping 6h, rotate to a new residential proxy, refresh fingerprints. App shows a 'last updated 4h ago' badge. Amazon alerts don't fire, but we don't fire false ones either.
- SAY·2 TimescaleDB write outage: Kafka buffers price.events with 3-day retention. Flink CEP keeps running on whatever's in Kafka, so alerts still fire. Once TimescaleDB recovers, the consumer catches up. Historical charts go stale, but the live alert path doesn't.
- SAY·3 Flink CEP outage: Alerts stop firing. This is the worst-case for the product. RocksDB state is checkpointed every 60s so we restart from the last checkpoint. Worst-case alert latency during recovery: 1 to 2 minutes. We accept that because the alternative (DB polling) doesn't scale.
- SAY·4 Trade-off 1: scraping budget vs freshness. We can buy more scrapes by raising the budget, but the marginal cost is dominated by Tier 3 Playwright + residential proxy cost. 30-min freshness for the top 200K products is the sweet spot.
- SAY·5 Trade-off 2: false alerts vs missed alerts. Strict anomaly validation means we sometimes miss real price drops (catastrophic-looking but actually correct). Loose validation means we sometimes fire false alerts. We prefer missed alerts because false alerts erode trust faster.
- SAY·6 Close in one breath: Adaptive Scheduler distributes a 50M/day budget across 10M products by priority, three-tier scrape router (API + HTTP + Playwright) with residential proxy rotation, TimescaleDB for price history, Flink CEP for sub-5s alerts against 100M rules, Kafka as backbone.
- SAY·7 If there's time, offer: the proxy session-affinity logic, the continuous-aggregate refresh policy, the Flink broadcast-state pattern for alert rule propagation.
Interviewer is grading: You frame 'scrape less' as a deliberate product decision, not a limitation. You name false-alert tolerance as the harder trade-off (it is). You don't pretend a retailer-blocks-us-entirely scenario is recoverable in minutes.
Interview Grading by Level
What an interviewer at each level expects to see in your answer. Use this to calibrate, not to perform.
Mid-Level Engineer (L4 / SDE-II)
Builds the core scrape-store-alert loop but treats scraping as 'just HTTP GET' and underestimates anti-bot defenses.
- Splits the system into a scraper, a database for price history, and an alert worker that compares new prices to user targets.
- Picks a time-series database (TimescaleDB or InfluxDB) over plain PostgreSQL when prompted.
- Uses a Kafka queue between the scraper and the alert engine for asynchronous evaluation.
- Stores active alerts in Redis sorted sets keyed by product_id for fast lookup on each price event.
- Recognizes that scraping every product at the same frequency wastes resources, even if the algorithm is hand-wavy.
- Says 'we'll use proxies' but can't explain session affinity, rotation cadence, or why datacenter IPs are useless.
- Has no answer for what happens when a retailer redesigns their page and parsers break en masse.
- Treats alert evaluation as 'iterate through user alerts on every price event' without thinking about Flink/state.
- Doesn't size the storage growth (18B rows/year) or address why indexes won't fit in RAM at scale.
- Says 'we'll filter weird prices' but no anomaly threshold, no review queue, no defense against the iPhone-for-$20 alert.
Senior Engineer (L5 / SDE-III)
Drives the four-pipeline split, picks Flink CEP with reasons, designs adaptive scrape frequency, defends TimescaleDB with the right math.
- Frames scraping as the bottleneck within the first 2 minutes and writes the three scraping tiers (API, HTTP, browser) on the board.
- Picks TimescaleDB over PostgreSQL with the 18B rows / 200 GB index math, and adds continuous aggregates for chart queries.
- Designs an adaptive scheduler with a priority score (alert_count, volatility, watchers, recency) mapped to discrete check intervals.
- Picks Flink CEP over DB polling and quotes the latency math (<10ms vs 50-200ms per event) plus exactly-once semantics on failover.
- Names per-retailer parser modules with multiple extraction strategies and parser health monitoring as the recurring production incident.
- Anti-bot defense layered: residential IP pool, per-retailer session affinity 5-15 min, browser fingerprint spoofing, randomized delays.
- Anomaly validator with a review queue for >50% drops, defends 'prefer missed alerts to false alerts' as a trust trade-off.
- Doesn't volunteer the alert-storm Black Friday problem; only proposes per-user rate limiting after the interviewer surfaces it.
- Quotes 'we use proxies' but only explains session affinity when pushed; misses the per-retailer session-length tuning detail.
- Discusses continuous aggregates but doesn't size the refresh cadence (hourly, daily, weekly) or quote the chart-query reduction.
Staff+ Engineer (L6+)
Treats price-tracker as a scraping platform problem married to a streaming alert engine, frames the trust trade-offs explicitly, and brings real operational artifacts.
- Volunteers the parser-health dashboard as the on-call's first stop and treats parser breakage as a recurring SLA issue, not a one-off.
- Frames 'false alerts erode trust faster than missed alerts' as the load-bearing trade-off, defends strict anomaly validation on that basis.
- Brings up the major-retailer-goes-dark scenario unprompted, with a concrete recovery playbook (pause 6 hours, new proxy provider, refresh fingerprints, stale badge in app).
- Pushes back on requirements: 'do all 10M products really need price tracking, or can we start with the long tail at 7-day freshness and expand?'
- Names alert aggregation (5 deals from your watchlist) and snooze controls as part of the alerting design, not just a UX afterthought.
- Quantifies the scrape-budget trade-off: 'the marginal cost of more freshness is Tier 3 Playwright + residential proxy cost, which is the dominant infra line item.'
- Closes with a one-sentence summary and an explicit list of what they'd cover with more time (multi-currency normalization, retailer-side affiliate-API negotiation, proxy provider selection).
Common Follow-up Questions
Questions an interviewer is likely to ask after your walkthrough. Rehearse the short answer.