AI/ML System Design Interview
How AI System Design Interviews Differ
Regular system design interviews are about building systems with predictable behavior. You design a URL shortener, a chat app, a rate limiter. Inputs map to outputs in a way you can reason about. If you implement it correctly, it works correctly. Every time.
AI system design interviews throw that assumption out the window. The system you're designing will sometimes be wrong. Confidently wrong. Wrong in ways you can't easily predict or reproduce. Your job is to design around that fundamental uncertainty, and that single fact changes how you approach everything.
The interviewer doesn't care whether you can explain transformers or walk through backpropagation. They want to know if you can architect a production system where a probabilistic component sits at the center, and everything else handles the mess that creates.
The AI System Design Framework
Before you draw a single box on the whiteboard, ask yourself four questions:
What metric defines success? "Better recommendations" is not a metric. Click-through rate, conversion rate, day-7 user retention, revenue per session. Pick one primary metric and be explicit about the secondary ones you're sacrificing.
What error rate is tolerable? A content moderation system with 1% false negatives means 1 in 100 harmful posts gets through. At 10,000 posts per minute, that's 100 harmful posts per minute reaching users. Is that acceptable? This question forces you to think about business context, not just model performance numbers on a dashboard.
What latency budget do you have? A recommendation system that takes 3 seconds to respond kills the user experience. A fraud detection system that takes 3 seconds might be perfectly fine if it runs asynchronously. Your latency constraints shape your entire architecture: whether you can call an LLM at all, whether you need model distillation, whether you need a tiered approach.
What's the cost envelope? If you're sending 100 million requests per day through GPT-4, your inference bill alone could run well into the millions of dollars per month. Most interviewers want to see that you understand this reality and can design systems that use expensive models surgically, not universally.
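To make these questions concrete, here's a back-of-envelope sketch in Python. The per-token price, request volume, and token counts are illustrative assumptions, not quoted rates; the point is that you should be able to do this arithmetic on the whiteboard.

```python
# Back-of-envelope sizing for the framework questions.
# All prices and volumes are illustrative assumptions, not quoted rates.
REQUESTS_PER_DAY = 100_000_000      # assumed traffic
TOKENS_PER_REQUEST = 1_000          # assumed prompt + completion size
PRICE_PER_1K_TOKENS = 0.03          # assumed blended price for a frontier model

daily_cost = REQUESTS_PER_DAY * (TOKENS_PER_REQUEST / 1_000) * PRICE_PER_1K_TOKENS
print(f"Inference cost: ${daily_cost:,.0f}/day, ${daily_cost * 30:,.0f}/month")

# Error-rate framing: 1% false negatives at 10,000 posts per minute.
POSTS_PER_MINUTE = 10_000
FALSE_NEGATIVE_RATE = 0.01
print(f"Harmful posts reaching users: {POSTS_PER_MINUTE * FALSE_NEGATIVE_RATE:.0f}/minute")
```

Even if the assumed price is off by an order of magnitude, the conclusion survives: expensive models have to be used surgically.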
Common AI System Design Questions
Most AI system design questions fall into three buckets. Recognizing which bucket you're in helps you structure your answer faster.
Content understanding systems (moderation, classification, extraction) follow a multi-tier pattern. Fast, cheap models handle the obvious cases. Medium-cost models handle the middle ground. Expensive LLMs handle the ambiguous tail. Human reviewers handle what the LLMs can't confidently classify. The real design challenge is setting the right confidence thresholds at each tier and building feedback loops so the cheaper tiers get better over time.
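As a sketch of that routing logic, assuming placeholder stand-ins for the rule engine, the small classifier, and the LLM call (none of these names refer to a real API), the tiering might look like the following. The thresholds are illustrative and would come out of your eval pipeline, not be hard-coded.

```python
from dataclasses import dataclass
import random

@dataclass
class Decision:
    label: str          # e.g. "allow" or "block"
    confidence: float   # 0.0 - 1.0
    tier: str           # which tier produced the decision

# Stand-ins for real components: a rule engine, a hosted classifier, an LLM endpoint.
def heuristic_rules(post: str) -> str | None:
    banned = {"spamword"}
    return "block" if any(w in post.lower() for w in banned) else None

def small_classifier(post: str) -> tuple[str, float]:
    return ("allow", random.uniform(0.5, 1.0))   # placeholder score

def llm_judge(post: str) -> tuple[str, float]:
    return ("allow", random.uniform(0.5, 1.0))   # placeholder score

# Illustrative thresholds; in practice they are tuned from the eval pipeline.
CLASSIFIER_THRESHOLD = 0.90
LLM_THRESHOLD = 0.75

def moderate(post: str) -> Decision:
    # Tier 1: cheap heuristics catch the obvious cases.
    if (label := heuristic_rules(post)) is not None:
        return Decision(label, 0.99, tier="heuristic")
    # Tier 2: a small ML classifier handles most remaining traffic.
    label, conf = small_classifier(post)
    if conf >= CLASSIFIER_THRESHOLD:
        return Decision(label, conf, tier="classifier")
    # Tier 3: an LLM sees only the ambiguous tail.
    label, conf = llm_judge(post)
    if conf >= LLM_THRESHOLD:
        return Decision(label, conf, tier="llm")
    # Tier 4: anything still uncertain goes to a human reviewer.
    return Decision("needs_human_review", conf, tier="human")

print(moderate("hello world"))
```

The interesting design work is in where the thresholds sit and in the feedback loop that moves them, not in the routing code itself.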
Recommendation and ranking systems require thinking about the full pipeline: data collection, feature engineering, candidate generation (narrowing millions of items to hundreds), ranking (ordering the hundreds), and serving. The interesting design decisions live in the feature store architecture, the split between online and offline features, and the A/B testing infrastructure that lets you measure whether your changes actually move the metric you care about.
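Here's a compressed sketch of the two-stage serving path, with brute-force similarity standing in for an ANN index and a linear blend standing in for a learned ranker. The item counts, features, and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, DIM = 100_000, 32

# Built offline: item embeddings (in production, an ANN index such as FAISS or ScaNN).
ITEM_EMBEDDINGS = rng.normal(size=(N_ITEMS, DIM)).astype(np.float32)
# Offline feature computed in batch (e.g. 30-day popularity), refreshed on a schedule.
OFFLINE_POPULARITY = rng.random(N_ITEMS).astype(np.float32)

def generate_candidates(user_embedding: np.ndarray, k: int = 500) -> np.ndarray:
    # Candidate generation: narrow millions of items to a few hundred, cheaply.
    scores = ITEM_EMBEDDINGS @ user_embedding          # brute force stands in for ANN search
    return np.argpartition(-scores, k)[:k]

def rank(candidates: np.ndarray, user_embedding: np.ndarray) -> np.ndarray:
    # Ranking: a heavier model re-scores only the candidates. Here, a linear blend
    # of an online signal (fresh affinity) and an offline feature (popularity).
    online_affinity = ITEM_EMBEDDINGS[candidates] @ user_embedding
    scores = 0.7 * online_affinity + 0.3 * OFFLINE_POPULARITY[candidates]
    return candidates[np.argsort(-scores)][:20]

user_embedding = rng.normal(size=DIM).astype(np.float32)
print(rank(generate_candidates(user_embedding), user_embedding))
```

The split matters because the candidate generator must be cheap enough to scan the whole catalog, while the ranker can afford rich features because it only sees a few hundred items per request.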
Generative systems (chatbots, content creation, code generation) center on retrieval-augmented generation, output quality control, and cost management. Here's the thing most people miss: the retrieval layer matters more than the model in most cases. A mediocre model with excellent retrieval beats a frontier model with no context. Put the same care into your knowledge base, chunking strategy, and retrieval pipeline that you'd put into model selection.
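A toy end-to-end sketch of that retrieval path is below, with bag-of-words cosine similarity standing in for embedding search and three hard-coded strings standing in for a knowledge base; the chunk size and top-k are arbitrary.

```python
from collections import Counter
import math

DOCUMENTS = [
    "Refunds are processed within 5 business days of approval.",
    "Premium subscribers can export reports as CSV or PDF.",
    "Two-factor authentication can be enabled under account settings.",
]

def chunk(text: str, max_words: int = 50) -> list[str]:
    # Naive fixed-size chunking; real systems tune chunk size and overlap carefully.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

CHUNKS = [c for doc in DOCUMENTS for c in chunk(doc)]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bow(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # The retrieved context, not the model, does most of the heavy lifting.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```

Notice that the quality lever here is everything that runs before the model does: what gets chunked, how it's scored, and how much of it makes it into the prompt.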
The Data Flywheel
This is the concept that separates a demo from a product. Every interaction with your AI system generates data: user clicks, corrections, dismissals, escalations to humans. If you design the system to capture this signal and feed it back into training, evaluation, and threshold tuning, your system improves over time without anyone manually intervening.
The flywheel has four stages. Users interact with the system. The system logs predictions alongside outcomes. An evaluation pipeline compares predictions to actual results. Insights from evaluation feed into model retraining, threshold adjustment, and retrieval improvements.
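A minimal sketch of stages two and three, assuming an in-process dictionary stands in for a durable prediction log and that outcomes arrive later keyed by request id:

```python
import time
from collections import defaultdict

# In-memory stand-in for a prediction log; production systems write to a durable
# store and join outcomes to predictions by request id later.
PREDICTION_LOG: dict[str, dict] = {}

def log_prediction(request_id: str, prediction: str, confidence: float) -> None:
    # Flywheel stage 2: log every prediction at serving time.
    PREDICTION_LOG[request_id] = {"prediction": prediction, "confidence": confidence,
                                  "ts": time.time(), "outcome": None}

def log_outcome(request_id: str, outcome: str) -> None:
    # Outcomes (clicks, corrections, human escalations) arrive later.
    if request_id in PREDICTION_LOG:
        PREDICTION_LOG[request_id]["outcome"] = outcome

def evaluate() -> dict[float, float]:
    # Flywheel stage 3: compare predictions to outcomes, bucketed by confidence,
    # so stage 4 (retraining, threshold tuning) has a signal to act on.
    buckets: dict[float, list[int]] = defaultdict(lambda: [0, 0])
    for r in PREDICTION_LOG.values():
        if r["outcome"] is None:
            continue
        b = round(r["confidence"], 1)
        buckets[b][1] += 1
        buckets[b][0] += int(r["prediction"] == r["outcome"])
    return {b: correct / total for b, (correct, total) in buckets.items()}

log_prediction("r1", "spam", 0.92)
log_outcome("r1", "spam")
print(evaluate())   # {0.9: 1.0}
```

The structural decision that makes the flywheel possible is logging predictions and outcomes in a way that lets you join them later; everything downstream depends on that join.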
When an interviewer asks about continuous improvement, they're really asking about the flywheel. When they ask about monitoring, they're partly asking whether you've designed the system to detect when the flywheel breaks down, when the data distribution shifts and your model starts operating on inputs it wasn't trained for.
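One common way to make "detect when the distribution shifts" concrete is a drift statistic such as the population stability index, computed over input features or model scores. The distributions and alerting threshold below are illustrative, not a prescription.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    # Compares the production distribution of a feature or score against the
    # distribution the model was trained or calibrated on.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) in empty buckets.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(1)
training_scores = rng.normal(0.0, 1.0, 10_000)
production_scores = rng.normal(0.4, 1.2, 10_000)   # simulated shifted distribution

psi = population_stability_index(training_scores, production_scores)
# A common rule of thumb treats PSI above roughly 0.25 as a significant shift.
print(f"PSI = {psi:.3f}")
```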
Answering the Tradeoff Questions
Every AI system design interview reaches a point where the interviewer pushes on tradeoffs. "What if the latency requirement drops to 50ms?" "What if you need to cut costs by 80%?" "What if accuracy needs to go from 95% to 99.5%?"
These moments determine your score. Don't panic and start redesigning from scratch. Instead, work the cost-quality-latency triangle explicitly.
- To reduce latency: use smaller models, pre-compute where possible, add caching layers, move to model distillation, accept lower accuracy on the tail.
- To reduce cost: batch requests, use tiered model selection, cache frequent queries, reduce the percentage of requests hitting expensive models, accept slightly lower quality.
- To increase quality: add human review for low-confidence predictions, use ensemble approaches, invest in better training data, add more retrieval context, accept higher latency and cost.
The best answers put numbers on these tradeoffs. "Switching from GPT-4 to a fine-tuned GPT-3.5 would reduce our per-request cost from $0.03 to $0.002, cut latency from 800ms to 200ms, and based on our eval set, drop accuracy from 94% to 89%. For the 70% of requests that are straightforward classifications, that's an acceptable trade." That level of specificity tells the interviewer you've built real systems, not just read about them.
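If you want to rehearse this, a small script like the one below lets you put a dollar figure on the tiered approach. The per-request figures echo the kind of numbers quoted above, but the volume and traffic split are assumptions for illustration, not published pricing or benchmarks.

```python
# Putting numbers on the cost-quality-latency triangle.
# All figures are illustrative assumptions, not published pricing or benchmarks.
CANDIDATES = {
    "frontier_model":   {"cost_per_request": 0.030, "p50_latency_ms": 800, "eval_accuracy": 0.94},
    "fine_tuned_small": {"cost_per_request": 0.002, "p50_latency_ms": 200, "eval_accuracy": 0.89},
}

REQUESTS_PER_DAY = 2_000_000          # assumed volume
EASY_TRAFFIC_SHARE = 0.70             # share of requests the small model can safely take

def monthly_cost(mix: dict[str, float]) -> float:
    # mix maps model name -> share of traffic routed to it.
    return sum(CANDIDATES[m]["cost_per_request"] * share * REQUESTS_PER_DAY * 30
               for m, share in mix.items())

all_frontier = monthly_cost({"frontier_model": 1.0})
tiered = monthly_cost({"fine_tuned_small": EASY_TRAFFIC_SHARE,
                       "frontier_model": 1.0 - EASY_TRAFFIC_SHARE})

print(f"All-frontier: ${all_frontier:,.0f}/month")
print(f"Tiered mix:   ${tiered:,.0f}/month "
      f"({100 * (1 - tiered / all_frontier):.0f}% cheaper)")
```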
Sample Questions
Design a real-time content moderation system using LLMs that processes 10,000 posts per minute.
They want to see you balance accuracy, latency, and cost. Discuss a multi-tier approach: fast heuristics first, ML classifier second, LLM for ambiguous cases. Address false positive/negative tradeoffs and human-in-the-loop for edge cases.
Design the ML infrastructure for a recommendation system that serves 100 million users.
Cover the full stack: data collection, feature engineering, model training, candidate generation, ranking, and serving. Discuss offline vs. online features, cold start problems, and A/B testing infrastructure.
How would you architect a customer support system that uses LLMs to handle 60% of tickets automatically?
Address retrieval-augmented generation (RAG), knowledge base management, confidence thresholds for automation vs. human handoff, feedback loops for continuous improvement, and cost modeling.
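One piece of that design, picking the confidence threshold that hits a 60% automation target, can be sketched against a labeled eval set. The scores below are synthetic stand-ins; in practice you'd sweep the threshold over real evaluation data and pick the highest one that still clears the automation target while keeping accuracy on automated tickets above the agreed bar.

```python
import numpy as np

# Synthetic eval set: a confidence score per ticket, and whether the model's
# automated reply would have been correct (correlated with confidence here).
rng = np.random.default_rng(2)
confidence = rng.beta(5, 2, 5_000)
correct = rng.random(5_000) < confidence

def automation_stats(threshold: float) -> tuple[float, float]:
    automated = confidence >= threshold
    automation_rate = automated.mean()
    accuracy_when_automated = correct[automated].mean() if automated.any() else float("nan")
    return automation_rate, accuracy_when_automated

for t in (0.5, 0.7, 0.8, 0.9):
    rate, acc = automation_stats(t)
    print(f"threshold={t:.1f}  automated={rate:.0%}  accuracy on automated={acc:.0%}")
```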
Evaluation Criteria
- Demonstrates understanding of ML-specific architecture patterns (feature stores, model serving, A/B testing)
- Addresses data quality, model monitoring, and drift detection as first-class concerns
- Makes explicit cost-quality-latency tradeoffs with reasoning
- Considers the human-in-the-loop component for AI systems that can fail
- Discusses organizational implications: who owns the model, the data, and the evaluation pipeline
Key Points
- AI system design tests your ability to build end-to-end ML systems, not just pick the right model. The model is maybe 10% of the work.
- Always start with the problem definition: what metric are you optimizing, what error rate is acceptable, what latency is required, and what's the cost budget?
- The cost-quality-latency triangle is the defining tradeoff in AI systems. You can't maximize all three, and interviewers want to see you reason through the tension.
- Human-in-the-loop is not optional. No AI system is 100% accurate, and the design must account for graceful fallback to human judgment.
- The data flywheel is what separates good AI products from great ones. Systems that learn from their own outputs compound their advantage over time.
Common Mistakes
- Designing only the model and ignoring the surrounding system. In production ML, the model is roughly 10% of the code and complexity.
- Not discussing cost at all. LLM inference is expensive, and interviewers want to see you reason about unit economics.
- Treating AI outputs as deterministic. They are probabilistic, and your system must handle confidence scores, thresholds, and fallbacks.
- Ignoring the cold start problem. New users, new content, and new categories all break AI systems that depend on historical data.