API Design at Staff Level
Boring APIs Win
The best APIs are boring. They do exactly what the name says, they handle errors consistently, and they have not had a breaking change in three years. Nobody writes a blog post about an API that just works. That is the point.
At Staff level, the interviewer is not testing whether you know what REST is. They want to see if you can design an API that 200 teams can integrate with and that your grandchildren will not have to deprecate. The bar is durability, not cleverness.
Resource Modeling That Scales
Start with nouns. A payment system has payments, refunds, customers, and payment methods. Each is a resource with its own lifecycle. GET /payments/{id} retrieves a payment. POST /payments/{id}/refunds creates a refund scoped to that payment. This nesting communicates ownership without requiring documentation.
Where candidates go wrong is modeling around internal implementation. If your payment service internally splits a transaction into an authorization and a capture, that does not mean your public API needs /authorizations and /captures as top-level resources. Expose the abstraction your consumers care about, not the plumbing they do not.
Keep URLs shallow. Two levels of nesting is the practical limit. /customers/{id}/payment-methods/{id}/transactions is a signal that your resource model needs rethinking. Promote transactions to a top-level resource with a customer_id filter instead.
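One way to make the two-level rule checkable is a small lint over route templates. The sketch below is illustrative only: the function names and the choice to count every non-placeholder segment as one level of nesting are assumptions, not an established convention.

```python
MAX_DEPTH = 2  # the "two levels of nesting" limit described above


def nesting_depth(path: str) -> int:
    """Count collection segments, ignoring {placeholder} segments and any query string."""
    path_only = path.split("?", 1)[0]
    segments = [s for s in path_only.strip("/").split("/") if s]
    return sum(1 for s in segments if not (s.startswith("{") and s.endswith("}")))


def too_deep(path: str) -> bool:
    """Flag route templates that nest deeper than MAX_DEPTH collections."""
    return nesting_depth(path) > MAX_DEPTH


routes = [
    "/payments/{id}/refunds",
    "/customers/{id}/payment-methods/{id}/transactions",
    "/transactions?customer_id={id}",  # the flattened alternative
]
flagged = [r for r in routes if too_deep(r)]
```

Here only the three-level customers route ends up in flagged; the flattened /transactions route with a customer_id filter passes.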
The 45-Minute API Design Interview
Here is a time allocation that reliably produces strong answers.
Minutes 0-10: Clarify requirements. Who are the consumers? External developers, internal services, or both? What are the SLA expectations? Is this a synchronous request-response API or does it need webhooks for async operations? For a payment API, you need to establish whether you are designing a Stripe-like platform or an internal service.
Minutes 10-25: Design the resource model. Sketch the core resources, their relationships, and their state machines. A payment has states: created, processing, succeeded, failed. Define what transitions are valid. Then write out 5-7 key endpoints with request and response shapes. Do not try to be exhaustive. Focus on the endpoints that reveal interesting design decisions.
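The payment state machine can be written down in a few lines, which is often faster than describing it verbally. This is a minimal sketch: the four states come from the text above, but the specific transition table (created to processing, processing to succeeded or failed, both end states terminal) is an assumption you would confirm with the interviewer.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "created"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Assumed transition table: terminal states have no outgoing edges.
VALID_TRANSITIONS = {
    PaymentState.CREATED: {PaymentState.PROCESSING},
    PaymentState.PROCESSING: {PaymentState.SUCCEEDED, PaymentState.FAILED},
    PaymentState.SUCCEEDED: set(),
    PaymentState.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when a caller requests a transition the table does not allow."""


def transition(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in VALID_TRANSITIONS[current]:
        raise InvalidTransition(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the table as data rather than scattered if-statements makes it easy to show which transitions are rejected, and to discuss what adding a state such as a refund would change.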
Minutes 25-35: Define the hard parts. Error responses, pagination, authentication, and idempotency. Pick Stripe's error format or build something similar. Explain your pagination strategy (cursor-based with a default page size of 25, max of 100). Specify that all mutating endpoints accept an Idempotency-Key header.
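The pagination strategy above can be sketched against an in-memory list. The default of 25 and maximum of 100 come from the text; the opaque base64 cursor, the after_id field, and the response keys (data, has_more, next_cursor) are assumptions chosen for illustration, and a real service would run the equivalent query against an indexed column.

```python
import base64
import json

DEFAULT_LIMIT = 25
MAX_LIMIT = 100


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque token so clients do not depend on its shape."""
    raw = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_page(rows, cursor=None, limit=DEFAULT_LIMIT):
    """Return one page from rows, a list of dicts sorted by ascending 'id'."""
    limit = max(1, min(limit, MAX_LIMIT))  # clamp to the documented bounds
    after_id = decode_cursor(cursor) if cursor else None
    remaining = [r for r in rows if after_id is None or r["id"] > after_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "has_more": has_more, "next_cursor": next_cursor}
```

Because the cursor records a position in the ordering rather than an offset, deleting a row from an earlier page does not shift what the next page returns.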
Minutes 35-45: Trade-offs and evolution. How will this API change over time? Discuss your versioning strategy. Date-based versions (like Stripe's 2023-10-16 format) let you evolve continuously without forcing consumers onto a new "v2." Explain how you would deprecate a field: mark it deprecated in the schema, stop documenting it, log usage, notify consumers, then remove it after the deprecation window (typically 12-18 months for a public API).
Backwards Compatibility as a Discipline
Additive changes are the safe default. New fields in a response, new optional query parameters, new endpoints. None of these break consumers that ignore fields they do not recognize, so state that expectation in your documentation from day one.
Everything else requires a deprecation process. Stripe solves this by maintaining compatibility layers for every API version they have ever shipped. When you pass a version header, the backend transforms the current internal response into the shape your version expects. Expensive to maintain, but no consumer ever breaks unexpectedly. For most companies, a simpler approach works: support two versions simultaneously, give consumers a 6-month migration window, and provide automated migration tooling.
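The compatibility-layer idea can be sketched as a chain of downgrade functions, each undoing one change for clients pinned to an older version. This is not Stripe's implementation; the version dates, the two changes, and every function name below are hypothetical and exist only to show the shape of the technique.

```python
def flatten_amount(resp: dict) -> dict:
    """Hypothetical change: newer versions nest amount; older clients expect flat fields."""
    out = dict(resp)
    amount = out.pop("amount")
    out["amount"] = amount["value"]
    out["currency"] = amount["currency"]
    return out


def rename_succeeded_to_paid(resp: dict) -> dict:
    """Hypothetical change: the status 'succeeded' used to be called 'paid'."""
    out = dict(resp)
    if out.get("status") == "succeeded":
        out["status"] = "paid"
    return out


# (version that introduced the change, function that undoes it), newest first.
DOWNGRADES = [
    ("2024-06-01", flatten_amount),
    ("2024-01-15", rename_succeeded_to_paid),
]


def render_for_version(current: dict, requested: str) -> dict:
    """Walk back from the current shape to the shape the requested version expects."""
    resp = dict(current)
    for introduced, downgrade in DOWNGRADES:
        # ISO-8601 dates sort correctly as plain strings.
        if requested < introduced:
            resp = downgrade(resp)
    return resp
```

The internal code only ever produces the newest shape. Each breaking change adds one entry to the list, which is where the maintenance cost mentioned above accumulates.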
GraphQL vs REST vs gRPC: An Honest Framework
REST is the default for public APIs. HTTP caching works natively. Every language has an HTTP client. Tooling is mature. The downside is over-fetching (you get the whole resource even if you need two fields) and the N+1 request problem for complex data requirements.
GraphQL solves the over-fetching problem, which makes it excellent for mobile clients on unreliable networks. Shopify, GitHub, and Yelp use it for their public APIs. But the trade-offs are real: query complexity attacks require depth limiting, caching requires custom infrastructure (Apollo Server, persisted queries), and error handling is unintuitive because GraphQL servers typically return 200 with errors nested in the response body.
gRPC is purpose-built for service-to-service communication. Binary protobuf serialization is often several times faster than JSON parsing, though the gap depends on payload shape and language. But browser support requires grpc-web, and debugging is harder because you cannot just curl an endpoint. Google, Netflix, and Square use it extensively for internal services while exposing REST or GraphQL at the edge.
The Staff-level answer is almost never "just pick one." It is usually "REST at the public edge, gRPC between internal services, and GraphQL for the mobile BFF layer where query flexibility matters most."
Sample Questions
Design a public API for a payment processing system. Walk through resource modeling, error handling, and versioning.
Payment APIs surface every hard design problem at once: idempotency, partial failures, state machines, and security constraints. Interviewers use this domain because weak answers reveal themselves quickly through missing edge cases.
You need to make a breaking change that affects 200 API consumers. How do you handle the migration?
This tests whether you understand the organizational and technical cost of breaking changes. Strong answers discuss deprecation windows, migration tooling, and how to measure consumer readiness before cutting over.
For a new product that serves both a mobile app and internal microservices, would you choose REST, GraphQL, or gRPC? Defend your choice.
Protocol selection questions are less about the 'right' answer and more about demonstrating that you understand the trade-offs across dimensions like caching, tooling maturity, client complexity, and operational cost.
Evaluation Criteria
- Designs resources around domain nouns, not implementation verbs, and explains naming decisions
- Discusses pagination, filtering, and error handling with specific patterns rather than hand-waving
- Demonstrates a concrete versioning strategy and explains how backwards compatibility is maintained
- Makes a reasoned protocol choice (REST/GraphQL/gRPC) tied to specific requirements, not personal preference
- Addresses authentication, rate limiting, and idempotency as first-class design concerns
Key Points
- Stripe has kept every API version it has shipped since 2011 working for the integrations pinned to it. That is not luck. It is a discipline of additive-only changes, date-based API versioning, and aggressive internal testing against every supported version simultaneously.
- Cursor-based pagination is not a preference. It is a requirement at scale. Offset pagination breaks when the underlying dataset changes between requests, which means page 5 might skip records or show duplicates. Slack, Twitter, and Facebook all migrated to cursor pagination after hitting this wall.
- The error response is the most-read part of your API documentation. Stripe's error object (type, code, message, param, doc_url) became an industry template because it gives developers everything they need to fix the problem without leaving their terminal.
- Rate limiting transparency separates professional APIs from amateur ones. Return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers on every response. Developers should never have to guess when they can retry.
- Idempotency keys are not optional for any endpoint that creates resources or triggers side effects. Without them, a network timeout on a payment request becomes a double charge.
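The idempotency point is easiest to see in code. The sketch below is a minimal in-memory illustration: the class and exception names are hypothetical, and a production version would need a durable store, key expiry, and a lock around concurrent requests that share a key.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


class IdempotentExecutor:
    def __init__(self):
        # Idempotency-Key value -> (request fingerprint, stored response)
        self._seen = {}

    def execute(self, key: str, payload: dict, handler):
        """Run handler(payload) once per key; replay the stored response on retries."""
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if key in self._seen:
            stored_fingerprint, stored_response = self._seen[key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict("key reused with a different request body")
            return stored_response
        response = handler(payload)
        self._seen[key] = (fingerprint, response)
        return response
```

A client that times out and retries with the same key and body gets the original response back, and the charge handler runs only once. Rejecting a reused key with a different body catches client bugs that would otherwise be silently ignored.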
Common Mistakes
- Naming endpoints as actions instead of resources. /createUser and /getUser are RPC-style thinking wearing a REST costume. Use /users with HTTP methods to express the operation.
- Designing error responses as an afterthought. If your 400 response just says 'Bad Request' with no detail about which field failed validation and why, you have guaranteed a support ticket for every integration.
- Treating versioning as a future problem. By the time you need it, you have already shipped a v1 that 50 consumers depend on, and retrofitting versioning into an unversioned API is a nightmare that touches every client.
- Choosing GraphQL because it is modern without accounting for the caching complexity. HTTP caching works out of the box with REST. With GraphQL, queries are typically sent as a POST to a single endpoint, which means your CDN cannot cache them without custom cache key logic.
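To make the error-response mistake concrete, here is a sketch of a 400 body that names the failing field. The five keys follow the error object listed under Key Points (type, code, message, param, doc_url); the specific type and code values, the example.com documentation URL, and the validation rules are assumptions for illustration.

```python
def validation_error(param: str, code: str, message: str) -> dict:
    """Build an error body that tells the caller which field failed and why."""
    return {
        "error": {
            "type": "invalid_request_error",  # illustrative value
            "code": code,
            "message": message,
            "param": param,
            "doc_url": f"https://example.com/docs/errors#{code}",  # hypothetical URL
        }
    }


def validate_create_payment(body: dict):
    """Return (http_status, response_body) for a hypothetical create-payment request."""
    if "amount" not in body:
        return 400, validation_error(
            "amount", "parameter_missing", "amount is required."
        )
    amount = body["amount"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        return 400, validation_error(
            "amount", "parameter_invalid", "amount must be a positive integer."
        )
    if not isinstance(body.get("currency"), str):
        return 400, validation_error(
            "currency", "parameter_missing", "currency is required."
        )
    return 200, {"ok": True}
```

A caller who sends {"amount": -5} gets back the field name, a stable machine-readable code, and a link to the relevant documentation, which is the difference between a self-service fix and a support ticket.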