API Performance Metrics
Why Averages Lie
If your API has an average latency of 200ms, that sounds fine. But averages are dangerous. If 95% of requests complete in 100ms and 5% take 2,100ms, your average is still 200ms. Those 5% of users are having a terrible experience, and you'd never know from the average.
This is why percentile-based measurement is standard practice. Track P50 (median, the typical experience), P95 (the experience of 1 in 20 users), and P99 (the experience of 1 in 100 users). For high-traffic APIs, P99.9 matters too. At 10 million requests per day, the 0.1% beyond P99.9 is still 10,000 requests hitting that worst-case latency.
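To make the arithmetic concrete, here is a minimal sketch in plain Python (using nearest-rank percentiles) that reproduces the numbers above:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the value at position ceil(p/100 * N)."""
    ordered = sorted(values)
    return ordered[math.ceil(p / 100 * len(ordered)) - 1]

# The sample from above: 95% of requests at 100ms, 5% at 2,100ms.
latencies = [100] * 9_500 + [2_100] * 500

print(sum(latencies) / len(latencies))  # 200.0 -- the "fine" average
print(percentile(latencies, 50))        # 100   -- the typical experience
print(percentile(latencies, 95))        # 100   -- still fine at P95
print(percentile(latencies, 99))        # 2100  -- the tail the average hides
```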
Defining Your SLIs
A good API SLI combines latency and availability into something measurable. A common pattern (see the sketch after the list):
- Latency SLI: the P95 of request latencies, computed over a rolling 5-minute window, compared against a 300ms threshold
- Error rate SLI: proportion of requests returning non-5xx responses, measured over a rolling 5-minute window
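A minimal sketch of both SLIs, assuming each request in a window is recorded as a hypothetical (latency_ms, status_code) tuple:

```python
import math

def latency_sli(requests, threshold_ms=300):
    """True if the window's P95 latency is under the threshold.
    `requests` is a list of (latency_ms, status_code) tuples covering
    one rolling 5-minute window -- an assumed record format."""
    latencies = sorted(lat for lat, _ in requests)
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return p95 < threshold_ms

def error_rate_sli(requests):
    """Proportion of requests in the window that did not return a 5xx."""
    good = sum(1 for _, status in requests if status < 500)
    return good / len(requests)
```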
Set your SLO as a target for each SLI. For example: "99.9% of 5-minute windows will have P95 latency under 300ms." That translates to roughly 43 minutes of budget per month where you're allowed to miss. The error budget model from SRE practice makes this actionable: when budget runs low, freeze feature releases and focus on reliability.
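The budget arithmetic itself is simple. A sketch using the 99.9% target above:

```python
# Error budget for a 99.9% SLO over a 30-day month.
slo = 0.999
budget_minutes = (1 - slo) * 30 * 24 * 60
print(budget_minutes)                 # 43.2 -- the "roughly 43 minutes"

# Burn rate: each missed 5-minute window spends 5 minutes of budget.
bad_windows = 6                       # hypothetical count so far this month
print(f"{bad_windows * 5 / budget_minutes:.0%} of budget burned")  # 69%
```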
Performance Budgets Across the Call Chain
An API that calls three downstream services needs a latency budget for each hop. If your edge SLO is 500ms P95, and you have a gateway (20ms), an application server (processing + DB query), and an external API call, you need to allocate that 500ms across the chain.
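One way to write the allocation down explicitly. The specific splits here are illustrative, not prescriptive:

```python
# Hypothetical split of a 500ms P95 edge budget across the hops above.
EDGE_BUDGET_MS = 500

budget_ms = {
    "gateway":       20,   # fixed overhead, from the example above
    "app_server":   180,   # request processing + DB query
    "external_api": 250,   # slowest and least controllable hop
    "headroom":      50,   # reserved for network and queueing jitter
}
assert sum(budget_ms.values()) == EDGE_BUDGET_MS
```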
Instrument each segment with distributed tracing (OpenTelemetry, Jaeger, Datadog APM). When total latency exceeds budget, trace data tells you which segment is responsible. Without this breakdown, debugging latency regressions becomes guesswork.
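With OpenTelemetry's Python API, one span per budget segment is enough to attribute a regression. Span and service names here are illustrative, and SDK/exporter setup is omitted:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-api")  # hypothetical instrumentation name

def handle_request():
    # One span per segment of the latency budget. When the edge total
    # exceeds budget, the trace shows which segment overspent.
    with tracer.start_as_current_span("gateway"):
        ...  # auth, routing
    with tracer.start_as_current_span("app_server"):
        with tracer.start_as_current_span("db_query"):
            ...  # database round trip
    with tracer.start_as_current_span("external_api"):
        ...  # downstream call
```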
Real User Monitoring vs Synthetic
Synthetic monitoring (scheduled probes from fixed locations) gives you consistent baselines and catches outages fast. But it misses what real users experience. A user on a mobile connection in Mumbai has a very different experience than a synthetic check running from the same AWS region as your servers.
Real User Monitoring (RUM) captures actual user-side timings. It reveals geographic latency variance, device performance differences, and the real impact of network conditions. The tradeoff is data volume and noise. Use synthetic for alerting and SLO tracking. Use RUM for understanding the full distribution of user experience and identifying where to invest in performance improvements.
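A synthetic probe is just a scheduled, fixed request. A minimal sketch using the third-party `requests` library against a hypothetical health endpoint:

```python
import time
import requests  # third-party HTTP client, assumed installed

def synthetic_probe(url="https://api.example.com/health"):
    """One probe run: fixed location, fixed request, consistent baseline."""
    start = time.monotonic()
    try:
        resp = requests.get(url, timeout=5)
        latency_ms = (time.monotonic() - start) * 1000
        return {"ok": resp.status_code < 500, "latency_ms": latency_ms}
    except requests.RequestException:
        return {"ok": False, "latency_ms": None}
```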
Apdex as a Communication Tool
Apdex (Application Performance Index) converts latency into a 0-to-1 score. You define a target threshold T (say, 300ms). Requests under T are "satisfied," requests between T and 4T are "tolerating," requests over 4T are "frustrated." The formula: (satisfied + tolerating/2) / total.
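The scoring is easy to implement directly. A sketch with T = 300ms, reusing the tail-heavy sample from earlier:

```python
def apdex(latencies_ms, t=300):
    """Apdex: (satisfied + tolerating/2) / total, with threshold T in ms."""
    satisfied = sum(1 for lat in latencies_ms if lat <= t)
    tolerating = sum(1 for lat in latencies_ms if t < lat <= 4 * t)
    return (satisfied + tolerating / 2) / len(latencies_ms)

# The earlier sample: 95% at 100ms (satisfied), 5% at 2,100ms.
# 2,100ms > 4T = 1,200ms, so the tail counts as frustrated.
print(apdex([100] * 9_500 + [2_100] * 500))   # 0.95
```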
An Apdex of 0.95 is excellent. Below 0.85 and users are noticing. Below 0.70 and you have a real problem. The value of Apdex is that it's a single number you can put on an executive dashboard without explaining percentiles. Engineers should still look at the underlying percentile data for debugging, but Apdex bridges the gap to non-technical stakeholders.
Key Points
- P99 latency matters more than averages because averages hide tail latency that affects real users
- Apdex scores translate raw latency into a 0-1 satisfaction index that non-engineers can understand
- Performance budgets should be allocated across the call chain, not just set at the edge
- SLIs for APIs typically combine latency (P95 < threshold) and error rate (< threshold) into a composite
- Real User Monitoring captures what synthetic checks miss: geographic variance, device differences, network conditions
Common Mistakes
- Reporting average latency instead of percentiles, which hides the experience of your worst-affected users
- Setting latency SLOs without measuring the actual user-facing call chain end to end
- Monitoring only from a single region and missing latency problems that affect users in other geographies
- Ignoring throughput changes when analyzing latency, since latency often degrades under load