Network Latency — Where Time Goes
Every millisecond in a request comes from somewhere — DNS, TCP, TLS, server processing, or transfer — and each one can be measured and optimized.
The Problem
Why does a simple API call take 300ms when the server processes it in 5ms? Where do the other 295ms go, and how does an engineering team systematically eliminate them?
Mental Model
Like a relay race — the total time is the sum of every handoff. The runners can't go faster (speed of light), but the number of handoffs can be reduced.
How It Works
Every time a browser loads a page or a service calls an API, the request passes through a gauntlet of sequential steps. Each step adds latency, and they compound. Understanding where time goes is the first step to eliminating it.
The Anatomy of an HTTPS Request
A cold HTTPS request — no cached DNS, no existing connection — goes through five distinct phases:
1. DNS Resolution (20-120ms)
The browser asks "what's the IP address of api.example.com?" This query traverses the local cache, the OS resolver, the ISP's recursive resolver, and potentially the authoritative nameserver. A cold lookup easily takes 50-120ms. Subsequent lookups hit cache and resolve in under 5ms.
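To see the cache effect directly, here is a minimal Python sketch that times repeated lookups through the OS resolver (the hostname is a placeholder, and caching behavior depends on the OS and resolver configuration):

```python
import socket
import time

def time_dns(hostname: str) -> float:
    """Time a single getaddrinfo() call in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 443)  # resolve via the OS resolver
    return (time.perf_counter() - start) * 1000

# The first call may miss local caches; repeats usually hit them.
for attempt in range(3):
    print(f"lookup {attempt + 1}: {time_dns('example.com'):.1f} ms")
```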
2. TCP Handshake (1 RTT)
The three-way handshake: SYN → SYN-ACK → ACK. This costs exactly one round-trip time. New York to London is ~75ms RTT, so that's 75ms before a single byte of application data.
3. TLS Handshake (1-2 RTTs)
TLS 1.2 requires two round-trips: ClientHello/ServerHello, then key exchange. TLS 1.3 reduces this to one round-trip. With 0-RTT resumption (TLS 1.3 + previously established session), the client can send data immediately — though 0-RTT has replay attack considerations.
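To separate the two handshakes, a small sketch with Python's standard-library ssl module can time the TCP and TLS phases independently and report the negotiated version (the hostname is a placeholder):

```python
import socket
import ssl
import time

def tls_probe(host: str, port: int = 443) -> None:
    """Resolve first, then time the TCP and TLS handshakes separately."""
    ip = socket.getaddrinfo(host, port)[0][4][0]  # get DNS out of the way

    t0 = time.perf_counter()
    raw = socket.create_connection((ip, port))  # TCP three-way handshake: 1 RTT
    t1 = time.perf_counter()
    ctx = ssl.create_default_context()
    conn = ctx.wrap_socket(raw, server_hostname=host)  # TLS handshake: 1-2 RTTs
    t2 = time.perf_counter()

    print(f"TCP: {(t1 - t0) * 1000:.1f} ms")
    print(f"TLS: {(t2 - t1) * 1000:.1f} ms, negotiated {conn.version()}")  # e.g. TLSv1.3
    conn.close()

tls_probe("example.com")
```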
4. Time to First Byte — TTFB (variable)
The server receives the request, processes it (database queries, computation, serialization), and sends back the first byte. This is the application's "think time." A well-optimized API should have TTFB under 50ms; a page rendering with database queries might take 200-500ms.
5. Content Transfer (size / bandwidth)
The response body downloads. For a 50KB JSON API response on a broadband connection, this is negligible. For a 5MB image on a 3G mobile network, this dominates.
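Putting the five phases together, a back-of-envelope budget shows how a request with 5ms of server think time still lands near 300ms cold. Every number below is an illustrative assumption:

```python
# Back-of-envelope cold-request budget (all numbers are assumptions).
RTT_MS = 75            # e.g. New York <-> London
DNS_MS = 50            # cold recursive lookup
TTFB_SERVER_MS = 5     # server think time
SIZE_KB = 50
BANDWIDTH_MBPS = 100

tcp_ms = RTT_MS                       # three-way handshake: 1 RTT
tls_ms = RTT_MS                       # TLS 1.3: 1 RTT (TLS 1.2 would cost 2)
request_ms = RTT_MS + TTFB_SERVER_MS  # request out + first byte back
transfer_ms = SIZE_KB * 8 / (BANDWIDTH_MBPS * 1000) * 1000

total = DNS_MS + tcp_ms + tls_ms + request_ms + transfer_ms
print(f"total: {total:.0f} ms")  # ~284 ms -- almost all round-trips, not transfer
```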
RTT by Geography: The Speed of Light Tax
Light in fiber travels at roughly 200,000 km/s — about two-thirds of vacuum speed due to the refractive index of glass. This sets a hard physical floor that no amount of engineering can break.
| Route | Distance | Theoretical Min RTT | Typical RTT |
|---|---|---|---|
| US East ↔ US West | ~4,000 km | ~40ms | 60-80ms |
| US East ↔ Europe (London) | ~5,500 km | ~55ms | 70-90ms |
| US East ↔ Asia (Tokyo) | ~11,000 km | ~110ms | 150-200ms |
| US West ↔ Asia (Tokyo) | ~8,500 km | ~85ms | 100-140ms |
| Europe ↔ Asia (Singapore) | ~10,000 km | ~100ms | 150-250ms |
| Same region (same AZ) | <100 km | <1ms | 0.5-2ms |
The "typical RTT" column is higher than theoretical because real packets traverse routers, switches, and non-straight-line fiber paths. Submarine cables don't follow great circles — they follow coastlines and landing stations.
Bandwidth vs. Latency: The Pipe Analogy
These two concepts are fundamentally different, and confusing them is one of the most common mistakes in network optimization.
Bandwidth is the width of the pipe — how much data can be pushed through per second. Going from 100 Mbps to 1 Gbps enables transferring large files 10x faster.
Latency is the length of the pipe — how long it takes for a single bit to travel from source to destination. No amount of bandwidth reduces latency.
For small payloads (API calls, web pages), latency dominates. A 1KB response over a fresh connection on a 150ms RTT link takes ~300ms (one round-trip for TCP setup, one for request/response) whether the pipe is 10 Mbps or 10 Gbps. For large payloads (video streaming, file downloads), bandwidth matters more.
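A crude model makes the tradeoff concrete. This sketch assumes two round-trips (connection setup plus request/response) and ignores TCP slow start, so real numbers will differ:

```python
def transfer_ms(size_kb: float, bandwidth_mbps: float, rtt_ms: float,
                round_trips: int = 2) -> float:
    """Crude model: setup + request round-trips, then serialized transfer."""
    serialization = size_kb * 8 / (bandwidth_mbps * 1000) * 1000
    return round_trips * rtt_ms + serialization

# 1 KB API response: bandwidth is irrelevant, RTT dominates.
print(transfer_ms(1, 10, 150))          # ~301 ms at 10 Mbps
print(transfer_ms(1, 10_000, 150))      # ~300 ms at 10 Gbps
# 5 MB image: bandwidth dominates.
print(transfer_ms(5_000, 10, 150))      # ~4300 ms at 10 Mbps
print(transfer_ms(5_000, 10_000, 150))  # ~304 ms at 10 Gbps
```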
```bash
# Measure each phase of an HTTPS request
curl -w "\
DNS:      %{time_namelookup}s\n\
TCP:      %{time_connect}s\n\
TLS:      %{time_appconnect}s\n\
TTFB:     %{time_starttransfer}s\n\
Total:    %{time_total}s\n\
Download: %{size_download} bytes\n" \
  -o /dev/null -s https://api.example.com/data
```
Optimization Techniques
Reducing latency falls into three categories: reduce round-trips, reduce distance, and reduce processing time.
Reduce Round-Trips
Connection reuse is the single biggest win. HTTP keep-alive reuses TCP connections across requests, eliminating the handshake cost. HTTP/2 multiplexes hundreds of requests over one connection. HTTP/3 (QUIC) folds the transport and TLS handshakes into a single round-trip, with 0-RTT for resumed connections.
<!-- DNS prefetch: resolve DNS for domains needed soon -->
<link rel="dns-prefetch" href="//api.example.com">
<!-- Preconnect: establish TCP + TLS ahead of time -->
<link rel="preconnect" href="https://cdn.example.com">
Connection pooling in backend services (database pools, HTTP client pools) avoids creating new TCP+TLS connections per request. A PostgreSQL connection takes 3-10ms to establish — with a pool, it's effectively zero.
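A sketch of the difference using the third-party requests library (the URL is a placeholder): the first session request pays the handshakes, later ones reuse the connection.

```python
import time
import requests  # third-party; assumes `pip install requests`

URL = "https://api.example.com/data"  # placeholder endpoint

def timed_get(get) -> float:
    start = time.perf_counter()
    get(URL)
    return (time.perf_counter() - start) * 1000

# Bare requests.get() builds a fresh connection: DNS + TCP + TLS every time.
print(f"no reuse: {timed_get(requests.get):.0f} ms")

# A Session keeps the connection alive: handshakes happen once.
with requests.Session() as session:
    print(f"cold: {timed_get(session.get):.0f} ms")  # pays the handshakes
    print(f"warm: {timed_get(session.get):.0f} ms")  # reuses the connection
```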
Reduce Distance
CDN edge caching puts static content on servers near users. Cloudflare, CloudFront, and especially Akamai operate anywhere from hundreds to thousands of PoPs globally. A cache hit at the edge means the request never touches the origin server — response time drops from 200ms to 20ms.
Regional deployments put application servers closer to users. If most of the user base is in Europe, running only in US-East adds 70-90ms to every request. Multi-region deployment with a global load balancer solves this.
Edge compute (Cloudflare Workers, Lambda@Edge) runs application code at the edge, not just static caching. An A/B test decision that runs at the edge saves a round-trip to origin.
Reduce Processing Time
Server-side optimization — faster database queries, response caching, efficient serialization — directly reduces TTFB. Profile the hot paths. A slow SQL query adding 200ms to TTFB dwarfs any network optimization available.
Compression (gzip, Brotli, zstd) reduces content transfer time. Brotli typically achieves 15-25% better compression than gzip for text content. The CPU cost is negligible compared to the transfer time savings.
```nginx
# Nginx Brotli compression (requires the third-party ngx_brotli module)
brotli on;
brotli_types text/html text/css application/javascript application/json;
brotli_comp_level 6;
```
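To get a feel for the savings, here is a stdlib-only Python sketch compressing a repetitive JSON payload with gzip (Brotli would need the third-party brotli package):

```python
import gzip
import json

# Synthetic JSON payload -- repetitive text compresses extremely well.
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(1000)]
).encode()

compressed = gzip.compress(payload, compresslevel=6)
print(f"raw:  {len(payload):,} bytes")
print(f"gzip: {len(compressed):,} bytes ({len(compressed) / len(payload):.0%})")
```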
Reduce Payload Size
Less data means less transfer time. This sounds obvious, but it's routinely ignored; a size-comparison sketch follows the list:
- Return only the fields the client needs (GraphQL shines here)
- Paginate large responses instead of dumping everything
- Use efficient serialization (Protocol Buffers vs JSON can be 3-10x smaller)
- Optimize images: WebP/AVIF instead of PNG/JPEG, proper sizing, lazy loading
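A toy comparison of the first two bullets, using made-up records: returning only the fields a list view needs, then paginating, shrinks the payload dramatically.

```python
import json

# Hypothetical records: a list view rarely needs the full object.
users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "bio": "x" * 500, "settings": {"theme": "dark", "locale": "en"}}
    for i in range(100)
]

full = json.dumps(users)
trimmed = json.dumps([{"id": u["id"], "name": u["name"]} for u in users])
paged = json.dumps([{"id": u["id"], "name": u["name"]} for u in users[:20]])

print(f"full:    {len(full):>6} bytes")    # every field, every record
print(f"trimmed: {len(trimmed):>6} bytes") # field selection
print(f"paged:   {len(paged):>6} bytes")   # field selection + pagination
```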
Measuring Latency in Production
Lab measurements (Lighthouse, WebPageTest) reveal what's possible. Real User Monitoring (RUM) reveals what's actually happening.
Key Metrics to Track
| Metric | What It Reveals | Target |
|---|---|---|
| P50 Latency | Typical user experience | <200ms for APIs |
| P95 Latency | Experience for 1-in-20 users | <500ms for APIs |
| P99 Latency | Worst common case | <1s for APIs |
| TTFB | Server processing + network to first byte | <100ms at edge |
| DNS Time | Resolution overhead | <20ms (cached) |
The P99 matters more than P50 in distributed systems. If service A calls service B calls service C, and each is slow (beyond its P99) 1% of the time, roughly 3% of end-to-end requests hit at least one slow call: tail latency compounds across the chain.
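The arithmetic is a one-liner, assuming each service is independently slow 1% of the time:

```python
# Chance that a request through N sequential services hits at least
# one P99-slow call, if each service is slow 1% of the time (independently).
p_slow = 0.01
for n in (1, 3, 5, 10):
    print(f"{n} services: {1 - (1 - p_slow) ** n:.1%}")
# 1 -> 1.0%, 3 -> 3.0%, 5 -> 4.9%, 10 -> 9.6%
```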
Synthetic vs Real User Monitoring
Synthetic monitoring (Catchpoint, Pingdom) sends test requests from known locations on a schedule. Great for detecting regressions, useless for understanding real user experience across diverse networks and devices.
Real User Monitoring (Datadog RUM, Google Analytics, SpeedCurve) captures actual user timing data. This reveals the long tail — mobile users on 3G in rural areas experience 5-10x the latency of synthetic tests run from an AWS region.
Why This Matters
Latency is the silent killer of user experience and system reliability. Google's research shows users perceive delays above 100ms, and anything above 1 second breaks flow. In microservice architectures, latency compounds across service calls — a 50ms increase at one service cascades through the dependency graph.
The engineers who understand where time goes are the ones who build fast systems. It's not about clever tricks — it's about measuring each phase, identifying the dominant cost, and systematically eliminating unnecessary round-trips and distance.
Key Points
- A cold HTTPS request from New York to London costs ~250ms minimum before a single byte of content arrives: DNS + TCP + TLS + TTFB
- Bandwidth and latency are fundamentally different — a 10 Gbps pipe doesn't help if RTT is 150ms. Latency is about distance; bandwidth is about width
- The speed of light in fiber is ~200,000 km/s (roughly 2/3 of vacuum speed), setting a hard physical floor on latency
- TLS 1.3 reduced the handshake from 2 RTTs to 1 RTT (and 0-RTT for resumption), which is why upgrading from TLS 1.2 matters
- Connection reuse (HTTP keep-alive, connection pooling) is the single most impactful latency optimization because it eliminates handshake costs entirely
Key Components
| Component | Role |
|---|---|
| DNS Resolution | Translates hostname to IP address — often 20-120ms, cached aggressively by browsers and OS resolvers |
| TCP Handshake | Three-way handshake (SYN, SYN-ACK, ACK) that costs exactly one round-trip before data can flow |
| TLS Handshake | Negotiates cipher suite and exchanges keys — adds 1-2 RTTs depending on TLS version (1.2 vs 1.3) |
| Time to First Byte (TTFB) | Server processing time from receiving the request to sending the first byte of the response |
| Content Transfer | Time to download the full response, governed by bandwidth, TCP congestion window, and response size |
When to Use
Latency analysis applies to every networked system. Prioritize it when user-facing response times exceed targets, when P99 latency is significantly higher than P50, or when expanding to new geographic regions.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Chrome DevTools | Open Source | Waterfall breakdown of individual requests showing DNS, TCP, TLS, TTFB, and download phases | Development |
| WebPageTest | Open Source | Multi-location testing with filmstrip view and connection-level timing from real browsers | Development-Production |
| Lighthouse | Open Source | Automated performance auditing with actionable optimization suggestions | Development |
| Catchpoint | Commercial | Synthetic monitoring from 800+ global locations with network-layer telemetry | Enterprise |
Debug Checklist
- Open Chrome DevTools Network tab and check the Timing breakdown — identify which phase (DNS, TCP, TLS, TTFB, download) dominates
- Run curl -w with timing variables to measure each phase from the command line: time_namelookup, time_connect, time_appconnect, time_starttransfer, time_total
- Check if connections are being reused — look for Connection: keep-alive headers and verify connection pooling in the HTTP client
- Test from multiple geographic locations using WebPageTest or Catchpoint to isolate distance-related latency from server-side latency
- Verify TLS version with openssl s_client — confirm TLS 1.3 is negotiated, not 1.2
Common Mistakes
- Optimizing bandwidth when latency is the bottleneck. A 1KB API response on a 100ms RTT link doesn't benefit from more bandwidth — the handshake overhead dominates
- Ignoring DNS resolution time. A cold DNS lookup to an authoritative server can add 50-200ms, and this happens before anything else
- Not enabling TLS 1.3. Sticking with TLS 1.2 adds an extra round-trip on every new connection — that's 50-150ms wasted per connection
- Measuring latency only from the data center. Real user latency includes last-mile ISP hops, which can add 10-50ms of jitter
- Assuming CDN solves everything. CDNs help with static content but dynamic API calls still hit origin servers — latency there is server think-time
Real World Usage
- Google found that an extra 500ms of latency reduced search traffic by 20% — latency directly impacts revenue
- Amazon calculated that every 100ms of added latency costs 1% in sales — this drove their global edge infrastructure investment
- Cloudflare's Anycast network routes users to the nearest PoP, reducing RTT to under 20ms for most of the world's population
- High-frequency trading firms pay millions for colocation and microwave links to shave microseconds off latency
- Mobile networks add 50-300ms of latency from radio access network overhead, which is why mobile-first optimization matters