Network Latency — Where Time Goes
Every millisecond in a request comes from somewhere — DNS, TCP, TLS, server processing, or transfer — and each one can be measured and optimized.
The Problem
Why does a simple API call take 300ms when the server processes it in 5ms? Where do the other 295ms go, and how does an engineering team systematically eliminate them?
Mental Model
Like a relay race — the total time is the sum of every handoff. The runners can't go faster (speed of light), but the number of handoffs can be reduced.
How It Works
Every time a browser loads a page or a service calls an API, the request passes through a gauntlet of sequential steps. Each step adds latency, and they compound. Understanding where time goes is the first step to eliminating it.
The Anatomy of an HTTPS Request
A cold HTTPS request — no cached DNS, no existing connection — goes through five distinct phases:
1. DNS Resolution (20-120ms)
The browser asks "what's the IP address of api.example.com?" This query traverses the local cache, the OS resolver, the ISP's recursive resolver, and potentially the authoritative nameserver. A cold lookup easily takes 50-120ms. Subsequent lookups hit cache and resolve in under 5ms.
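To see the cache effect directly, here is a minimal Python sketch that times repeated lookups through the OS resolver (the hostname is a placeholder, and caching behavior depends on the OS and resolver configuration):

```python
import socket
import time

def time_dns(hostname: str) -> float:
    """Time a single getaddrinfo() call in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 443)  # resolve via the OS resolver
    return (time.perf_counter() - start) * 1000

# The first call may miss local caches; repeats usually hit them.
for attempt in range(3):
    print(f"lookup {attempt + 1}: {time_dns('example.com'):.1f} ms")
```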
2. TCP Handshake (1 RTT)
The three-way handshake: SYN → SYN-ACK → ACK. This costs exactly one round-trip time. New York to London is ~75ms RTT, so that's 75ms before a single byte of application data.
3. TLS Handshake (1-2 RTTs)
TLS 1.2 requires two round-trips: ClientHello/ServerHello, then key exchange. TLS 1.3 reduces this to one round-trip. With 0-RTT resumption (TLS 1.3 + previously established session), the client can send data immediately — though 0-RTT has replay attack considerations.
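To separate the two handshakes, a small sketch with Python's standard-library ssl module can time the TCP and TLS phases independently and report the negotiated version (the hostname is a placeholder):

```python
import socket
import ssl
import time

def tls_probe(host: str, port: int = 443) -> None:
    """Resolve first, then time the TCP and TLS handshakes separately."""
    ip = socket.getaddrinfo(host, port)[0][4][0]  # get DNS out of the way

    t0 = time.perf_counter()
    raw = socket.create_connection((ip, port))  # TCP three-way handshake: 1 RTT
    t1 = time.perf_counter()
    ctx = ssl.create_default_context()
    conn = ctx.wrap_socket(raw, server_hostname=host)  # TLS handshake: 1-2 RTTs
    t2 = time.perf_counter()

    print(f"TCP: {(t1 - t0) * 1000:.1f} ms")
    print(f"TLS: {(t2 - t1) * 1000:.1f} ms, negotiated {conn.version()}")  # e.g. TLSv1.3
    conn.close()

tls_probe("example.com")
```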
4. Time to First Byte — TTFB (variable)
The server receives the request, processes it (database queries, computation, serialization), and sends back the first byte. This is the application's "think time." A well-optimized API should have TTFB under 50ms; a page rendering with database queries might take 200-500ms.
5. Content Transfer (size / bandwidth)
The response body downloads. For a 50KB JSON API response on a broadband connection, this is negligible. For a 5MB image on a 3G mobile network, this dominates.
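Putting the five phases together, a back-of-envelope budget shows how a request with 5ms of server think time still lands near 300ms cold. Every number below is an illustrative assumption:

```python
# Back-of-envelope cold-request budget (all numbers are assumptions).
RTT_MS = 75            # e.g. New York <-> London
DNS_MS = 50            # cold recursive lookup
TTFB_SERVER_MS = 5     # server think time
SIZE_KB = 50
BANDWIDTH_MBPS = 100

tcp_ms = RTT_MS                       # three-way handshake: 1 RTT
tls_ms = RTT_MS                       # TLS 1.3: 1 RTT (TLS 1.2 would cost 2)
request_ms = RTT_MS + TTFB_SERVER_MS  # request out + first byte back
transfer_ms = SIZE_KB * 8 / (BANDWIDTH_MBPS * 1000) * 1000

total = DNS_MS + tcp_ms + tls_ms + request_ms + transfer_ms
print(f"total: {total:.0f} ms")  # ~284 ms -- almost all round-trips, not transfer
```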
RTT by Geography: The Speed of Light Tax
Light in fiber travels at roughly 200,000 km/s — about two-thirds of vacuum speed due to the refractive index of glass. This sets a hard physical floor that no amount of engineering can break.
| Route | Distance | Theoretical Min RTT | Typical RTT |
|---|---|---|---|
| US East ↔ US West | ~4,000 km | ~40ms | 60-80ms |
| US East ↔ Europe (London) | ~5,500 km | ~55ms | 70-90ms |
| US East ↔ Asia (Tokyo) | ~11,000 km | ~110ms | 150-200ms |
| US West ↔ Asia (Tokyo) | ~8,500 km | ~85ms | 100-140ms |
| Europe ↔ Asia (Singapore) | ~10,000 km | ~100ms | 150-250ms |
| Same region (same AZ) | <100 km | <1ms | 0.5-2ms |
The "typical RTT" column is higher than theoretical because real packets traverse routers, switches, and non-straight-line fiber paths. Submarine cables don't follow great circles — they follow coastlines and landing stations.
Bandwidth vs. Latency: The Pipe Analogy
These two concepts are fundamentally different, and confusing them is one of the most common mistakes in network optimization.
Bandwidth is the width of the pipe — how much data can be pushed through per second. Going from 100 Mbps to 1 Gbps enables transferring large files 10x faster.
Latency is the length of the pipe — how long it takes for a single bit to travel from source to destination. No amount of bandwidth reduces latency.
For small payloads (API calls, web pages), latency dominates. A 1KB response over a fresh connection on a 150ms RTT link takes ~300ms (one round-trip for TCP setup, one for request/response) whether the pipe is 10 Mbps or 10 Gbps. For large payloads (video streaming, file downloads), bandwidth matters more.
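A crude model makes the tradeoff concrete. This sketch assumes two round-trips (connection setup plus request/response) and ignores TCP slow start, so real numbers will differ:

```python
def transfer_ms(size_kb: float, bandwidth_mbps: float, rtt_ms: float,
                round_trips: int = 2) -> float:
    """Crude model: setup + request round-trips, then serialized transfer."""
    serialization = size_kb * 8 / (bandwidth_mbps * 1000) * 1000
    return round_trips * rtt_ms + serialization

# 1 KB API response: bandwidth is irrelevant, RTT dominates.
print(transfer_ms(1, 10, 150))          # ~301 ms at 10 Mbps
print(transfer_ms(1, 10_000, 150))      # ~300 ms at 10 Gbps
# 5 MB image: bandwidth dominates.
print(transfer_ms(5_000, 10, 150))      # ~4300 ms at 10 Mbps
print(transfer_ms(5_000, 10_000, 150))  # ~304 ms at 10 Gbps
```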
```bash
# Measure each phase of an HTTPS request
curl -w "\
DNS:      %{time_namelookup}s\n\
TCP:      %{time_connect}s\n\
TLS:      %{time_appconnect}s\n\
TTFB:     %{time_starttransfer}s\n\
Total:    %{time_total}s\n\
Download: %{size_download} bytes\n" \
  -o /dev/null -s https://api.example.com/data
```
Optimization Techniques
Reducing latency falls into three categories: reduce round-trips, reduce distance, and reduce processing time.
Reduce Round-Trips
Connection reuse is the single biggest win. HTTP keep-alive reuses TCP connections across requests, eliminating the handshake cost. HTTP/2 multiplexes hundreds of requests over one connection. HTTP/3 (QUIC) folds the transport and TLS handshakes into a single round-trip, with 0-RTT for resumed connections.
<!-- DNS prefetch: resolve DNS for domains needed soon -->
<link rel="dns-prefetch" href="//api.example.com">
<!-- Preconnect: establish TCP + TLS ahead of time -->
<link rel="preconnect" href="https://cdn.example.com">
Connection pooling in backend services (database pools, HTTP client pools) avoids creating new TCP+TLS connections per request. A PostgreSQL connection takes 3-10ms to establish — with a pool, it's effectively zero.
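A sketch of the difference using the third-party requests library (the URL is a placeholder): the first session request pays the handshakes, later ones reuse the connection.

```python
import time
import requests  # third-party; assumes `pip install requests`

URL = "https://api.example.com/data"  # placeholder endpoint

def timed_get(get) -> float:
    start = time.perf_counter()
    get(URL)
    return (time.perf_counter() - start) * 1000

# Bare requests.get() builds a fresh connection: DNS + TCP + TLS every time.
print(f"no reuse: {timed_get(requests.get):.0f} ms")

# A Session keeps the connection alive: handshakes happen once.
with requests.Session() as session:
    print(f"cold: {timed_get(session.get):.0f} ms")  # pays the handshakes
    print(f"warm: {timed_get(session.get):.0f} ms")  # reuses the connection
```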
Reduce Distance
CDN edge caching puts static content on servers near users. Cloudflare, CloudFront, and especially Akamai operate anywhere from hundreds to thousands of PoPs globally. A cache hit at the edge means the request never touches the origin server — response time drops from 200ms to 20ms.
Regional deployments put application servers closer to users. If most of the user base is in Europe, running only in US-East adds 70-90ms to every request. Multi-region deployment with a global load balancer solves this.
Edge compute (Cloudflare Workers, Lambda@Edge) runs application code at the edge, not just static caching. An A/B test decision that runs at the edge saves a round-trip to origin.
Reduce Processing Time
Server-side optimization — faster database queries, response caching, efficient serialization — directly reduces TTFB. Profile the hot paths. A slow SQL query adding 200ms to TTFB dwarfs any network optimization available.
Compression (gzip, Brotli, zstd) reduces content transfer time. Brotli typically achieves 15-25% better compression than gzip for text content. The CPU cost is negligible compared to the transfer time savings.
```nginx
# Nginx Brotli compression (requires the third-party ngx_brotli module)
brotli on;
brotli_types text/html text/css application/javascript application/json;
brotli_comp_level 6;
```
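To get a feel for the savings, here is a stdlib-only Python sketch compressing a repetitive JSON payload with gzip (Brotli would need the third-party brotli package):

```python
import gzip
import json

# Synthetic JSON payload -- repetitive text compresses extremely well.
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(1000)]
).encode()

compressed = gzip.compress(payload, compresslevel=6)
print(f"raw:  {len(payload):,} bytes")
print(f"gzip: {len(compressed):,} bytes ({len(compressed) / len(payload):.0%})")
```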
Reduce Payload Size
Less data means less transfer time. This sounds obvious, but it's routinely ignored; a size-comparison sketch follows the list:
- Return only the fields the client needs (GraphQL shines here)
- Paginate large responses instead of dumping everything
- Use efficient serialization (Protocol Buffers vs JSON can be 3-10x smaller)
- Optimize images: WebP/AVIF instead of PNG/JPEG, proper sizing, lazy loading
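A toy comparison of the first two bullets, using made-up records: returning only the fields a list view needs, then paginating, shrinks the payload dramatically.

```python
import json

# Hypothetical records: a list view rarely needs the full object.
users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "bio": "x" * 500, "settings": {"theme": "dark", "locale": "en"}}
    for i in range(100)
]

full = json.dumps(users)
trimmed = json.dumps([{"id": u["id"], "name": u["name"]} for u in users])
paged = json.dumps([{"id": u["id"], "name": u["name"]} for u in users[:20]])

print(f"full:    {len(full):>6} bytes")    # every field, every record
print(f"trimmed: {len(trimmed):>6} bytes") # field selection
print(f"paged:   {len(paged):>6} bytes")   # field selection + pagination
```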
Measuring Latency in Production
Lab measurements (Lighthouse, WebPageTest) reveal what's possible. Real User Monitoring (RUM) reveals what's actually happening.
Key Metrics to Track
| Metric | What It Reveals | Target |
|---|---|---|
| P50 Latency | Typical user experience | <200ms for APIs |
| P95 Latency | Experience for 1-in-20 users | <500ms for APIs |
| P99 Latency | Worst common case | <1s for APIs |
| TTFB | Server processing + network to first byte | <100ms at edge |
| DNS Time | Resolution overhead | <20ms (cached) |
The P99 matters more than P50 in distributed systems. If service A calls service B calls service C, and each is slow (beyond its P99) 1% of the time, roughly 3% of end-to-end requests hit at least one slow call: tail latency compounds across the chain.
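The arithmetic is a one-liner, assuming each service is independently slow 1% of the time:

```python
# Chance that a request through N sequential services hits at least
# one P99-slow call, if each service is slow 1% of the time (independently).
p_slow = 0.01
for n in (1, 3, 5, 10):
    print(f"{n} services: {1 - (1 - p_slow) ** n:.1%}")
# 1 -> 1.0%, 3 -> 3.0%, 5 -> 4.9%, 10 -> 9.6%
```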
Synthetic vs Real User Monitoring
Synthetic monitoring (Catchpoint, Pingdom) sends test requests from known locations on a schedule. Great for detecting regressions, useless for understanding real user experience across diverse networks and devices.
Real User Monitoring (Datadog RUM, Google Analytics, SpeedCurve) captures actual user timing data. This reveals the long tail — mobile users on 3G in rural areas experience 5-10x the latency of synthetic tests run from an AWS region.
Why This Matters
Latency is the silent killer of user experience and system reliability. Google's research shows users perceive delays above 100ms, and anything above 1 second breaks flow. In microservice architectures, latency compounds across service calls — a 50ms increase at one service cascades through the dependency graph.
The engineers who understand where time goes are the ones who build fast systems. It's not about clever tricks — it's about measuring each phase, identifying the dominant cost, and systematically eliminating unnecessary round-trips and distance.
Key Points
- A cold HTTPS request from New York to London costs ~250ms minimum before a single byte of content arrives: DNS + TCP + TLS + TTFB
- Bandwidth and latency are fundamentally different — a 10 Gbps pipe doesn't help if RTT is 150ms. Latency is about distance; bandwidth is about width
- The speed of light in fiber is ~200,000 km/s (roughly 2/3 of vacuum speed), setting a hard physical floor on latency
- TLS 1.3 reduced the handshake from 2 RTTs to 1 RTT (and 0-RTT for resumption), which is why upgrading from TLS 1.2 matters
- Connection reuse (HTTP keep-alive, connection pooling) is the single most impactful latency optimization because it eliminates handshake costs entirely
Key Components
| Component | Role |
|---|---|
| DNS Resolution | Translates hostname to IP address — often 20-120ms, cached aggressively by browsers and OS resolvers |
| TCP Handshake | Three-way handshake (SYN, SYN-ACK, ACK) that costs exactly one round-trip before data can flow |
| TLS Handshake | Negotiates cipher suite and exchanges keys — adds 1-2 RTTs depending on TLS version (1.2 vs 1.3) |
| Time to First Byte (TTFB) | Server processing time from receiving the request to sending the first byte of the response |
| Content Transfer | Time to download the full response, governed by bandwidth, TCP congestion window, and response size |
When to Use
Latency analysis applies to every networked system. Prioritize it when user-facing response times exceed targets, when P99 latency is significantly higher than P50, or when expanding to new geographic regions.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Chrome DevTools | Open Source | Waterfall breakdown of individual requests showing DNS, TCP, TLS, TTFB, and download phases | Development |
| WebPageTest | Open Source | Multi-location testing with filmstrip view and connection-level timing from real browsers | Development-Production |
| Lighthouse | Open Source | Automated performance auditing with actionable optimization suggestions | Development |
| Catchpoint | Commercial | Synthetic monitoring from 800+ global locations with network-layer telemetry | Enterprise |
Debug Checklist
- Open Chrome DevTools Network tab and check the Timing breakdown — identify which phase (DNS, TCP, TLS, TTFB, download) dominates
- Run curl -w with timing variables to measure each phase from the command line: time_namelookup, time_connect, time_appconnect, time_starttransfer, time_total
- Check if connections are being reused — look for Connection: keep-alive headers and verify connection pooling in the HTTP client
- Test from multiple geographic locations using WebPageTest or Catchpoint to isolate distance-related latency from server-side latency
- Verify TLS version with openssl s_client — confirm TLS 1.3 is negotiated, not 1.2
Common Mistakes
- Optimizing bandwidth when latency is the bottleneck. A 1KB API response on a 100ms RTT link doesn't benefit from more bandwidth — the handshake overhead dominates
- Ignoring DNS resolution time. A cold DNS lookup to an authoritative server can add 50-200ms, and this happens before anything else
- Not enabling TLS 1.3. Sticking with TLS 1.2 adds an extra round-trip on every new connection — that's 50-150ms wasted per connection
- Measuring latency only from the data center. Real user latency includes last-mile ISP hops, which can add 10-50ms of jitter
- Assuming CDN solves everything. CDNs help with static content but dynamic API calls still hit origin servers — latency there is server think-time
Real World Usage
- Google found that an extra 500ms of latency reduced search traffic by 20% — latency directly impacts revenue
- Amazon calculated that every 100ms of added latency costs 1% in sales — this drove their global edge infrastructure investment
- Cloudflare's Anycast network routes users to the nearest PoP, reducing RTT to under 20ms for most of the world's population
- High-frequency trading firms pay millions for colocation and microwave links to shave microseconds off latency
- Mobile networks add 50-300ms of latency from radio access network overhead, which is why mobile-first optimization matters