TCP vs UDP Decision Framework
TCP guarantees ordered delivery with congestion control, UDP provides raw speed with no guarantees, and QUIC offers TCP-like reliability with independent streams and built-in encryption over UDP.
The Problem
Picking the wrong transport protocol means either paying unnecessary overhead (TCP for real-time video) or building reliability from scratch (UDP for file transfer).
Mental Model
TCP is a phone call — connection setup, guaranteed delivery, ordered conversation. UDP is shouting across a room — fast, no setup, but messages might get lost or arrive out of order. QUIC is a walkie-talkie with multiple channels — fast setup, independent streams, built-in encryption.
How It Works
The transport-layer decision is one of the most consequential networking choices in a system design. Pick TCP for a real-time game and players experience rubber-banding from retransmission delays. Pick UDP for a financial trading API and a single missed message can leave a position wrong. The right answer depends on understanding what each protocol actually does and, more importantly, what it does not.
TCP — The Reliable Stream
TCP (originally RFC 793, now specified in RFC 9293) provides a bidirectional, ordered, reliable byte stream between two endpoints. The key properties:
Connection-oriented. A three-way handshake (SYN, SYN-ACK, ACK) establishes state on both ends before data flows. This adds one round trip of latency (TCP Fast Open can reduce this for returning clients).
Ordered delivery. TCP assigns a sequence number to every byte. The receiver reassembles bytes in order, regardless of the order packets arrive. If packet 3 arrives before packet 2, the receiver buffers packet 3 and waits.
Reliable delivery. The receiver acknowledges data cumulatively. If the sender does not receive an ACK covering a segment within the retransmission timeout, it resends that segment. Data either arrives or, after repeated failures, the connection is terminated.
Flow control. The receiver advertises a window size — the amount of data it can buffer. The sender never exceeds this, preventing the receiver from being overwhelmed.
Congestion control. Algorithms like CUBIC and BBR adjust the sending rate based on network conditions. Packet loss or rising latency causes TCP to back off, preventing congestion collapse.
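The ordered-delivery and acknowledgment behavior described above can be sketched in a few lines. This is a toy model, not how any kernel implements TCP: the class name `ReassemblyBuffer` and its methods are hypothetical, and it assumes segments never partially overlap.

```python
class ReassemblyBuffer:
    """Toy in-order byte-stream reassembly, keyed by byte sequence number."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_seq = initial_seq   # next byte the application may read
        self.pending = {}             # out-of-order segments, keyed by start seq

    def receive(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return the bytes that are now deliverable in order."""
        if seq + len(payload) <= self.next_seq:
            return b""                # duplicate of data already delivered
        self.pending.setdefault(seq, payload)
        delivered = bytearray()
        # Drain every buffered segment that continues the in-order stream.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return bytes(delivered)

    def ack(self) -> int:
        """Cumulative ACK: the next byte sequence number expected."""
        return self.next_seq


buf = ReassemblyBuffer()
print(buf.receive(0, b"AAAA"))   # in order, delivered at once
print(buf.receive(8, b"CCCC"))   # arrives early, buffered, nothing delivered
print(buf.receive(4, b"BBBB"))   # fills the gap, releases both segments
```

The second call returns an empty result even though the data has arrived, which is exactly the waiting behavior that later shows up as head-of-line blocking.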
# Watch TCP connection states
ss -tnp | head -20
# Monitor retransmissions (sign of network issues)
netstat -s | grep -i retransmit
# Measure TCP handshake time
curl -w "TCP handshake: %{time_connect}s\n" -o /dev/null -s https://example.com
UDP — The Raw Datagram
UDP (RFC 768) is the anti-TCP. It provides almost nothing: an 8-byte header with source port, destination port, length, and checksum. No connection. No ordering. No reliability. No flow control. No congestion control.
Each UDP datagram is independent. The sender fires and forgets. If the network drops a packet, neither side knows unless the application implements its own detection. If packets arrive out of order, the application sees them out of order.
This minimalism is a feature, not a limitation. UDP's lack of guarantees means:
- No head-of-line blocking. A lost packet does not stall subsequent packets.
- No connection setup latency. Data can flow immediately.
- No retransmission delay. The application decides whether to retry, skip, or send the latest version instead.
- Minimal overhead. 8-byte header vs TCP's 20+ bytes. No connection state in the kernel.
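To make the "no connection setup" point concrete, here is a minimal loopback sketch using Python's standard `socket` module. The helper name `udp_roundtrip` is an assumption for illustration. On loopback the datagram will almost always arrive, but nothing in UDP promises that, which is why the receiver sets a timeout.

```python
import socket


def udp_roundtrip(message: bytes, timeout: float = 1.0) -> bytes:
    """Send one datagram to a local socket and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(timeout)      # UDP gives no delivery signal, so bound the wait
        # No connect() and no handshake: the first packet sent already carries data.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"ping"))
```

If the datagram were dropped, `recvfrom` would raise `socket.timeout` and the application would have to decide whether to retry, which is the trade-off the list above describes.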
# Check UDP socket stats
ss -unp
# Monitor UDP errors
netstat -su
# Look for "packet receive errors" and "packets to unknown port received"
QUIC — The Modern Hybrid
QUIC (RFC 9000) runs over UDP but provides TCP-like reliability with critical improvements:
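1-RTT handshake with built-in TLS 1.3. TCP requires one RTT for its own handshake plus at least one more for TLS (one with TLS 1.3, two with TLS 1.2). QUIC combines the transport and TLS 1.3 handshakes into a single round trip. Returning clients can send data in 0-RTT, with the caveat that this early data can be replayed by an attacker.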
Independent streams. QUIC multiplexes multiple streams over a single connection. A lost packet on stream 1 does not block streams 2, 3, or 4. This is the single biggest advantage over TCP, which treats the entire connection as one ordered byte stream.
Connection migration. QUIC connections are identified by a connection ID, not by the source and destination IP:port 4-tuple. When a mobile device switches from Wi-Fi to cellular, the QUIC connection survives. TCP connections die and must be re-established.
Userspace implementation. QUIC is typically implemented in application space rather than in the kernel. This enables faster iteration and deployment but usually costs more CPU than kernel-optimized TCP.
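Of the properties above, connection migration is the easiest to picture in code: the server looks sessions up by connection ID rather than by source address. The sketch below is a simplified illustration with hypothetical names (`Session`, `ConnectionIdDemux`). Real QUIC validates a new path before trusting it (RFC 9000, section 9), which this sketch skips.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    conn_id: bytes
    peer_addr: tuple
    received: list = field(default_factory=list)


class ConnectionIdDemux:
    """Toy server-side lookup keyed by connection ID instead of address."""

    def __init__(self) -> None:
        self.sessions = {}

    def on_packet(self, conn_id: bytes, src_addr: tuple, payload: bytes) -> Session:
        session = self.sessions.get(conn_id)
        if session is None:
            session = Session(conn_id, src_addr)
            self.sessions[conn_id] = session
        elif session.peer_addr != src_addr:
            # Same connection ID from a new address: the client changed networks.
            # A real implementation would run path validation here first.
            session.peer_addr = src_addr
        session.received.append(payload)
        return session


demux = ConnectionIdDemux()
demux.on_packet(b"\x01", ("10.0.0.5", 4000), b"sent over wifi")
s = demux.on_packet(b"\x01", ("172.16.0.9", 5000), b"sent over cellular")
print(len(demux.sessions), s.peer_addr, s.received)
```

A TCP-style lookup keyed on the address tuple would have treated the second packet as belonging to an unknown connection.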
The Decision Matrix
This is the practical framework. For each workload property, the table shows which protocol fits:
| Property | TCP | UDP | QUIC |
|---|---|---|---|
| Every byte must arrive | Native | Application must implement | Native |
| Order matters | Native (but causes HOL blocking) | Application must implement | Per-stream ordering (no HOL blocking) |
| Low latency critical | 1-RTT setup + retransmission delays | Zero setup, no retransmission | 1-RTT setup (0-RTT for returning clients) |
| Multiple independent requests | One stream per connection (or HTTP/2 with HOL risk) | Natural (each datagram independent) | Native multiplexed streams |
| Mobile / network switching | Connection dies on IP change | No connection to lose | Connection ID survives IP change |
| Congestion fairness | Built-in (CUBIC, BBR) | None; application must self-regulate | Built-in |
| Encrypted by default | No (requires TLS on top) | No | Yes (TLS 1.3 mandatory) |
| Kernel optimization | Decades of optimization, hardware offload | Minimal kernel overhead | Userspace — higher CPU cost |
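As a rough first pass, the matrix can be collapsed into a small decision helper. This is a deliberately crude sketch: the `Workload` fields and the `suggest_transport` function are hypothetical names, and a real decision would also weigh CPU budget, middlebox behavior, and library support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    needs_every_byte: bool            # loss is unacceptable
    needs_ordering: bool              # data must be consumed in send order
    concurrent_streams: bool = False  # many independent requests in flight
    changes_networks: bool = False    # e.g. Wi-Fi to cellular handover


def suggest_transport(w: Workload) -> str:
    """Return a starting-point suggestion, not a verdict."""
    if not (w.needs_every_byte or w.needs_ordering):
        return "UDP"    # latest-value-wins data: skip reliability entirely
    if w.concurrent_streams or w.changes_networks:
        return "QUIC"   # reliability without connection-wide HOL blocking
    return "TCP"        # single ordered stream, kernel-optimized


print(suggest_transport(Workload(needs_every_byte=False, needs_ordering=False)))
print(suggest_transport(Workload(True, True, concurrent_streams=True)))
print(suggest_transport(Workload(True, True)))
```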
When to Use What
TCP: Web servers, REST APIs, database connections, file transfers, SSH, email. Any workload where correctness matters more than latency, and a single ordered stream is sufficient.
UDP: DNS queries, DHCP, real-time video/audio (via RTP), live game state updates, IoT telemetry. Any workload where latest-value-wins, dropped data is acceptable, or the application implements its own reliability.
QUIC: HTTP/3 web traffic, mobile applications, any workload that suffers from TCP's head-of-line blocking or needs connection migration. The sweet spot is high-latency, lossy networks (mobile, satellite) where TCP's retransmission behavior is painful.
SCTP: Telephony signaling (SIGTRAN), WebRTC data channels. Provides message boundaries (unlike TCP's byte stream), multi-homing (failover between network paths), and multi-streaming. Rarely deployed natively outside telecom because middleboxes (NAT, firewalls) often do not support it. WebRTC sidesteps this by tunneling SCTP over DTLS over UDP.
Building Reliability on Top of UDP
Many real-world systems use UDP as the transport but build selective reliability on top. This is not reinventing TCP — it is building exactly the guarantees needed, no more:
Game netcode: Player positions are sent unreliably (latest value always wins, so an outdated position is useless). Game events (damage, item pickup, player death) are sent reliably with acknowledgments and retransmission. This split is not possible on a single TCP connection, which makes everything reliable and ordered.
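A minimal sketch of that split on the sender side might look like the following. The `Channel` class and its JSON packet layout are assumptions made for illustration, and the retransmission timer itself is left out.

```python
import json


class Channel:
    """Toy sender: unreliable state updates plus reliable events."""

    def __init__(self) -> None:
        self.next_event_id = 0
        self.unacked = {}   # event id -> packet bytes awaiting an ACK

    def position_update(self, x: float, y: float) -> bytes:
        # Unreliable: never stored, never resent. A newer update supersedes it.
        return json.dumps({"t": "pos", "x": x, "y": y}).encode()

    def event(self, name: str) -> bytes:
        # Reliable: keep a copy until the peer acknowledges this event id.
        event_id = self.next_event_id
        self.next_event_id += 1
        packet = json.dumps({"t": "evt", "id": event_id, "name": name}).encode()
        self.unacked[event_id] = packet
        return packet

    def on_ack(self, event_id: int) -> None:
        self.unacked.pop(event_id, None)

    def packets_to_resend(self) -> list:
        # Would be called when a retransmission timer fires.
        return [self.unacked[i] for i in sorted(self.unacked)]


ch = Channel()
ch.position_update(1.0, 2.0)   # lost or not, nothing is tracked
ch.event("damage")
ch.event("item_pickup")
ch.on_ack(0)                   # peer confirmed "damage"
print(ch.packets_to_resend())  # only the unacknowledged event remains
```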
Real-time video (RTP/SRTP): Video frames are sent over UDP via RTP. Lost packets cause brief visual artifacts but do not stall playback. Audio is more sensitive, so codecs like Opus include forward error correction to reconstruct lost packets without retransmission.
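Forward error correction is easier to grasp with a toy example. The sketch below uses simple XOR parity over a group of equal-length packets. This is not the scheme Opus uses (Opus embeds a lower-bitrate copy of earlier audio in later packets), and the function names are hypothetical. It only illustrates the idea of repairing a loss from redundant data instead of asking for a resend.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Parity packet over equal-length packets: byte-wise XOR of all of them."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        if len(pkt) != len(parity):
            raise ValueError("packets must be the same length")
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    if not missing:
        return list(received)
    present = [pkt for pkt in received if pkt is not None]
    # XOR of the survivors and the parity cancels everything except the lost packet.
    repaired = list(received)
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


group = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(group)
print(recover_missing([b"AAAA", None, b"CCCC"], parity))
```

The cost is one extra packet per group, paid on every group whether or not anything is lost.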
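QUIC itself: QUIC is literally reliability built on UDP. It implements congestion control, loss detection, retransmission, and ordering. Ordering is enforced per stream rather than per connection, while congestion control and loss detection still operate on the connection as a whole.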
TCP:  [Packet 1] [Packet 2 LOST] [Packet 3 waits] [Packet 2 retransmit] [Packet 3 delivered]
                                 ↑ head-of-line blocking
QUIC: Stream A: [Pkt 1] [Pkt 2 LOST] [Pkt 2 retransmit → delivered]
      Stream B: [Pkt 1] [Pkt 2] [Pkt 3]  ← not blocked by Stream A's loss
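The diagram can be turned into a small simulation. This is a sketch under simplifying assumptions: arrival times are made-up illustrative numbers in milliseconds, the function name `delivery_times` is hypothetical, and the only thing modeled is the ordering rule (one scope per connection versus one scope per stream).

```python
def delivery_times(arrivals: list, per_stream: bool) -> list:
    """arrivals: (stream, arrival_time) pairs listed in original send order.

    Returns the time at which each packet becomes readable by the application.
    """
    latest = {}   # ordering scope -> latest arrival seen so far, in send order
    out = []
    for stream, arrived in arrivals:
        scope = stream if per_stream else "connection"
        # A packet is readable only once everything sent before it
        # in the same ordering scope has also arrived.
        latest[scope] = max(latest.get(scope, 0), arrived)
        out.append(latest[scope])
    return out


# Stream A's second packet is lost; its retransmission arrives at t=50.
arrivals = [("A", 10), ("A", 50), ("B", 12), ("B", 13), ("B", 14)]
print(delivery_times(arrivals, per_stream=False))  # [10, 50, 50, 50, 50]
print(delivery_times(arrivals, per_stream=True))   # [10, 50, 12, 13, 14]
```

With a single ordering scope, every Stream B packet waits until t=50 even though it arrived long before. With per-stream scopes, only Stream A pays for its own loss.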
Performance Comparison
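A qualitative comparison. Absolute numbers depend on the implementation, hardware, and network path, so treat the CPU and memory entries as rough orders of magnitude: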
| Metric | TCP | UDP | QUIC |
|---|---|---|---|
| Connection setup | 1 RTT (+1 for TLS 1.3, +2 for TLS 1.2) | 0 RTT | 1 RTT (0-RTT for resumption) |
| Throughput (bulk) | Excellent (kernel offload) | Excellent (no overhead) | Good (userspace overhead) |
| Latency (0% loss) | Low | Lowest | Low |
| Latency (2% loss) | High (HOL blocking + retransmit) | Low (app skips lost data) | Medium (per-stream retransmit) |
| CPU cost | Low (kernel) | Lowest | Higher (userspace crypto + transport) |
| Memory per connection | ~1-4 KB kernel state | None | ~1-4 KB userspace state |
The takeaway: on clean networks, TCP is hard to beat. On lossy networks with concurrent streams, QUIC wins. For fire-and-forget datagrams, nothing beats raw UDP. The decision is about the workload, not the protocol's reputation.
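The connection-setup row translates directly into arithmetic: setup delay is the number of round trips before application data can flow, multiplied by the path RTT. The sketch below assumes full handshakes with no TCP Fast Open and no session resumption other than QUIC 0-RTT. The table name and function name are hypothetical.

```python
# Round trips spent before the first application byte can be sent.
SETUP_ROUND_TRIPS = {
    "TCP (no TLS)": 1,
    "TCP + TLS 1.3": 2,
    "TCP + TLS 1.2": 3,
    "QUIC (first connection)": 1,
    "QUIC (0-RTT resumption)": 0,
    "UDP": 0,
}


def setup_delay_ms(transport: str, rtt_ms: float) -> float:
    """Time spent on handshakes for the given transport and path RTT."""
    return SETUP_ROUND_TRIPS[transport] * rtt_ms


# Example: an 80 ms mobile path (an illustrative figure, not a measurement).
for name in SETUP_ROUND_TRIPS:
    print(f"{name}: {setup_delay_ms(name, 80)} ms")
```

On a 5 ms data-center path the difference between one and two round trips is negligible, while on a high-latency path it is a large fraction of total request time, which is why the benefit of QUIC's combined handshake depends so much on where the clients are.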
Key Points
- TCP guarantees ordered, reliable delivery at the cost of head-of-line blocking and connection setup latency. It is the right choice when every byte must arrive in order
- UDP provides minimal overhead and no head-of-line blocking but shifts reliability entirely to the application layer. It is the right choice when speed matters more than completeness
- QUIC combines the reliability of TCP with UDP's lack of head-of-line blocking by running independent streams over UDP with built-in TLS 1.3
- The real decision is not 'reliable vs fast'. It is about which guarantees the application actually needs and which it can handle itself
- DNS uses UDP because most queries fit in a single packet and retrying is cheaper than maintaining a connection. DNS-over-HTTPS, by contrast, usually runs over TCP because it is carried in HTTP/2 (it can also run over HTTP/3, and therefore QUIC)
Key Components
| Component | Role |
|---|---|
| Connection State (TCP) | TCP maintains per-connection state — sequence numbers, acknowledgments, congestion window, retransmission timers — requiring a three-way handshake before data flows |
| Datagram (UDP) | UDP is stateless and connectionless — each packet is independent, with no setup, no ordering, no retransmission, and minimal 8-byte header overhead |
| Stream Multiplexing (QUIC) | QUIC provides multiple independent streams over a single connection — a lost packet on one stream does not block others, solving TCP's head-of-line blocking |
| Flow Control | TCP uses a sliding window to prevent the sender from overwhelming the receiver — UDP has none, putting the burden on the application |
| Congestion Control | TCP adjusts sending rate based on network conditions (CUBIC, BBR); UDP sends at whatever rate the application chooses, risking congestion collapse without app-level controls |
When to Use
Use TCP for request-response workloads where every byte matters (web, APIs, databases). Use UDP for real-time media, DNS, and protocols that implement their own reliability. Use QUIC when head-of-line blocking, connection migration, or built-in encryption are priorities.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| TCP (kernel) | Open Source | Web traffic, APIs, database connections, file transfer — any workload that needs guaranteed, ordered delivery with kernel-optimized performance | Universal |
| QUIC (userspace) | Open Source | Web browsing (HTTP/3), mobile apps, and any workload suffering from TCP head-of-line blocking or frequent connection migration | Growing (a significant share of web traffic) |
| KCP | Open Source | Low-latency reliable transport over UDP — popular in game networking and VPN tunnels where TCP retransmission is too slow | Niche |
| ENet | Open Source | Game networking library providing reliable, unreliable, and sequenced channels over UDP with built-in fragmentation | Game development |
Debug Checklist
- Check connection state: ss -tnp (TCP) or ss -unp (UDP) to see active connections and their state
- Measure retransmissions: netstat -s | grep -i retransmit; a high retransmit rate indicates packet loss on the path, which TCP turns into added latency
- Monitor UDP packet loss: compare sent vs received counters with netstat -su — UDP has no built-in loss detection
- Test QUIC connectivity: curl --http3 https://example.com (requires a curl build with HTTP/3 support); falls back to HTTP/2 or HTTP/1.1 over TCP if QUIC fails
- Profile latency: tcpdump + Wireshark to measure handshake time (TCP 3-way vs QUIC 1-RTT) and per-packet round trip times
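Because UDP has no built-in loss detection, applications usually add their own sequence numbers and count gaps at the receiver. A minimal sketch, assuming the sender numbers datagrams from 0 and never sends duplicates (the `LossCounter` name is hypothetical):

```python
class LossCounter:
    """Receiver-side loss estimate from application-level sequence numbers."""

    def __init__(self) -> None:
        self.highest_seen = -1
        self.received = 0

    def on_datagram(self, seq: int) -> None:
        self.received += 1
        self.highest_seen = max(self.highest_seen, seq)

    def lost(self) -> int:
        # Expected count is highest sequence + 1, since numbering starts at 0.
        # Reordered packets lower this figure when they finally arrive;
        # duplicates would skew it and are not handled in this sketch.
        return (self.highest_seen + 1) - self.received


counter = LossCounter()
for seq in [0, 1, 3, 4, 6]:   # 2 and 5 never arrived
    counter.on_datagram(seq)
print(counter.lost())          # 2
```

Exporting this counter as a metric gives a per-flow loss figure that system-wide socket statistics cannot provide.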
Common Mistakes
- Choosing TCP for real-time media because 'reliability is always better.' Retransmitting a dropped video frame that arrives after the playback deadline is worse than skipping it.
- Choosing UDP for bulk data transfer to 'go faster.' Without congestion control, UDP floods the network and causes massive packet loss for everyone.
- Assuming QUIC is always better than TCP. QUIC runs in userspace, consuming more CPU than kernel-optimized TCP — for simple request-response workloads, TCP is often faster.
- Not implementing any loss handling on top of UDP. Games, VoIP, and video all need at least sequence numbers to detect loss and reordering, plus some mix of selective retransmission, forward error correction, or concealment. Raw UDP is rarely used directly.
- Ignoring SCTP, which provides message boundaries and multi-homing natively. It is the right choice for telephony signaling (SIGTRAN) and WebRTC data channels.
Real World Usage
- HTTP/3 runs over QUIC, which itself runs over UDP. Google, Cloudflare, and Meta serve billions of requests per day on this stack
- Online multiplayer games use UDP with custom reliability layers, sending player positions unreliably (latest wins) and game events reliably (hit registration)
- DNS uses UDP for standard queries (single packet) and falls back to TCP for zone transfers and for responses that are truncated because they exceed the UDP size limit (512 bytes without EDNS0)
- Video conferencing (Zoom, Teams, Meet) uses UDP via RTP/SRTP, accepting some packet loss in exchange for low latency, because a retransmitted frame arriving late is useless