TCP/IP Debugging Toolkit
Systematic network debugging starts with the symptom, picks the right tool (ss, tcpdump, mtr, dig, curl, openssl), and works from connectivity through transport to the application layer.
The Problem
When a network connection fails, is slow, or behaves unexpectedly — how does an engineer systematically isolate whether the problem is DNS, routing, TCP, TLS, or the application?
Mental Model
Like a mechanic's toolbox — each tool reveals a different layer of what's happening under the hood. Don't pull out the oscilloscope first; start with the symptom and pick the right diagnostic.
How It Works
Network debugging is a skill that separates senior engineers from everyone else. It's not about memorizing flags — it's about having a systematic approach: start with the symptom, pick the right tool, and work through layers until the root cause surfaces.
The Debugging Decision Tree
Before reaching for any tool, classify the symptom:
| Symptom | Likely Layer | First Tool |
|---|---|---|
| "Connection refused" | Transport (TCP) | ss -tlnp |
| "Connection timed out" | Network/Firewall | mtr, nc -zv |
| "Slow responses" | Transport/App | ss -ti, curl -w |
| "TLS handshake failed" | Security | openssl s_client |
| "DNS resolution failed" | Application/DNS | dig +trace |
| "HTTP 502/503 errors" | Application | curl -v |
| "Intermittent failures" | Any layer | tcpdump + Wireshark |
The rule is: start at the highest layer that makes sense and work down. Don't capture packets for an HTTP 404 — that's an application problem. Do capture packets when connections randomly reset and nobody knows why.
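For example, a top-down first pass on an unreachable HTTPS endpoint might look like this (api.example.com stands in for the failing service):
# Top-down first pass: application, then DNS, then TCP, then the path
curl -sv https://api.example.com/health -o /dev/null   # application + TLS
dig +short api.example.com                             # DNS resolution
nc -zv api.example.com 443                             # TCP handshake
mtr -n --report -c 50 api.example.com                  # network path
Whichever step fails first tells you which tool section below to dive into.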
Tool 1: ss — Socket Statistics
ss is the modern replacement for netstat. It's faster (it reads socket state directly from the kernel via netlink) and exposes more TCP internal state.
# List all listening TCP ports with process names
ss -tlnp
# Show all established connections to a specific host
ss -tnp dst 10.0.1.50
# Show TCP internal metrics for connections (the gold)
ss -ti dst 10.0.1.50
# Output includes:
# rtt:1.234/0.567 → smoothed RTT / variance
# retrans:0/3 → current / total retransmissions
# cwnd:10 → congestion window (in MSS units)
# send 93.4Mbps → estimated send rate
# rcv_space:65536 → receive window
# Count connections by state
ss -tn state established | wc -l
ss -tn state time-wait | wc -l
ss -tn state close-wait | wc -l
What to look for:
- Thousands of TIME_WAIT: ephemeral port exhaustion risk. Enable tcp_tw_reuse (see the sketch after this list).
- Growing CLOSE_WAIT: the application isn't closing connections. This is a code bug, not a network issue.
- Stuck SYN_SENT connections: the remote server isn't responding to SYN (firewall, server down).
- High retrans count in ss -ti: packet loss on this connection.
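Checking and enabling tcp_tw_reuse is a one-liner on Linux; a sketch (persist the setting via /etc/sysctl.d in production):
# Check the current setting (0 = off, 1 = on, 2 = loopback only, the default on recent kernels)
sysctl net.ipv4.tcp_tw_reuse
# Allow reuse of TIME_WAIT sockets for new outgoing connections
sudo sysctl -w net.ipv4.tcp_tw_reuse=1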
Tool 2: tcpdump — Packet Capture
The universal source of truth. When everything else is ambiguous, packets tell the real story.
# Capture HTTPS traffic on port 443 to a file
tcpdump -i any port 443 -w /tmp/capture.pcap -c 5000
# Capture traffic to a specific host
tcpdump -i eth0 host 10.0.1.50 -w /tmp/debug.pcap
# Capture only SYN packets (connection attempts)
tcpdump -i any 'tcp[tcpflags] & (tcp-syn) != 0' -n
# Capture DNS queries
tcpdump -i any port 53 -n
# Live display with readable output (no file)
tcpdump -i any port 8080 -n -A # -A for ASCII payload
# Capture with rotation (10 files of 100MB each)
tcpdump -i any port 443 -w /tmp/cap.pcap -C 100 -W 10
Key tcpdump flags:
- -i any: capture on all interfaces
- -n: don't resolve hostnames (faster)
- -w file.pcap: write raw packets (for Wireshark analysis)
- -c N: stop after N packets
- -s 0: capture full packets (not just headers)
After capturing, open the .pcap in Wireshark for analysis. Use these Wireshark filters:
tcp.analysis.retransmission # Find retransmitted packets
tcp.analysis.zero_window # Flow control issues
tcp.flags.reset == 1 # Connection resets
tcp.analysis.duplicate_ack # Signs of packet loss
tls.alert_message # TLS errors (ssl.* in Wireshark before 3.0)
dns.flags.rcode != 0 # DNS failures
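The same display filters work headlessly with tshark (Wireshark's command-line counterpart), which helps when the capture can't leave the server; a sketch assuming tshark is installed:
# Count retransmissions in the capture
tshark -r /tmp/capture.pcap -Y 'tcp.analysis.retransmission' | wc -l
# List connection resets with their endpoints
tshark -r /tmp/capture.pcap -Y 'tcp.flags.reset == 1'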
Tool 3: mtr — Path Analysis
mtr combines traceroute and ping into continuous monitoring. It sends a stream of probes and reports per-hop loss and latency statistics.
# Basic mtr to a host (runs continuously, Ctrl+C to stop)
mtr -n 10.0.1.50
# Report mode (100 packets, then exit with summary)
mtr -n --report -c 100 api.example.com
# Use TCP instead of ICMP (bypasses ICMP-blocking firewalls)
mtr -n --tcp --port 443 api.example.com
# Use UDP
mtr -n --udp api.example.com
Reading mtr output:
Host Loss% Snt Last Avg Best Wrst StDev
1. 10.0.0.1 0.0% 100 0.5 0.5 0.4 1.2 0.1
2. 172.16.0.1 0.0% 100 1.2 1.3 1.0 3.5 0.3
3. 203.0.113.1 12.0% 100 5.4 25.3 4.8 150.2 35.1 ← Problem hop
4. 198.51.100.1 0.0% 100 15.2 15.5 14.8 18.3 0.5
5. api.example.com 0.0% 100 15.8 16.0 15.2 19.1 0.6
Important: Loss at an intermediate hop but not at the destination usually means that router rate-limits ICMP (traceroute probes) — this is normal and not a real problem. Only worry when loss appears at intermediate hops AND the destination.
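A quick way to confirm intermediate loss is cosmetic: run the same report with ICMP and TCP probes and compare the destination row, e.g.:
# ICMP probes (default) vs TCP probes to the real service port
mtr -n --report -c 100 api.example.com
mtr -n --report -c 100 --tcp --port 443 api.example.com
# If the destination row shows ~0% loss in both, intermediate-hop loss is ICMP rate-limiting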
Tool 4: dig — DNS Debugging
# Basic query
dig api.example.com
# Query specific record type
dig api.example.com AAAA # IPv6
dig example.com MX # Mail servers
dig example.com TXT # TXT records (SPF, DKIM)
# Trace the full delegation chain (root → TLD → authoritative)
dig +trace api.example.com
# Query a specific DNS server
dig @8.8.8.8 api.example.com
# Check TTL (how long until cache expires)
dig api.example.com | grep -A1 "ANSWER SECTION"
# Reverse DNS lookup
dig -x 93.184.216.34
What to look for:
- NXDOMAIN: the domain doesn't exist (typo? deleted record?)
- SERVFAIL: the nameserver can't answer (DNSSEC validation failure? broken delegation?)
- High TTL: changes won't propagate until existing caches expire
- Different answers from different nameservers: propagation delay or inconsistent configuration (see the comparison sketch after this list)
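To compare nameservers, query each authoritative server for the zone directly; a sketch using example.com as a stand-in zone:
# Ask every authoritative nameserver for the same record and compare answers
for ns in $(dig +short NS example.com); do
  echo "== $ns"; dig @"$ns" +short api.example.com
done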
Tool 5: curl — HTTP Debugging
# Verbose output showing headers and TLS negotiation
curl -v https://api.example.com/health
# Timing breakdown of every phase
curl -w "\
DNS: %{time_namelookup}s\n\
Connect: %{time_connect}s\n\
TLS: %{time_appconnect}s\n\
TTFB: %{time_starttransfer}s\n\
Total: %{time_total}s\n\
HTTP Code: %{http_code}\n\
Size: %{size_download} bytes\n" \
-o /dev/null -s https://api.example.com/data
# Follow redirects and show each hop
curl -vL https://example.com
# Send with specific headers
curl -H "Authorization: Bearer TOKEN" -H "Accept: application/json" \
https://api.example.com/resource
# Test POST with body
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" \
https://api.example.com/resource
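For intermittent slowness, a single request proves little; looping the timing output exposes outliers (a sketch, adjust the count and target):
# Sample DNS / connect / total times 20 times to catch outliers
for i in $(seq 1 20); do
  curl -o /dev/null -s -w "%{time_namelookup} %{time_connect} %{time_total}\n" \
    https://api.example.com/data
  sleep 1
done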
Tool 6: openssl s_client — TLS Debugging
# Connect and show certificate chain
openssl s_client -connect api.example.com:443 -servername api.example.com
# Check certificate expiry
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null \
| openssl x509 -noout -dates
# (</dev/null closes stdin so s_client exits after the handshake instead of hanging)
# Show negotiated cipher and TLS version
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null \
| grep -E "Protocol|Cipher"
# Test specific TLS version
openssl s_client -connect api.example.com:443 -servername api.example.com -tls1_3 </dev/null
# Verify certificate against a CA bundle
openssl s_client -connect api.example.com:443 -servername api.example.com \
-CAfile /etc/ssl/certs/ca-certificates.crt </dev/null
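For monitoring scripts, x509 -checkend turns the expiry check into an exit code; a sketch that warns a week ahead:
# Non-zero exit if the certificate expires within 7 days (604800 seconds)
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null \
| openssl x509 -noout -checkend 604800 \
&& echo "cert OK" || echo "cert expires within 7 days"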
Putting It All Together: A Real Debugging Session
Here's how these tools combine in a real incident: "Service A can't reach Service B, getting timeouts."
# Step 1: Can we reach the port at all?
nc -zv service-b.internal 8080
# Result: "Connection timed out" → not a DNS issue, not an app issue
# Step 2: Is it a routing or firewall issue?
mtr -n --tcp --port 8080 service-b.internal
# Result: 100% loss at hop 3 → firewall or routing issue
# Step 3: Check from the other side — is Service B listening?
# (SSH to service-b host)
ss -tlnp | grep :8080
# Result: "LISTEN 0 128 0.0.0.0:8080" → service is running
# Step 4: Check firewall rules
iptables -L -n | grep 8080
# Result: No allow rule → firewall is blocking traffic
# Fix: Add firewall rule, verify with nc, confirm with curl
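That fix might look like this, assuming plain iptables (adapt for nftables or cloud security groups; rule position in the chain matters):
# Insert an allow rule for inbound TCP 8080 ahead of any drop rules
iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
# Re-verify from the client
nc -zv service-b.internal 8080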
Common Patterns and Their Diagnoses
| Pattern | Diagnosis |
|---|---|
| SYN sent, no SYN-ACK received | Firewall dropping packets, server down, or wrong IP |
| SYN-ACK received, then RST | Port is open but service rejected the connection (TCP wrapper, listen backlog full) |
| Connection established, then RST | Application-level rejection (protocol mismatch, auth failure) |
| Established but no data flows | Application deadlock, full send/receive buffer, blocked thread |
| Increasing retransmissions | Network congestion or packet loss on the path |
| Zero window events | Receiver can't keep up — application is slow to read from socket |
These patterns are visible in ss -ti output and tcpdump captures. Learning to recognize them turns hours of guessing into minutes of targeted diagnosis.
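Some of these patterns can also be watched for live with tcpdump flag filters; a sketch (byte offsets follow the standard TCP header layout):
# Catch connection resets as they happen
tcpdump -i any 'tcp[tcpflags] & tcp-rst != 0' -n
# Catch zero-window advertisements (window field = bytes 14-15 of the TCP header;
# note this also matches RSTs, which often carry a zero window)
tcpdump -i any 'tcp[14:2] = 0' -n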
Why This Matters
Every backend engineer will face network issues in production. The engineers who can pick the right tool, capture the right evidence, and isolate the root cause in minutes are worth their weight in gold. These tools are free, available on every Linux system, and repay the investment in learning them across an entire career.
Key Points
- The best debugging approach is symptom-driven: start with what's broken (timeout, refused, slow, TLS error) and pick the right tool for that symptom
- tcpdump is the universal truth — when logs and metrics disagree, packets don't lie. Learn to capture and filter effectively
- ss -ti exposes TCP internals (RTT, cwnd, retransmits) per connection without packet capture — it's the fastest way to spot TCP issues
- mtr combines traceroute and ping into a continuous path analysis — it reveals which hop is dropping packets or adding latency
- Most 'network issues' are actually application issues. Always check the application layer (curl -v, HTTP status codes) before diving into packets
Key Components
| Component | Role |
|---|---|
| tcpdump / Wireshark | Packet capture and analysis — tcpdump for command-line capture on servers, Wireshark for visual deep-dive analysis |
| ss / netstat | Socket statistics showing connection states, window sizes, RTT, and retransmission counts per connection |
| mtr / traceroute | Path analysis showing every hop between source and destination, with per-hop latency and packet loss |
| dig / nslookup | DNS resolution debugging — query specific record types, trace delegation chain, verify TTL and propagation |
| curl -v / openssl s_client | HTTP and TLS debugging — verbose request/response headers, certificate chain verification, cipher negotiation |
When to Use
Reach for these tools whenever application-level logs and metrics don't explain the problem. Connection timeouts, intermittent failures, unexplained latency, and TLS errors all require network-level debugging.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Wireshark | Open Source | Deep packet inspection with GUI — TCP stream reassembly, retransmission analysis, protocol dissection | Development-Production |
| tcpdump | Open Source | Command-line packet capture on remote servers — lightweight, available everywhere, scriptable | Any |
| mtr | Open Source | Continuous network path analysis combining traceroute and ping — shows per-hop loss and jitter | Any |
| netcat (nc) | Open Source | Quick connectivity tests — TCP/UDP port checks, simple client-server testing, banner grabbing | Any |
Debug Checklist
- Check if the port is open and the service is listening: ss -tlnp | grep :PORT — if nothing shows, the service isn't bound or isn't running
- Test basic connectivity: nc -zv HOST PORT — this confirms whether the TCP handshake succeeds without any application protocol
- Trace the network path: mtr -n --report HOST — look for hops with >1% packet loss or sudden latency jumps
- Capture packets for detailed analysis: tcpdump -i any port PORT -w /tmp/capture.pcap -c 1000 — then open in Wireshark
- Check TCP connection health: ss -ti dst HOST — look at rtt, retrans count, cwnd size, and whether the connection is in a healthy state
- Verify DNS resolution: dig +trace HOSTNAME — follow the delegation chain from root servers to authoritative, checking for delays
- Debug TLS issues: openssl s_client -connect HOST:443 -servername HOST — verify certificate validity, chain, and negotiated cipher
- Debug HTTP layer: curl -v -o /dev/null https://HOST/path — inspect request/response headers, redirects, timing, and status codes
Common Mistakes
- Capturing too many packets without filters. Always use tcpdump with port and host filters — an unfiltered capture on a busy server fills disk in seconds
- Running traceroute once and drawing conclusions. Network paths fluctuate — use mtr with 100+ packets to get statistically meaningful results
- Confusing ICMP-based traceroute results with actual TCP path behavior. Some routers rate-limit ICMP, showing false packet loss
- Not checking both sides of the connection. A timeout might be the client not sending, the server not responding, or a middlebox dropping packets
- Forgetting about firewalls and security groups. 'Connection timed out' means packets are being silently dropped (a DROP rule or no route); 'connection refused' means an RST came back (nothing listening, or a firewall REJECT rule)
Real World Usage
- SRE teams use tcpdump to capture packets during incidents, then analyze offline in Wireshark to find retransmissions, resets, and connection failures
- Network engineers use mtr to diagnose path-specific issues when users in a specific region report slowness
- DevOps engineers use ss -tnp to find connection state accumulation (thousands of TIME_WAIT or CLOSE_WAIT sockets) during high-traffic events
- Security teams use tcpdump to verify that traffic between services is actually encrypted (TLS) and not leaking plaintext
- Platform teams use dig +trace to debug DNS propagation delays after zone changes or during DNS migration