DDoS & Rate Limiting

TCP, UDP, HTTP, DNS, ICMP

DDoS throws more traffic at a service than it can handle — defend with Anycast scrubbing at L3/L4, WAF rules at L7, and rate limiting at every layer.

The Problem

Any service exposed to the internet can be overwhelmed by malicious traffic. A single server can handle thousands of requests per second, but an attacker with a botnet can generate millions. Without layered defenses, even well-architected systems can be knocked offline by brute-force traffic volume or clever protocol exploitation.

Mental Model

Like a dam controlling water flow — different barriers at different points handle different types of floods. A mesh screen catches debris (L3/L4 filtering), gates control flow rate (rate limiting), and overflow channels handle surges (scrubbing centers). No single barrier handles all flood types.

How It Works

DDoS (Distributed Denial of Service) attacks come in many forms, but they all share one goal: make a service unavailable to legitimate users. Understanding the attack taxonomy is essential because each type requires different defenses.

Attack Types by Layer

Layer 3/4: Volumetric and Protocol Attacks

These attacks target network bandwidth and transport-layer resources. They are measured in bits per second (bps) or packets per second (pps).

SYN Flood — The attacker sends millions of TCP SYN packets with spoofed source IPs. The server allocates resources for each half-open connection and sends SYN-ACK to an IP that never responds. The server's connection table fills up, and legitimate connections are rejected.

# Check for a SYN flood: count half-open connections in the SYN-RECV state
ss -Htn state syn-recv | wc -l
# A count far above your normal baseline suggests a SYN flood

# Enable SYN cookies to mitigate (Linux)
sudo sysctl -w net.ipv4.tcp_syncookies=1
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65536

UDP Amplification: The attacker sends small UDP requests to public servers (DNS, NTP, memcached) with the source IP spoofed to the victim's address. Those servers send much larger replies to the victim. Depending on the protocol, the amplification factor ranges from roughly 30x to tens of thousands of times.

Protocol	Amplification Factor	Request Size	Response Size
DNS	28-54x	64 bytes	3,400 bytes
NTP (monlist)	556x	234 bytes	130,000 bytes
Memcached	51,000x	15 bytes	750,000 bytes
SSDP	30x	29 bytes	870 bytes
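Each factor in the table is simply the response size divided by the request size. The sketch below reproduces that arithmetic from the table's figures; the function names and the 1 Gbps attacker uplink are illustrative assumptions, not measurements.

```python
# Request and response sizes in bytes, copied from the table above.
AMPLIFIERS = {
    "DNS": (64, 3_400),
    "NTP (monlist)": (234, 130_000),
    "Memcached": (15, 750_000),
    "SSDP": (29, 870),
}


def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    return response_bytes / request_bytes


def reflected_gbps(attacker_gbps: float, request_bytes: int, response_bytes: int) -> float:
    """Traffic reaching the victim for a given attacker uplink (hypothetical figure)."""
    return attacker_gbps * amplification_factor(request_bytes, response_bytes)


if __name__ == "__main__":
    for name, (req, resp) in AMPLIFIERS.items():
        factor = amplification_factor(req, resp)
        victim = reflected_gbps(1.0, req, resp)
        print(f"{name}: about {factor:,.0f}x, so 1 Gbps of spoofed requests becomes about {victim:,.0f} Gbps")
```

This is why reflection attacks are attractive to attackers: the reflector's bandwidth, not the attacker's, does most of the work.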

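ICMP Flood: Flooding the target with ICMP echo requests. The Smurf variant amplifies the flood by sending echo requests, spoofed with the victim's address, to a network's broadcast address so that every host on that network replies to the victim. Smurf attacks are largely mitigated by modern networks that disable directed broadcasts.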

Layer 7: Application-Layer Attacks

These are the hardest to mitigate because each individual request looks legitimate. The attacker is not trying to fill the pipe — they are trying to exhaust application resources.

HTTP Flood — Thousands of bots make legitimate-looking HTTP requests to expensive endpoints. A login page that queries a database, a search endpoint that runs complex queries, or an API that triggers downstream microservice calls. Each request is valid; the volume is the weapon.

Slowloris — The attacker opens many HTTP connections and sends partial headers very slowly, never completing the request. The server keeps each connection open, waiting for the rest of the request. Eventually all connection slots are consumed, and the server cannot accept new connections.

# Detect Slowloris: look for many established connections from a few IPs.
# With a state filter, ss omits the State column, so the peer address is field 4.
ss -Htn state established | awk '{sub(/:[0-9]+$/, "", $4); print $4}' | sort | uniq -c | sort -rn | head -20

Credential Stuffing — Automated attempts to log in using leaked username/password combinations from other breaches. Each request hits the authentication endpoint with different credentials, making it hard to distinguish from legitimate logins.

DDoS Mitigation Architecture

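Effective DDoS protection is layered. No single technology handles all attack types. The layers numbered below are defensive tiers, ordered from the network edge inward; they are not OSI layers.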

Layer 1: ISP and Transit. For massive volumetric attacks (multiple Tbps), the ISP or transit provider may need to blackhole traffic to the target IP prefix or reroute it through a scrubbing service. Blackholing is a last resort because it drops all traffic to the prefix, not just attack traffic.

Layer 2: Anycast Scrubbing. Services like Cloudflare, Akamai Prolexic, and AWS Shield distribute incoming traffic across hundreds of Points of Presence (PoPs) worldwide. Each PoP absorbs a fraction of the attack, and some providers advertise aggregate network capacity in the hundreds of Tbps. Malicious packets are dropped; clean traffic is forwarded to the origin.

Layer 3: Edge WAF. Web Application Firewalls at the edge inspect HTTP traffic for attack patterns. They can rate limit specific endpoints, block known bad user agents, enforce request size limits, and use JavaScript challenges or CAPTCHAs to distinguish bots from humans.

Layer 4: Application Rate Limiting. The application enforces per-client quotas using rate limiting algorithms.

Rate Limiting Algorithms

Rate limiting is not just for DDoS — it protects APIs from abuse, prevents cost overruns, and ensures fair resource allocation among clients.

Token Bucket

The most widely used algorithm. Imagine a bucket that holds N tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected (or queued).

Key property: allows bursts. If the bucket holds 100 tokens and fills at 10/sec, a client can burst 100 requests instantly, then sustain 10/sec.

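# Token bucket implementation using Redis
import time
import redis

r = redis.Redis()

# The Lua script runs atomically inside Redis, so concurrent requests
# cannot race between reading the bucket and writing it back.
TOKEN_BUCKET_LUA = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('hmget', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill for the time elapsed since the last update, capped at capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= 1 then
    tokens = tokens - 1
    -- hset with several field/value pairs needs Redis 4.0+; hmset is deprecated
    redis.call('hset', key, 'tokens', tokens, 'last_refill', now)
    -- Idle buckets expire once they would have refilled completely anyway
    redis.call('expire', key, math.ceil(capacity / rate) * 2)
    return 1
else
    return 0
end
"""

# register_script caches the script in Redis and calls it by hash afterwards
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def is_allowed(client_id, rate=10, capacity=100):
    """Return True if the client may proceed, False if it should receive a 429."""
    key = f"ratelimit:{client_id}"
    now = time.time()
    return token_bucket(keys=[key], args=[capacity, rate, now]) == 1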
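To see the burst property without a Redis server, the same refill logic can be sketched in process. This is a minimal single-process illustration, not a replacement for the shared Redis version; the TokenBucket class and its injectable clock parameter are assumptions introduced here so that time can be controlled in a test.

```python
import time
from typing import Callable


class TokenBucket:
    """In-process token bucket; state is not shared across processes or hosts."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so tests can control time
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill for the elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(rate=10, capacity=100, clock=lambda: fake_now[0])
    print(sum(bucket.allow() for _ in range(150)))  # burst is capped by capacity
    fake_now[0] += 1.0                              # one second passes
    print(sum(bucket.allow() for _ in range(50)))   # only the refilled tokens are available
```

With a capacity of 100 and a rate of 10 per second, the first burst admits 100 requests and the following second admits 10, matching the description above.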

Sliding Window Log

Stores the timestamp of every request in a sorted set. To check the rate, count entries within the current window. Precise but memory-intensive for high-volume APIs.
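A minimal in-process sketch of the idea, using a deque as a stand-in for the sorted set (with Redis, a sorted set scored by timestamp would play the same role). The SlidingWindowLog class and its clock parameter are illustrative assumptions.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLog:
    """Keeps one timestamp per accepted request; memory grows with the limit."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps: Deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Because every accepted request is stored until it ages out, memory use is proportional to the limit per client, which is the cost the paragraph above refers to.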

Sliding Window Counter

Combines the fixed window approach with interpolation. Keeps counters for the current and previous window. The rate estimate is: previous_window_count * overlap_percentage + current_window_count. This gives a smooth rate estimate without storing individual timestamps.
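The interpolation can be sketched as follows, where the overlap is the fraction of the previous window still covered by a window-length span ending now. The SlidingWindowCounter class, its method names, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class SlidingWindowCounter:
    """Approximates a sliding window using two counters instead of a timestamp log."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.current_index = int(clock() // window_seconds)
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now: float) -> None:
        index = int(now // self.window)
        if index == self.current_index + 1:
            # Moved to the next window: the current counter becomes the previous one
            self.previous_count = self.current_count
            self.current_count = 0
            self.current_index = index
        elif index > self.current_index + 1:
            # At least one whole window was idle: nothing carries over
            self.previous_count = 0
            self.current_count = 0
            self.current_index = index

    def estimate(self, now: float) -> float:
        elapsed_fraction = (now % self.window) / self.window
        overlap = 1.0 - elapsed_fraction
        return self.previous_count * overlap + self.current_count

    def allow(self) -> bool:
        now = self.clock()
        self._roll(now)
        if self.estimate(now) < self.limit:
            self.current_count += 1
            return True
        return False
```

For example, with a limit of 100 per 60 seconds and 100 requests in the previous window, a client that is 15 seconds into the current window has an overlap of 0.75, so the estimate starts at 75 and only 25 more requests are admitted.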

Fixed Window Counter

The simplest approach: count requests in fixed time intervals (e.g., 60-second windows). Problem: a client can send the full limit at the end of one window and the full limit at the start of the next, effectively doubling the rate at window boundaries.
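The boundary problem is easy to reproduce. The sketch below is a minimal in-process fixed window counter; the FixedWindowCounter class and the fake clock are assumptions introduced for the demonstration.

```python
import time
from typing import Callable, Optional


class FixedWindowCounter:
    """Counts requests per fixed interval; the counter resets at each boundary."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_index: Optional[int] = None
        self.count = 0

    def allow(self) -> bool:
        index = int(self.clock() // self.window)
        if index != self.window_index:
            # A new window has started: forget everything from the old one
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


if __name__ == "__main__":
    fake_now = [59.0]
    limiter = FixedWindowCounter(limit=100, window_seconds=60, clock=lambda: fake_now[0])
    end_of_window = sum(limiter.allow() for _ in range(150))
    fake_now[0] = 60.0  # one second later, but a new window
    start_of_next = sum(limiter.allow() for _ in range(150))
    print(end_of_window + start_of_next, "requests admitted within one second")
```

Both bursts are admitted in full, so the client gets twice the nominal limit across a one-second span that straddles the boundary.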

Algorithm	Burst Handling	Memory	Accuracy	Best For
Token Bucket	Allows controlled bursts	Low	Good	API rate limiting, general purpose
Sliding Window Log	Strict, no bursts	High	Exact	Low-volume, high-precision
Sliding Window Counter	Smooth approximation	Low	Good	High-volume APIs
Fixed Window	Double burst at boundary	Very Low	Poor	Simple use cases only

Rate Limit Response Headers

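When rate limiting, communicate the limits to clients through response headers. The X-RateLimit-* names below are a widely used convention rather than a formal standard; Retry-After is defined by the HTTP specification and belongs on the 429 response.

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1714003200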

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714003200
Retry-After: 30

Return 429 Too Many Requests when the limit is exceeded. Include Retry-After to tell well-behaved clients when to try again.
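A small helper can keep these headers consistent across endpoints. The following is a framework-agnostic sketch; the rate_limit_headers function and its parameters are hypothetical, and it assumes the reset time is a Unix timestamp as in the examples above.

```python
import math
import time
from typing import Dict, Optional


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now: Optional[float] = None) -> Dict[str, str]:
    """Build rate limit headers; Retry-After is added only when the quota is exhausted."""
    if now is None:
        now = time.time()
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if remaining <= 0:
        # Seconds until the window resets, rounded up and never negative
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```

Calling rate_limit_headers(1000, 0, 1714003200, now=1714003170) yields the four headers of the 429 example, with Retry-After set to 30.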

Building a DDoS Response Runbook

When an attack hits, a pre-written plan is essential. Here is a template:

1. DETECT: Alert fires on traffic spike (>5x baseline) or error rate (>10% 5xx)
2. CLASSIFY: Determine attack type
   - Check network bandwidth utilization (L3/L4 volumetric?)
   - Check connection counts (SYN flood? Slowloris?)
   - Check request rates per endpoint (L7 HTTP flood?)
3. MITIGATE:
   - L3/L4: Engage upstream scrubbing (Anycast or BGP-rerouted provider, AWS Shield Advanced response team)
   - L7: Deploy WAF rules to block attack patterns; enable Cloudflare "I'm Under Attack" mode
   - Application: Tighten rate limits, enable CAPTCHAs on targeted endpoints
4. MONITOR: Watch attack metrics for adaptation
5. RECOVER: Gradually relax mitigations after attack subsides
6. POST-MORTEM: Document attack vector, timeline, and defense improvements
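The DETECT step can be expressed as a simple predicate over current metrics. This is a sketch using the thresholds from the template (more than 5x baseline traffic, or more than 10% 5xx responses); the TrafficSample fields and the should_alert function are hypothetical names, and real alerting would normally live in a monitoring system.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    requests_per_second: float   # current request rate
    baseline_rps: float          # normal rate for this time of day
    responses_total: int         # responses in the sampling interval
    responses_5xx: int           # of which server errors


def should_alert(sample: TrafficSample,
                 spike_multiplier: float = 5.0,
                 error_rate_threshold: float = 0.10) -> bool:
    """Return True when traffic or error rate crosses the runbook thresholds."""
    spike = sample.requests_per_second > spike_multiplier * sample.baseline_rps
    if sample.responses_total > 0:
        error_rate = sample.responses_5xx / sample.responses_total
    else:
        error_rate = 0.0
    return spike or error_rate > error_rate_threshold
```

Both thresholds are defaults to tune: a service with spiky but legitimate traffic may need a higher multiplier to avoid false alarms.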

Linux Kernel Hardening for SYN Floods

# Enable SYN cookies: when the SYN backlog overflows, the kernel encodes
# connection state in the SYN-ACK sequence number instead of queueing it
sudo sysctl -w net.ipv4.tcp_syncookies=1

# Increase the SYN backlog
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65536

# Reduce SYN-ACK retries (faster timeout of half-open connections)
sudo sysctl -w net.ipv4.tcp_synack_retries=2

# Global SYN rate limit with iptables (applies to all sources combined,
# so legitimate clients are also dropped once the limit is reached)
sudo iptables -A INPUT -p tcp --syn -m limit --limit 50/s --limit-burst 100 -j ACCEPT
sudo iptables -A INPUT -p tcp --syn -j DROP

The reality of DDoS defense is that no single measure is sufficient. Effective protection requires layered defenses — Anycast absorption for volumetric attacks, protocol-level mitigations for SYN floods, WAF rules for application-layer attacks, and rate limiting for abuse prevention. The time to set this up is before the attack, not during it.

Key Points

• DDoS attacks operate at different layers and require layer-specific defenses. A single firewall cannot protect against all types.
• Volumetric attacks are the largest (measured in Tbps) but the easiest to mitigate with Anycast and scrubbing centers.
• Application-layer attacks are the hardest to mitigate because each request looks legitimate; effective defense requires behavioral analysis.
• Rate limiting is not just for DDoS. It protects against accidental traffic spikes, misbehaving clients, and cost overruns.
• The token bucket algorithm is the most widely used rate limiter because it allows bursts while enforcing an average rate.

Key Components

Component	Role
Volumetric Attacks (L3/L4)	Flood the target's bandwidth with massive traffic volume: UDP amplification, DNS reflection, ICMP floods
Protocol Attacks (L4)	Exploit protocol weaknesses to exhaust server or network-device state: SYN floods, fragmentation attacks
Application-Layer Attacks (L7)	Target specific application endpoints with legitimate-looking requests: HTTP floods, Slowloris, credential stuffing
Rate Limiting Engine	Enforces request quotas per client using algorithms like token bucket or sliding window to prevent abuse
Anycast Scrubbing Center	Distributed network of PoPs that absorb attack traffic, filter malicious packets, and forward clean traffic to origin

When to Use

Every public-facing service needs rate limiting. Any service handling significant traffic needs L7 DDoS protection (WAF). High-value targets (financial services, gaming, SaaS) need full L3-L7 DDoS mitigation with a dedicated provider. Implement defense in depth — never rely on a single layer.

Tool Comparison

Tool	Type	Best For	Scale
Cloudflare	Managed	Global Anycast network with L3-L7 DDoS protection, WAF, and bot management	Enterprise
AWS Shield + WAF	Managed	AWS-native DDoS protection (Shield Standard free, Advanced with SLA) paired with WAF rules	Enterprise
Akamai Prolexic	Commercial	Dedicated DDoS scrubbing with BGP rerouting for the largest volumetric attacks	Enterprise
fail2ban	Open Source	Host-level intrusion prevention that bans IPs based on log patterns (SSH brute force, HTTP abuse)	Small

Debug Checklist

Monitor request rates by IP, endpoint, and user-agent to detect abnormal spikes before they become outages.
Check if rate limit headers are being returned correctly: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
Verify SYN cookies are enabled on Linux: sysctl net.ipv4.tcp_syncookies — this mitigates SYN floods at the kernel level.
Test rate limiting with load testing tools: hey -n 1000 -c 50 https://api.example.com/endpoint.
Review DDoS mitigation dashboards (Cloudflare Analytics, AWS Shield console) for attack patterns and blocked traffic.

Common Mistakes

Implementing rate limiting only at the application level, missing attacks that overwhelm the network or transport layer.
Using a fixed window rate limiter that allows double the limit at window boundaries — use sliding window instead.
Rate limiting by IP address only, which punishes users behind NAT/proxies sharing an IP and is bypassed by botnets with millions of IPs.
Setting rate limits too high to be useful or too low and blocking legitimate traffic — test with production traffic patterns first.
Not having a DDoS response runbook. When an attack hits, it is too late to figure out who to call and what buttons to push.

Real World Usage

• Cloudflare mitigated a 71 million RPS HTTP DDoS attack in 2023, the largest ever recorded at the time.
• AWS Shield mitigated a 2.3 Tbps reflection attack against an AWS customer in 2020.
• GitHub survived a 1.35 Tbps memcached amplification attack in 2018 by routing traffic through Akamai Prolexic.
• Google Cloud Armor blocked a 46 million RPS L7 DDoS attack against a Google Cloud customer in 2022.
• Stripe uses multi-layered rate limiting (per-IP, per-API-key, and per-endpoint) to protect payment APIs.

RFCs & Specs

RFC 4732: Internet Denial-of-Service Considerations
RFC 6585: Additional HTTP Status Codes (429 Too Many Requests)
RFC 7665: Service Function Chaining (SFC) Architecture
draft-ietf-httpapi-ratelimit-headers: RateLimit Header Fields for HTTP

