DDoS & Rate Limiting
DDoS throws more traffic at a service than it can handle — defend with Anycast scrubbing at L3/L4, WAF rules at L7, and rate limiting at every layer.
The Problem
Any service exposed to the internet can be overwhelmed by malicious traffic. A single server can handle thousands of requests per second, but an attacker with a botnet can generate millions. Without layered defenses, even well-architected systems can be knocked offline by brute-force traffic volume or clever protocol exploitation.
Mental Model
Like a dam controlling water flow — different barriers at different points handle different types of floods. A mesh screen catches debris (L3/L4 filtering), gates control flow rate (rate limiting), and overflow channels handle surges (scrubbing centers). No single barrier handles all flood types.
How It Works
DDoS (Distributed Denial of Service) attacks come in many forms, but they all share one goal: make a service unavailable to legitimate users. Understanding the attack taxonomy is essential because each type requires different defenses.
Attack Types by Layer
Layer 3/4: Volumetric and Protocol Attacks
These attacks target network bandwidth and transport-layer resources. They are measured in bits per second (bps) or packets per second (pps).
SYN Flood — The attacker sends millions of TCP SYN packets with spoofed source IPs. The server allocates resources for each half-open connection and sends SYN-ACK to an IP that never responds. The server's connection table fills up, and legitimate connections are rejected.
# Check for a SYN flood: count half-open connections in the SYN-RECV state
ss -Htn state syn-recv | wc -l
# If the count stays abnormally high relative to baseline, the server is likely under SYN flood
# Enable SYN cookies to mitigate (Linux)
sudo sysctl -w net.ipv4.tcp_syncookies=1
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65536
UDP Amplification: The attacker sends small UDP requests to public servers (DNS, NTP, memcached) with the source IP spoofed to be the victim's IP. These servers respond with much larger replies directed at the victim. The amplification factor ranges from tens of times for DNS and SSDP to tens of thousands of times for memcached.
| Protocol | Amplification Factor | Request Size | Response Size |
|---|---|---|---|
| DNS | 28-54x | 64 bytes | 3,400 bytes |
| NTP (monlist) | 556x | 234 bytes | 130,000 bytes |
| Memcached | 51,000x | 15 bytes | 750,000 bytes |
| SSDP | 30x | 29 bytes | 870 bytes |
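The factors in the table are response size divided by request size. A minimal sketch of that arithmetic follows; the function names `amplification_factor` and `reflected_bps` are illustrative and not from any library, and the 100 Mbps uplink is an assumed figure for the example.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_bps(attacker_bps: float, factor: float) -> float:
    """Traffic arriving at the victim for a given attacker uplink."""
    return attacker_bps * factor


# Request and response sizes taken from the table above.
ntp = amplification_factor(234, 130_000)
memcached = amplification_factor(15, 750_000)
print(f"NTP monlist: {ntp:.0f}x, memcached: {memcached:.0f}x")

# Assumed example: a 100 Mbps uplink reflected through NTP monlist.
print(f"{reflected_bps(100e6, ntp) / 1e9:.1f} Gbps at the victim")
```

Published factors are rounded measurements, so a figure computed from the table's sizes can differ slightly from the quoted one (memcached works out to 50,000x here).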
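ICMP Flood and Smurf Attack: An ICMP flood overwhelms the target with echo requests sent directly to it. The Smurf variant instead sends echo requests to a network's broadcast address with the victim's IP spoofed as the source, so every host on that network replies to the victim. Smurf attacks are largely mitigated by modern networks that disable directed broadcasts.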
Layer 7: Application-Layer Attacks
These are the hardest to mitigate because each individual request looks legitimate. The attacker is not trying to fill the pipe — they are trying to exhaust application resources.
HTTP Flood — Thousands of bots make legitimate-looking HTTP requests to expensive endpoints. A login page that queries a database, a search endpoint that runs complex queries, or an API that triggers downstream microservice calls. Each request is valid; the volume is the weapon.
Slowloris — The attacker opens many HTTP connections and sends partial headers very slowly, never completing the request. The server keeps each connection open, waiting for the rest of the request. Eventually all connection slots are consumed, and the server cannot accept new connections.
# Detect Slowloris: look for many established connections from a few IPs
# (with a state filter, ss omits the State column, so the peer address is column 4)
ss -Htn state established | awk '{print $4}' | sed -E 's/:[0-9]+$//' | sort | uniq -c | sort -rn | head -20
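The same per-peer count can be done in Python when the result needs to feed an alert instead of a terminal. This is a sketch only: `peer_ip` and `top_peers` are hypothetical helpers, the threshold of 100 is an arbitrary assumption, and the input is assumed to be lines from `ss -Htn state established`, where the peer address is the fourth whitespace-separated column.

```python
from collections import Counter


def peer_ip(endpoint: str) -> str:
    """Strip the port from 'ip:port' or '[ipv6]:port'."""
    host, _, _port = endpoint.rpartition(":")
    return host.strip("[]")


def top_peers(ss_lines, threshold=100):
    """Return (ip, count) pairs with at least `threshold` connections.

    Each line is expected to look like:
    Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in ss_lines:
        fields = line.split()
        if len(fields) >= 4:
            counts[peer_ip(fields[3])] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

One way to feed it is `subprocess.run(["ss", "-Htn", "state", "established"], capture_output=True, text=True).stdout.splitlines()`. A high count alone is not proof of Slowloris, since a busy proxy or NAT gateway can also hold many connections.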
Credential Stuffing — Automated attempts to log in using leaked username/password combinations from other breaches. Each request hits the authentication endpoint with different credentials, making it hard to distinguish from legitimate logins.
DDoS Mitigation Architecture
Effective DDoS protection is layered. No single technology handles all attack types.
Tier 1 (ISP and Transit): For massive volumetric attacks (multiple Tbps), the ISP or transit provider may need to blackhole traffic to the target IP prefix or reroute it through a scrubbing service. Blackholing is a last resort because it drops all traffic to the prefix, not just attack traffic.
Tier 2 (Anycast Scrubbing): Services like Cloudflare, Akamai Prolexic, and AWS Shield distribute incoming traffic across many Points of Presence (PoPs) worldwide. Each PoP absorbs a fraction of the attack, and the largest providers advertise aggregate network capacity in the hundreds of Tbps. Malicious packets are dropped; clean traffic is forwarded to the origin.
Tier 3 (Edge WAF): Web Application Firewalls at the edge inspect HTTP traffic for attack patterns. They can rate limit specific endpoints, block known bad user agents, enforce request size limits, and use JavaScript challenges or CAPTCHAs to distinguish bots from humans.
Tier 4 (Application Rate Limiting): The application enforces per-client quotas using rate limiting algorithms.
Rate Limiting Algorithms
Rate limiting is not just for DDoS — it protects APIs from abuse, prevents cost overruns, and ensures fair resource allocation among clients.
Token Bucket
The most widely used algorithm. Imagine a bucket that holds N tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected (or queued).
Key property: allows bursts. If the bucket holds 100 tokens and fills at 10/sec, a client can burst 100 requests instantly, then sustain 10/sec.
# Token bucket implementation using Redis
import time
import redis

r = redis.Redis()

# Lua script so the read, refill, and consume steps run atomically in Redis
TOKEN_BUCKET_LUA = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local bucket = redis.call('hmget', key, 'tokens', 'last_refill')
-- A missing key means a new client: start with a full bucket
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Refill for the time elapsed since the last update, capped at capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * rate)
if tokens >= 1 then
    tokens = tokens - 1
    redis.call('hset', key, 'tokens', tokens, 'last_refill', now)
    -- Expire idle buckets after twice the time needed to refill completely
    redis.call('expire', key, math.ceil(capacity / rate) * 2)
    return 1
else
    return 0
end
"""

def is_allowed(client_id, rate=10, capacity=100):
    """Return True if the client may proceed, consuming one token."""
    key = f"ratelimit:{client_id}"
    now = time.time()
    result = r.eval(TOKEN_BUCKET_LUA, 1, key, capacity, rate, now)
    return result == 1
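To see the burst property without a Redis server, the same refill logic can be run in a single process. The sketch below is an in-memory variant, not the Redis implementation above; the `TokenBucket` class and its injectable `clock` parameter are assumptions made so the behavior can be checked with a simulated clock.

```python
import time


class TokenBucket:
    """Single-process token bucket; `clock` is injectable for testing."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock
        self.tokens = capacity      # start full
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill for elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Simulated clock: 150 requests at t=0, then 150 more one second later.
bucket_now = [0.0]
bucket = TokenBucket(rate=10, capacity=100, clock=lambda: bucket_now[0])
burst_allowed = sum(bucket.allow() for _ in range(150))
bucket_now[0] += 1.0
next_second_allowed = sum(bucket.allow() for _ in range(150))
print(burst_allowed, next_second_allowed)
```

With a capacity of 100 and a rate of 10 per second, the first batch admits the full burst of 100 and the second admits only the 10 tokens refilled during the intervening second.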
Sliding Window Log
Stores the timestamp of every request in a sorted set. To check the rate, count entries within the current window. Precise but memory-intensive for high-volume APIs.
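A minimal in-memory sketch of the log approach, assuming a single process and substituting a `deque` for the sorted set described above. The `SlidingWindowLog` class name and the injectable `clock` are illustrative choices, not part of any library.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Keeps one timestamp per accepted request within the window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Evict timestamps that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Memory grows with the limit, since every accepted request inside the window holds one entry. That is the cost the paragraph above refers to.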
Sliding Window Counter
Combines the fixed window approach with interpolation. Keeps counters for the current and previous window. The rate estimate is: previous_window_count * overlap_percentage + current_window_count. This gives a smooth rate estimate without storing individual timestamps.
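The interpolation formula can be sketched directly. This in-memory version is an assumption-laden illustration: the `SlidingWindowCounter` class, its field names, and the injectable `clock` are hypothetical, and a production limiter would keep the two counters in shared storage.

```python
import time


class SlidingWindowCounter:
    """Estimates the rate from the current and previous window counters."""

    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_index = None
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window)
        if index != self.window_index:
            # Carry the old count forward only if the windows are adjacent.
            adjacent = (self.window_index is not None
                        and index == self.window_index + 1)
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.window_index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimate = self.previous_count * overlap + self.current_count
        if estimate < self.limit:
            self.current_count += 1
            return True
        return False
```

For example, with a limit of 10 per 60 seconds and 10 requests counted in the previous window, a client arriving halfway through the current window sees an estimate of 10 * 0.5 + 0 = 5 and can make 5 more requests.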
Fixed Window Counter
The simplest approach: count requests in fixed time intervals (e.g., 60-second windows). Problem: a client can send the full limit at the end of one window and the full limit at the start of the next, effectively doubling the rate at window boundaries.
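The boundary problem is easy to reproduce with a simulated clock. The sketch below is illustrative only; `FixedWindowCounter` and the injectable `clock` are assumed names, and the limit of 100 per 60 seconds is an arbitrary example.

```python
import time


class FixedWindowCounter:
    """Counts requests per fixed window; `clock` is injectable for testing."""

    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_index = None
        self.count = 0

    def allow(self) -> bool:
        index = int(self.clock() // self.window)
        if index != self.window_index:
            # A new window has started: reset the counter.
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


# Send the full limit just before and just after a window boundary.
window_now = [59.9]
limiter = FixedWindowCounter(limit=100, window_seconds=60,
                             clock=lambda: window_now[0])
end_of_window = sum(limiter.allow() for _ in range(100))
window_now[0] = 60.1
start_of_next = sum(limiter.allow() for _ in range(100))
print(end_of_window + start_of_next, "requests accepted within 0.2 seconds")
```

Both batches are accepted in full, so the client gets twice the nominal limit in a fraction of a second.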
| Algorithm | Burst Handling | Memory | Accuracy | Best For |
|---|---|---|---|---|
| Token Bucket | Allows controlled bursts | Low | Good | API rate limiting, general purpose |
| Sliding Window Log | Strict, no bursts | High | Exact | Low-volume, high-precision |
| Sliding Window Counter | Smooth approximation | Low | Good | High-volume APIs |
| Fixed Window | Double burst at boundary | Very Low | Poor | Simple use cases only |
Rate Limit Response Headers
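When rate limiting, communicate the limits to clients through response headers. The X-RateLimit-* names are a widely used convention rather than a formal standard; Retry-After is a standard HTTP header and belongs on the rejection, not on successful responses:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1714003200

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714003200
Retry-After: 30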
Return 429 Too Many Requests when the limit is exceeded. Include Retry-After to tell well-behaved clients when to try again.
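One way to produce these headers from limiter state is sketched below. The function `rate_limit_response` and its parameters are hypothetical and framework-agnostic; it assumes the reset time is a Unix epoch in seconds and that Retry-After is expressed as a number of seconds.

```python
import math


def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_epoch: int, now: float):
    """Return (status_code, headers) for a rate-limited endpoint."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if allowed:
        return 200, headers
    # Only rejected requests get Retry-After: seconds until the window resets.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers


status, headers = rate_limit_response(
    allowed=False, limit=1000, remaining=0,
    reset_epoch=1714003200, now=1714003170.0,
)
print(status, headers)
```

Deriving Retry-After from the reset time keeps the two values consistent, so a client reading either one waits the same amount of time.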
Building a DDoS Response Runbook
When an attack hits, a pre-written plan is essential. Here is a template:
1. DETECT: Alert fires on traffic spike (>5x baseline) or error rate (>10% 5xx)
2. CLASSIFY: Determine attack type
- Check network bandwidth utilization (L3/L4 volumetric?)
- Check connection counts (SYN flood? Slowloris?)
- Check request rates per endpoint (L7 HTTP flood?)
3. MITIGATE:
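- L3/L4: Route traffic through the scrubbing provider (e.g., engage AWS Shield Advanced or BGP rerouting via Akamai Prolexic)
- L7: Enable Cloudflare "I'm Under Attack" mode (an HTTP-level challenge) and deploy WAF rules to block attack patterns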
- Application: Tighten rate limits, enable CAPTCHAs on targeted endpoints
4. MONITOR: Watch attack metrics for adaptation
5. RECOVER: Gradually relax mitigations after attack subsides
6. POST-MORTEM: Document attack vector, timeline, and defense improvements
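The DETECT and CLASSIFY steps above can be expressed as a small triage function. This is a sketch under stated assumptions: the `Snapshot` fields are hypothetical metric names, the 5x and 10% thresholds come from step 1, and the link, SYN-RECV, and established-connection thresholds are placeholder values that would need tuning per service.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    rps: float               # current requests per second
    baseline_rps: float      # normal requests per second
    error_rate_5xx: float    # fraction of responses that are 5xx (0..1)
    link_utilization: float  # fraction of uplink bandwidth in use (0..1)
    syn_recv: int            # half-open TCP connections
    established: int         # established TCP connections


def should_alert(s: Snapshot) -> bool:
    """Step 1: traffic above 5x baseline or more than 10% 5xx responses."""
    return s.rps > 5 * s.baseline_rps or s.error_rate_5xx > 0.10


def classify(s: Snapshot, link_threshold: float = 0.8,
             syn_recv_threshold: int = 10_000,
             established_threshold: int = 50_000) -> str:
    """Step 2: check bandwidth, then connection counts, then request rate."""
    if s.link_utilization >= link_threshold:
        return "l3/l4 volumetric"
    if s.syn_recv >= syn_recv_threshold:
        return "syn flood"
    if s.established >= established_threshold:
        return "connection exhaustion (possible slowloris)"
    if s.rps > 5 * s.baseline_rps:
        return "l7 http flood"
    return "unclassified"
```

The order of checks mirrors the runbook: saturation of the pipe is ruled out first because it makes every other signal unreliable.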
Linux Kernel Hardening for SYN Floods
# Enable SYN cookies: when the SYN backlog overflows, the kernel encodes connection state in the SYN-ACK sequence number instead of storing a half-open entry
sudo sysctl -w net.ipv4.tcp_syncookies=1
# Increase the SYN backlog
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65536
# Reduce SYN-ACK retries (faster timeout of half-open connections)
sudo sysctl -w net.ipv4.tcp_synack_retries=2
# Connection rate limiting with iptables (this limit is global across all sources, so size it above the legitimate peak SYN rate)
sudo iptables -A INPUT -p tcp --syn -m limit --limit 50/s --limit-burst 100 -j ACCEPT
sudo iptables -A INPUT -p tcp --syn -j DROP
The reality of DDoS defense is that no single measure is sufficient. Effective protection requires layered defenses — Anycast absorption for volumetric attacks, protocol-level mitigations for SYN floods, WAF rules for application-layer attacks, and rate limiting for abuse prevention. The time to set this up is before the attack, not during it.
Key Points
- DDoS attacks operate at different layers and require layer-specific defenses. A single firewall cannot protect against all types.
- Volumetric attacks are the largest (measured in Tbps) but the easiest to mitigate with Anycast and scrubbing centers.
- Application-layer attacks are the hardest to mitigate because each request looks legitimate; effective defense requires behavioral analysis.
- Rate limiting is not just for DDoS. It protects against accidental traffic spikes, misbehaving clients, and cost overruns.
- The token bucket algorithm is the most widely used rate limiter because it allows bursts while enforcing an average rate.
Key Components
| Component | Role |
|---|---|
| Volumetric Attacks (L3/L4) | Flood the target's bandwidth with massive traffic volume: UDP amplification, DNS reflection, ICMP floods |
| Protocol Attacks (L4) | Exploit protocol weaknesses to exhaust connection state on servers and network devices: SYN floods, fragmentation attacks |
| Application-Layer Attacks (L7) | Target specific application endpoints with legitimate-looking requests: HTTP floods, Slowloris, credential stuffing |
| Rate Limiting Engine | Enforces request quotas per client using algorithms like token bucket or sliding window to prevent abuse |
| Anycast Scrubbing Center | Distributed network of PoPs that absorb attack traffic, filter malicious packets, and forward clean traffic to origin |
When to Use
Every public-facing service needs rate limiting. Any service handling significant traffic needs L7 DDoS protection (WAF). High-value targets (financial services, gaming, SaaS) need full L3-L7 DDoS mitigation with a dedicated provider. Implement defense in depth — never rely on a single layer.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Cloudflare | Managed | Global Anycast network with L3-L7 DDoS protection, WAF, and bot management | Enterprise |
| AWS Shield + WAF | Managed | AWS-native DDoS protection (Shield Standard free, Advanced with SLA) paired with WAF rules | Enterprise |
| Akamai Prolexic | Commercial | Dedicated DDoS scrubbing with BGP rerouting for the largest volumetric attacks | Enterprise |
| fail2ban | Open Source | Host-level intrusion prevention that bans IPs based on log patterns (SSH brute force, HTTP abuse) | Small |
Debug Checklist
- Monitor request rates by IP, endpoint, and user-agent to detect abnormal spikes before they become outages.
- Check if rate limit headers are being returned correctly: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
- Verify SYN cookies are enabled on Linux: sysctl net.ipv4.tcp_syncookies — this mitigates SYN floods at the kernel level.
- Test rate limiting with load testing tools: hey -n 1000 -c 50 https://api.example.com/endpoint.
- Review DDoS mitigation dashboards (Cloudflare Analytics, AWS Shield console) for attack patterns and blocked traffic.
Common Mistakes
- Implementing rate limiting only at the application level, missing attacks that overwhelm the network or transport layer.
- Using a fixed window rate limiter that allows double the limit at window boundaries — use sliding window instead.
- Rate limiting by IP address only, which punishes users behind NAT/proxies sharing an IP and is bypassed by botnets with millions of IPs.
- Setting rate limits too high to be useful or too low and blocking legitimate traffic — test with production traffic patterns first.
- Not having a DDoS response runbook. When an attack hits, it is too late to figure out who to call and what buttons to push.
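For the IP-only mistake above, one option is to check several keys per request so that an authenticated client is limited by its API key as well as its address. The sketch below is hypothetical: `limiter_keys`, `allow_request`, and the `ratelimit:` key layout are illustrative names, and `check` stands in for whatever limiter function is in use.

```python
from typing import Callable, List, Optional


def limiter_keys(ip: str, endpoint: str,
                 api_key: Optional[str] = None) -> List[str]:
    """Build the rate-limit keys to check for one request."""
    keys = [
        f"ratelimit:ip:{ip}",
        f"ratelimit:endpoint:{endpoint}:ip:{ip}",
    ]
    if api_key:
        # Authenticated clients are also limited by identity, which is
        # fairer to users who share an address behind NAT or a proxy.
        keys.append(f"ratelimit:key:{api_key}")
        keys.append(f"ratelimit:endpoint:{endpoint}:key:{api_key}")
    return keys


def allow_request(keys: List[str], check: Callable[[str], bool]) -> bool:
    """Admit the request only if every key is within its limit."""
    return all(check(key) for key in keys)
```

Because `all` stops at the first rejection, earlier keys may already have consumed a token when a later key denies the request. Whether that matters depends on how strict the accounting needs to be. Each key would normally carry its own limit, with the per-IP limit set more generously than the per-key limit.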
Real World Usage
- Cloudflare mitigated a 71 million RPS HTTP DDoS attack in 2023, the largest ever recorded at the time.
- AWS Shield mitigated a 2.3 Tbps CLDAP reflection attack against an AWS customer in February 2020.
- GitHub survived a 1.35 Tbps memcached amplification attack in 2018 by routing traffic through Akamai Prolexic.
- Google Cloud Armor blocked a 46 million RPS L7 DDoS attack against a Google Cloud customer in 2022.
- Stripe uses multi-layered rate limiting (per-IP, per-API-key, and per-endpoint) to protect payment APIs.