XDP (eXpress Data Path)
Why It Exists
The Linux kernel networking stack is great for general-purpose networking. It handles TCP, UDP, routing, firewalling, connection tracking, and a hundred other things. But all that flexibility comes with overhead. Every packet that arrives at the NIC goes through a long journey: driver, socket buffer allocation, protocol processing, netfilter hooks, routing lookup, and eventually reaches the application through a syscall.
For most workloads, this is fine. For high packet-per-second workloads (telemetry pipelines, DDoS mitigation, load balancers, packet brokers), the kernel stack becomes the bottleneck. Not the NIC. Not the wire. The kernel's own processing overhead per packet.
The numbers tell the story. A typical telemetry data point, a metric sample or trace span, is under 1KB after serialization. Small packets are the worst case for kernel networking because the per-packet overhead dominates. The CPU spends more time managing each packet than actually transmitting it. A standard Linux box tops out around 1M small packets/sec through the normal send() path.
XDP solves this by running an eBPF program at the NIC driver level, before packets enter the kernel networking stack. Packets are intercepted at the earliest possible point, and the program decides what to do with each one: pass it through to the stack, drop it, bounce it back out the NIC, or redirect it to another interface, CPU, or userspace socket. The kernel stack never sees the packets handled in XDP.
How It Works
When a packet arrives at the NIC, it normally goes through this path:
NIC → Driver → Allocate SKB → Netfilter → Routing → TCP/IP → Socket → Userspace App
With XDP attached, the packet hits the eBPF program right after the driver, before any of the expensive steps:
NIC → Driver → XDP eBPF Program → (one of four actions)
The four XDP actions:
- XDP_PASS lets the packet continue through the normal kernel stack. The program inspected it and decided it is fine.
- XDP_DROP silently drops the packet. The kernel never allocates memory for it. This is why XDP is so effective for DDoS mitigation. Cloudflare drops millions of malicious packets per second this way.
- XDP_TX bounces the packet back out the same NIC it came in on. Useful for implementing a load balancer where the response goes back the same way.
- XDP_REDIRECT sends the packet to a different NIC, a different CPU, or an AF_XDP socket in userspace. This is how high-performance packet forwarding is built.
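As a concrete illustration, here is a minimal sketch of an XDP program (C, compiled with clang and loaded via libbpf; the file name and port number are illustrative, not from the text above) that drops UDP packets destined to one port and passes everything else:

```c
// xdp_filter.bpf.c — minimal sketch: drop UDP to an example port, pass the rest
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_filter(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Every header access must be bounds-checked or the verifier rejects the program.
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)ip + ip->ihl * 4;
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    if (udp->dest == bpf_htons(9999))   // example port, purely illustrative
        return XDP_DROP;                // dropped before any SKB is allocated

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

The same skeleton extends to the other actions: rewrite the MAC addresses and return XDP_TX to bounce the packet, or return the result of bpf_redirect()/bpf_redirect_map() to steer it elsewhere.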
Real Example: Telemetry Fast Path
An observability agent on a production server collects 5,000 metric samples per second via eBPF. Without XDP, shipping those metrics to the collector goes through the full kernel stack: 5,000 send() syscalls, 5,000 socket buffer allocations, 5,000 trips through TCP/IP processing.
With XDP, the agent batches metrics into packets and XDP redirects them to the NIC at driver level. The kernel networking stack is never involved. The same box that struggled at 1M packets/sec now handles 5-10M.
eBPF (collects metrics in kernel) → Agent batches into packets → XDP redirects at NIC driver → Wire → Collector
The agent does not need to be rewritten. XDP sits below it, accelerating the packet path transparently.
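On the collector's receive side, the usual pattern for this is XDP_REDIRECT into AF_XDP sockets. A minimal sketch, assuming one AF_XDP socket bound per RX queue and illustrative map and program names:

```c
// xsk_redirect.bpf.c — sketch: steer frames to AF_XDP sockets, bypassing the stack
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_XSKMAP);
    __uint(max_entries, 64);            // one slot per RX queue (assumed upper bound)
    __type(key, __u32);
    __type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_redirect_xsk(struct xdp_md *ctx)
{
    __u32 queue = ctx->rx_queue_index;

    // If userspace bound an AF_XDP socket to this queue, hand the frame straight
    // to it; otherwise fall back to the normal kernel stack.
    if (bpf_map_lookup_elem(&xsks_map, &queue))
        return bpf_redirect_map(&xsks_map, queue, XDP_PASS);

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

The collector process then consumes raw frames directly from the socket's RX ring, with no SKB allocation or protocol processing in between.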
Three Execution Modes
| Mode | How It Works | Performance | Requirement |
|---|---|---|---|
| Native | XDP runs inside the NIC driver itself | Best. 5-10M pps/core. | NIC driver must support XDP (most modern drivers do: i40e, mlx5, ixgbe, virtio-net) |
| Offloaded | XDP runs on the NIC hardware (SmartNIC) | Best possible. Zero CPU. | SmartNIC with eBPF offload support (Netronome, some Mellanox) |
| Generic | XDP runs after SKB allocation, faking the early hook | Slow. Defeats the purpose. Only useful for testing. | Any NIC, but the performance benefit is lost |
Always deploy in native mode. If the NIC does not support native XDP, upgrade the NIC before deploying. Generic mode is a trap.
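One way to enforce this (a loader sketch using libbpf 1.0+; the interface and object-file arguments are illustrative) is to attach with XDP_FLAGS_DRV_MODE, so the kernel returns an error instead of silently falling back to generic mode:

```c
// attach_native.c — loader sketch: insist on native (driver-mode) XDP
#include <stdio.h>
#include <net/if.h>
#include <bpf/libbpf.h>
#include <linux/if_link.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <ifname> <prog.o>\n", argv[0]);
        return 1;
    }

    int ifindex = if_nametoindex(argv[1]);
    struct bpf_object *obj = bpf_object__open_file(argv[2], NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open/load %s\n", argv[2]);
        return 1;
    }

    struct bpf_program *prog = bpf_object__next_program(obj, NULL);
    int prog_fd = bpf_program__fd(prog);

    // XDP_FLAGS_DRV_MODE: fail loudly if the driver lacks native XDP support,
    // rather than degrading to generic mode behind our back.
    if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_DRV_MODE, NULL) < 0) {
        fprintf(stderr, "no native XDP on %s, refusing to run\n", argv[1]);
        return 1;
    }
    return 0;
}
```

The same check doubles as the startup guard described under Failure Scenarios below.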
Production Considerations
- Program size limits. eBPF verifier enforces a maximum instruction count (currently 1M instructions). Keep XDP programs small and focused. For complex logic, redirect the packet to userspace via AF_XDP and process it there.
- Maps for state. XDP programs use eBPF maps (hash tables, arrays, ring buffers) to share state between the XDP program and userspace. Use per-CPU maps to avoid lock contention at high packet rates (see the sketch after this list).
- Testing. Use xdp-tools and bpftool to test XDP programs. Load them in generic mode first to validate correctness, then switch to native mode for production performance.
- Monitoring. Track xdp_actions counters (pass/drop/tx/redirect) and xdp_errors. A sudden spike in errors means the program is hitting edge cases.
- Kernel version. XDP has been in mainline Linux since 4.8, but features like XDP_REDIRECT and AF_XDP require 4.18+. Use 5.10+ for the best experience.
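A minimal sketch of the per-CPU map pattern from the list above (the map name and five-slot layout are illustrative; slots are indexed by the xdp_action value):

```c
// xdp_stats.bpf.c — sketch: lock-free per-CPU counters shared with userspace
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// One slot per xdp_action value (XDP_ABORTED=0 .. XDP_REDIRECT=4); a
// PERCPU_ARRAY gives every CPU its own copy, so the hot path never contends.
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 5);
    __type(key, __u32);
    __type(value, __u64);
} action_count SEC(".maps");

static __always_inline int count_and_return(__u32 action)
{
    __u64 *val = bpf_map_lookup_elem(&action_count, &action);
    if (val)
        (*val)++;            // per-CPU slot, so no atomic or lock is needed
    return action;
}

SEC("xdp")
int xdp_stats(struct xdp_md *ctx)
{
    // Real filtering logic would go here; this sketch just counts and passes.
    return count_and_return(XDP_PASS);
}

char LICENSE[] SEC("license") = "GPL";
```

Userspace reads the same map and sums the per-CPU slots to produce the pass/drop/tx/redirect counters mentioned under Monitoring.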
Failure Scenarios
Scenario 1: XDP Program Bug Drops Valid Traffic. An XDP program has a logic error in its filtering rules. Instead of dropping malicious packets, it drops 10% of legitimate traffic. Users see packet loss and connection timeouts. Nobody suspects XDP because the application and kernel metrics look fine. Detection: monitor xdp_drop counters alongside application error rates; if drops increase while application traffic is normal, the XDP program is the culprit. Recovery: detach the XDP program (ip link set dev eth0 xdp off) and traffic immediately flows normally again. Prevention: extensive testing with production traffic replays before deployment.
Scenario 2: NIC Driver Does Not Support Native XDP. An XDP program is deployed and gets native mode... on the development box. In production, the NIC driver does not have XDP support, so the program silently falls back to generic mode. Performance is no better than the regular stack, but nobody notices because only functionality was checked, not throughput. The telemetry pipeline falls behind during peak. Detection: check ip link show; a generic-mode attachment is reported as xdpgeneric rather than xdp. Prevention: verify native XDP support on production NICs before deployment, and add a startup check that refuses to run in generic mode.
Scenario 3: eBPF Map Size Exhaustion. An XDP program uses a hash map to track per-connection state. The map has a fixed max_entries of 100K. During a traffic spike, connections exceed 100K and new insertions fail silently. The XDP program cannot look up state for new connections and falls through to XDP_PASS, bypassing the filtering logic. Detection: monitor map update and lookup failure counters exported by the program. Prevention: size maps for peak traffic, not average, or use LRU maps that evict old entries automatically.
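A sketch of the LRU prevention (type names and the size are illustrative): with BPF_MAP_TYPE_LRU_HASH, inserting into a full map evicts the least recently used entry instead of failing, so the filter keeps state for the newest connections during a spike.

```c
// flow_state.bpf.c — sketch: LRU hash for per-connection state
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct flow_key {
    __u32 saddr, daddr;
    __u16 sport, dport;
};

struct flow_state {
    __u64 packets;
    __u64 last_seen_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 1 << 20);     // sized for peak traffic, not the average
    __type(key, struct flow_key);
    __type(value, struct flow_state);
} flows SEC(".maps");
```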
Capacity Planning
| Metric | Native XDP | Generic XDP | No XDP (kernel stack) |
|---|---|---|---|
| Packets/sec per core | 5-10M | 1-2M | 0.5-1M |
| Latency per packet | 1-5 μs | 10-20 μs | 20-50 μs |
| CPU overhead | Minimal (in-line processing) | Moderate | High (full stack traversal) |
| Memory per packet | Near zero (no SKB) | Full SKB allocation | Full SKB allocation |
Real-world reference numbers: Cloudflare handles 10M+ packets/sec of DDoS traffic per server using XDP. Facebook's Katran load balancer serves billions of requests per day across their fleet using XDP. Cilium (the Kubernetes CNI) uses XDP to accelerate its eBPF-based load balancing, keeping packet forwarding close to wire speed.
Sizing formula for telemetry pipelines: required_cores = (total_packets_per_sec / 6M). A 3,000-node fleet generating 5,000 telemetry packets/sec per node = 15M packets/sec total. With XDP on the collector, about 3 cores are needed for packet processing. Without XDP, 15+ cores are needed. That is a 5x reduction in CPU for the same throughput.
Architecture Decision Record
ADR: When to Use XDP vs tc/BPF vs iptables
Context: Packets need to be processed at high speed. Three options exist at different layers of the Linux networking stack.
| Criteria (Weight) | XDP | tc/BPF | iptables/nftables |
|---|---|---|---|
| Packet rate (30%) | 5-10M pps | 2-4M pps | 0.5-1M pps |
| Ease of use (20%) | Medium (eBPF required) | Medium (eBPF required) | Easy (rule syntax) |
| Stateful processing (20%) | Limited (eBPF maps) | Better (after stack) | Full (conntrack) |
| Feature richness (15%) | Minimal (4 actions) | Moderate | Rich (NAT, mangle, etc.) |
| Kernel version (15%) | 4.8+ (5.10+ ideal) | 4.1+ | 2.4+ |
Decision framework:
- Less than 1M pps with NAT, conntrack, or complex rules needed. Use iptables/nftables. No reason to add eBPF complexity.
- 1-5M pps with classification, shaping, or post-stack processing needed. Use tc/BPF. It runs after SKB allocation, providing access to parsed protocol headers.
- Over 5M pps or need to drop/redirect packets before the kernel stack. Use XDP. DDoS mitigation, high-throughput forwarding, telemetry fast path.
- Over 20M pps or full kernel bypass is needed. XDP is not enough. Look at DPDK.
Key Points
- Processes packets at the NIC driver level before the kernel networking stack even sees them
- Runs as an eBPF program, using the same toolchain and deployment model teams already know
- No dedicated CPU cores needed. Lightweight enough to run on every production server
- 5-10M packets/sec per core vs roughly 1M with normal send() syscalls
- Used by Cloudflare for DDoS mitigation, Facebook for load balancing, and Cilium for Kubernetes networking
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| XDP + eBPF | Open Source | Packet filtering, forwarding, and sampling at NIC driver level | Medium-Enterprise |
| tc/BPF | Open Source | Traffic shaping and classification after the kernel stack | Small-Enterprise |
| iptables / nftables | Open Source | Traditional firewall rules, simpler setups | Small-Medium |
| AF_XDP | Open Source | Zero-copy packet delivery from NIC to userspace applications | Medium-Enterprise |
Common Mistakes
- Assuming XDP bypasses the kernel entirely. It does not. It hooks into the NIC driver inside the kernel, just before the networking stack. Full kernel bypass is DPDK territory.
- Writing complex stateful logic in XDP programs. eBPF programs have size limits and restricted loops. Keep XDP programs simple: filter, forward, or redirect. Do complex processing in userspace.
- Not checking NIC driver support. XDP works best in native mode (driver support). Generic mode (no driver support) is much slower and defeats the purpose.
- Forgetting that XDP runs per-packet. At 10M packets/sec, even a small per-packet overhead adds up fast. Profile XDP programs.
- Deploying without a fallback. If the XDP program crashes or has a bug, packets get dropped. Always have a health check that detaches the program if it misbehaves.
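A minimal sketch of such a fallback (libbpf assumed; the interface name, threshold, and drop-ratio source are placeholders):

```c
// xdp_watchdog.c — hypothetical fallback sketch: detach the XDP program when
// its drop behaviour looks wrong, so traffic falls back to the normal stack
#include <stdio.h>
#include <unistd.h>
#include <net/if.h>
#include <bpf/libbpf.h>
#include <linux/if_link.h>

// Placeholder: a real deployment would sum the program's per-CPU drop counter
// map and compare it against the packet volume the application expects.
static double read_drop_ratio(void)
{
    return 0.0;
}

int main(void)
{
    int ifindex = if_nametoindex("eth0");    // assumed interface name

    for (;;) {
        if (read_drop_ratio() > 0.10) {      // assumed threshold: >10% drops
            fprintf(stderr, "drop ratio anomalous, detaching XDP program\n");
            // Equivalent to `ip link set dev eth0 xdp off` for a native-mode program.
            return bpf_xdp_detach(ifindex, XDP_FLAGS_DRV_MODE, NULL) ? 1 : 0;
        }
        sleep(5);
    }
}
```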