eBPF for Networking
eBPF runs verified, JIT-compiled programs at kernel hook points — replacing iptables with programmable, O(1) packet processing for load balancing, security, and observability.
The Problem
Linux networking traditionally relies on iptables for packet filtering and routing — a chain of rules evaluated sequentially for every packet. At scale (10,000+ Kubernetes services with 100,000+ iptables rules), this becomes a performance disaster: O(n) rule evaluation per packet, slow updates that lock the entire table, and no programmability beyond match/action. eBPF replaces this with programmable, JIT-compiled code that runs at specific kernel hook points with O(1) lookups via hash maps.
Mental Model
Like programmable speed bumps on a road — custom logic runs at specific checkpoints without rebuilding the entire road (kernel). Each speed bump can inspect vehicles, redirect traffic, or collect statistics, and its behavior can be changed without closing the road.
How It Works
eBPF (extended Berkeley Packet Filter) is a technology for running custom programs inside the Linux kernel without writing kernel modules or recompiling the kernel. Originally designed for packet filtering (hence the name), it has evolved into a general-purpose in-kernel virtual machine. For networking, this is revolutionary: it intercepts packets at multiple points in the kernel, makes decisions at near-native speed, and updates logic on the fly.
The key insight is where eBPF programs run. Traditional packet processing in Linux flows through the kernel network stack — a series of fixed stages (driver, IP layer, transport layer, socket layer) with iptables rules evaluated at specific checkpoints. eBPF hooks into this pipeline at strategic points, making it possible to short-circuit the entire stack when needed.
The eBPF Execution Model
An eBPF program goes through three stages before it runs:
1. Compilation: The program is written in C (or Rust, via Aya) and compiled to eBPF bytecode. This is not x86 machine code — it is a custom instruction set designed for safe kernel execution.
2. Verification: Before the kernel loads the program, the eBPF verifier performs static analysis. It walks every possible execution path, checking that the program terminates (no infinite loops), never accesses memory outside its allowed bounds, and uses only approved kernel helper functions. If verification fails, the program is rejected. This is the safety guarantee — a buggy eBPF program cannot crash the kernel.
3. JIT Compilation: After verification, the kernel JIT-compiles the eBPF bytecode to native machine code (x86, ARM). From this point, execution is near-native speed.
// Minimal XDP program that drops all traffic from a specific source IP
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_drop_ip(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // The verifier requires a bounds check before every packet access
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;

    // Drop packets from 10.0.0.100
    if (ip->saddr == bpf_htonl(0x0A000064)) return XDP_DROP;
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
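Stages 2 and 3 happen at load time, when userspace hands the bytecode to the kernel. Below is a minimal sketch of that userspace side using libbpf, assuming the program above has been compiled with clang (clang -O2 -g -target bpf -c xdp_drop.c -o xdp_drop.bpf.o); the object file name and interface name are placeholders. The call to bpf_object__load() is where the kernel runs the verifier and, on success, JIT-compiles the bytecode; when verification fails, the kernel returns a verifier log explaining which check was violated.
// Minimal libbpf loader sketch; "xdp_drop.bpf.o" and "eth0" are placeholders
#include <stdio.h>
#include <unistd.h>
#include <net/if.h>
#include <bpf/libbpf.h>

int main(void) {
    struct bpf_object *obj = bpf_object__open_file("xdp_drop.bpf.o", NULL);
    if (!obj)
        return 1;

    // Verification and JIT compilation happen inside this call
    if (bpf_object__load(obj)) {
        fprintf(stderr, "verifier rejected the program\n");
        return 1;
    }

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "xdp_drop_ip");
    int ifindex = if_nametoindex("eth0");
    if (!prog || !ifindex || !bpf_program__attach_xdp(prog, ifindex)) {
        fprintf(stderr, "attach failed\n");
        return 1;
    }

    pause();  // keep the process (and the XDP attachment) alive
    return 0;
}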
XDP: Processing Packets at the Edge
XDP (eXpress Data Path) is the earliest point where an eBPF program can attach — at the NIC driver level, before the kernel allocates an sk_buff (socket buffer) structure. This matters enormously for performance because sk_buff allocation is expensive. By processing packets before this allocation, XDP can handle 10M+ packets per second on a single core.
XDP programs return one of a small set of verdicts:
- XDP_DROP: Discard the packet immediately. The rest of the kernel network stack never sees it. This is how DDoS mitigation works at line rate.
- XDP_TX: Bounce the packet back out the same NIC. Used for load balancers that modify the destination and reflect packets without kernel stack involvement (a minimal sketch follows this list).
- XDP_PASS: Let the packet continue through the normal kernel network stack.
- XDP_REDIRECT: Send the packet to a different NIC or CPU, enabling cross-interface forwarding.
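To make XDP_TX concrete, here is a toy reflection program that swaps the Ethernet source and destination MAC addresses and bounces the packet back out the receiving NIC. It is a simplified sketch of the hairpin technique XDP load balancers rely on, not production code: a real load balancer would also rewrite IP addresses and fix checksums.
// Toy XDP_TX example: reflect packets back out the receiving NIC after
// swapping source/destination MACs. Illustrative only.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_reflect(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    // Swap MAC addresses so the packet goes back where it came from
    unsigned char tmp[ETH_ALEN];
    __builtin_memcpy(tmp, eth->h_dest, ETH_ALEN);
    __builtin_memcpy(eth->h_dest, eth->h_source, ETH_ALEN);
    __builtin_memcpy(eth->h_source, tmp, ETH_ALEN);

    return XDP_TX;  // transmit out the same interface it arrived on
}

char LICENSE[] SEC("license") = "GPL";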
eBPF vs iptables: A Performance Showdown
This is the practical comparison that matters for Kubernetes networking. In a Kubernetes cluster, every Service creates iptables rules. The kube-proxy component maintains these rules — one DNAT rule per service endpoint. Here is what happens at scale:
| Metric | iptables (kube-proxy) | eBPF (Cilium) |
|---|---|---|
| Rule lookup per packet | O(n) — sequential chain evaluation | O(1) — hash map lookup |
| Time to add a service | 5-10s with 10K rules (table lock) | < 100ms (map update) |
| Memory per service | ~300 bytes per iptables rule | ~64 bytes per map entry |
| CPU overhead at 10K services | Significant — every packet walks 10K+ rules | Negligible — single hash lookup |
| Connection tracking | conntrack module (separate from rules) | Built into eBPF program |
| Latency added per packet | 1-5ms at scale | < 100us |
At 5,000 Kubernetes services with 3 endpoints each, that is ~15,000 iptables rules. Every incoming packet evaluates these rules sequentially. With eBPF, the same service lookup is a single hash table access regardless of how many services exist.
# See how many iptables rules exist (this number grows with services)
iptables -t nat -L | wc -l
# With Cilium, see eBPF-based service entries instead
cilium bpf lb list
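To make the O(1) lookup concrete, the sketch below defines an eBPF hash map keyed by service VIP and port and resolves it to a single backend. The map layout and names (service_map, svc_key, svc_backend) are illustrative assumptions, not Cilium's actual data structures, which also handle multiple backends and session affinity.
// Simplified service lookup via a BPF hash map: one lookup per packet,
// regardless of how many services exist. Key/value layout is hypothetical.
#include <linux/bpf.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

struct svc_key {
    __u32 vip;     // service virtual IP (network byte order)
    __u16 port;    // service port (network byte order)
    __u16 pad;
};

struct svc_backend {
    __u32 ip;      // backend pod IP
    __u16 port;    // backend port
    __u16 pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct svc_key);
    __type(value, struct svc_backend);
} service_map SEC(".maps");

// Called from packet-processing code with the parsed destination IP/port
static __always_inline struct svc_backend *lookup_backend(__u32 dip, __u16 dport) {
    struct svc_key key = { .vip = dip, .port = dport, .pad = 0 };
    // O(1) hash lookup; the control plane in userspace keeps the map updated
    return bpf_map_lookup_elem(&service_map, &key);
}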
TC Hooks: L7-Aware Processing
While XDP operates at L3/L4 (IP headers, TCP/UDP ports), TC (Traffic Control) hooks run later in the stack after the kernel has parsed more headers. TC eBPF programs can:
- Enforce Kubernetes NetworkPolicies by matching on pod labels (translated to IP addresses via eBPF maps)
- Perform L7 protocol parsing for HTTP, DNS, and Kafka traffic
- Implement per-pod bandwidth limiting
- Rewrite packet headers for NAT and encapsulation
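As a sketch of the first item in the list above, the program below drops packets at TC ingress when the source IPv4 address appears in a deny map. Keying on raw IPs and the map name deny_src_ips are simplifications for illustration; Cilium keys its policy maps on security identities derived from pod labels.
// Simplified TC ingress policy: drop packets whose source IPv4 address
// appears in a deny map. Map layout and naming are illustrative only.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);    // source IPv4 address
    __type(value, __u8);   // presence means "deny"
} deny_src_ips SEC(".maps");

SEC("tc")
int tc_ingress_policy(struct __sk_buff *skb) {
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;

    if (bpf_map_lookup_elem(&deny_src_ips, &ip->saddr))
        return TC_ACT_SHOT;  // policy verdict: drop

    return TC_ACT_OK;        // policy verdict: allow
}

char LICENSE[] SEC("license") = "GPL";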
Cilium attaches eBPF programs at both XDP and TC hooks, creating a complete networking data plane:
Ingress path:
NIC → [XDP: DDoS filter, LB] → Kernel → [TC ingress: NetworkPolicy, L7] → Pod
Egress path:
Pod → [TC egress: NetworkPolicy, NAT] → Kernel → NIC
eBPF for Observability
Beyond packet processing, eBPF excels at network observability. Because eBPF programs can attach to kernel functions (kprobes) and tracepoints, engineers can observe networking internals in production, with no overhead at all when the probes are not attached:
# Trace TCP retransmissions in real time
# (assumes a BTF-enabled kernel so bpftrace can resolve struct sock fields)
bpftrace -e 'kprobe:tcp_retransmit_skb {
    $sk = (struct sock *)arg0;
    printf("retransmit: %s:%d -> %s:%d\n",
        ntop($sk->__sk_common.skc_rcv_saddr),
        $sk->__sk_common.skc_num,
        ntop($sk->__sk_common.skc_daddr),
        bswap($sk->__sk_common.skc_dport));
}'
# Measure how long tcp_v4_connect() takes per connection attempt
# (the kernel call that builds and sends the SYN, not the full handshake)
bpftrace -e 'kprobe:tcp_v4_connect { @start[tid] = nsecs; }
kretprobe:tcp_v4_connect /@start[tid]/ {
    printf("connect latency: %d us\n", (nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}'
Cilium's Hubble builds on this to provide a full network observability platform. It captures every network flow in the cluster using eBPF, providing:
- Real-time service dependency maps
- DNS query logs with response codes
- HTTP request/response metrics per endpoint
- Network policy verdict logs (allowed/denied with reason)
The Bigger Picture: eBPF Beyond Networking
While this article focuses on networking, eBPF is reshaping multiple infrastructure layers. Security tools like Falco use eBPF to detect anomalous syscalls. Profilers like Pyroscope use eBPF for continuous CPU profiling. Storage systems use eBPF for I/O scheduling. The pattern is always the same: insert programmable logic into the kernel at specific hook points, without the risk and complexity of kernel modules.
For networking specifically, the trajectory is clear. iptables is legacy. New Kubernetes clusters should default to eBPF-based CNIs (Cilium or Calico eBPF mode). The performance characteristics, programmability, and observability advantages are too significant to ignore. The main caveat is kernel version requirements — teams stuck on older kernels have iptables as the only option.
Practical Adoption Path
For teams running Kubernetes and looking to move to eBPF-based networking:
- Start with observability: Deploy Hubble alongside the existing CNI to get eBPF-powered flow visibility without changing the data plane.
- Migrate CNI to Cilium: Replace the existing CNI (Calico, Flannel, AWS VPC CNI) with Cilium. It supports a kube-proxy replacement mode that eliminates iptables for service routing.
- Enable advanced features gradually: L7 policies, transparent encryption (WireGuard), bandwidth management, and Cilium Service Mesh can all be enabled incrementally.
- Remove kube-proxy: Once Cilium handles all service routing via eBPF, the kube-proxy DaemonSet can be removed entirely, eliminating the last source of iptables rules.
Key Points
- eBPF runs inside the kernel but is safely sandboxed — the verifier guarantees programs cannot crash the kernel, access arbitrary memory, or enter infinite loops.
- XDP processes packets at the NIC driver level, achieving 10M+ packets/sec on a single core — 5-10x faster than iptables for the same workload.
- Cilium uses eBPF to replace kube-proxy entirely, implementing Kubernetes service load balancing without any iptables rules — critical when clusters have 10,000+ services.
- eBPF programs are JIT-compiled to native machine code, running at near-native speed with no interpreter overhead.
- Unlike kernel modules, eBPF programs can be loaded and updated without rebooting or recompiling the kernel, enabling live network policy changes.
Key Components
| Component | Role |
|---|---|
| eBPF Programs | Small, verified programs loaded into the kernel that execute at specific hook points without requiring kernel modules or reboots |
| XDP (eXpress Data Path) | Earliest possible hook at the NIC driver level, processing packets before the kernel network stack even sees them |
| TC (Traffic Control) Hooks | Hook point at the Linux traffic control layer for packet manipulation after the kernel stack has parsed headers |
| eBPF Maps | Shared data structures (hash tables, arrays, ring buffers) that let eBPF programs store state and communicate with userspace |
| eBPF Verifier | Kernel component that statically analyzes every eBPF program before loading to guarantee it terminates, has no invalid memory access, and cannot crash the kernel |
When to Use
Use eBPF-based networking (Cilium) for Kubernetes clusters with hundreds of services where iptables performance degrades. Use XDP for DDoS mitigation or high-performance L4 load balancing at the edge. Use eBPF tracing (bpftrace, bcc) for production network debugging without performance impact. Avoid on older kernels (< 4.15) or non-Linux platforms.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Cilium | Open Source | Kubernetes CNI and service mesh using eBPF for networking, security, and observability without sidecars | Medium-Enterprise |
| Calico eBPF | Open Source | eBPF data plane for Calico's existing network policy engine, good migration path from iptables-based Calico | Medium-Enterprise |
| Katran | Open Source | Meta's XDP-based L4 load balancer handling millions of connections per second at the network edge | Enterprise |
| Falco | Open Source | Runtime security monitoring using eBPF to detect anomalous network connections and syscalls in containers | Medium-Enterprise |
Debug Checklist
- Check kernel version: uname -r — eBPF networking features require Linux 4.15+, full Cilium support needs 5.4+.
- Verify eBPF programs are loaded: bpftool prog list shows all active eBPF programs with their type and attach point.
- Inspect eBPF maps: bpftool map dump id <map-id> to see the data structures eBPF programs use for state (connection tracking, service endpoints).
- Check Cilium status: cilium status and cilium bpf lb list to verify service load balancing entries are correct.
- Trace packets through eBPF: cilium monitor or pwru (packet, where are you?) to see exactly which eBPF programs process each packet.
Common Mistakes
- Writing eBPF programs that exceed the verifier's complexity limit (1 million instructions). The verifier rejects overly complex programs to guarantee safety.
- Assuming eBPF works on all kernels. Linux 4.15+ is required for basic networking, 5.10+ for full features. This rules out older RHEL/CentOS 7 systems.
- Using eBPF maps without proper locking or per-CPU variants, causing contention under high concurrency that negates performance gains (a per-CPU sketch follows this list).
- Not accounting for the limited stack space (512 bytes) in eBPF programs. Complex packet parsing needs tail calls or helper functions, not deep recursion.
- Deploying Cilium without understanding that it replaces kube-proxy — existing iptables-based services and NetworkPolicies behave differently.
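For the map-contention mistake above, the common fix is a per-CPU map: each CPU writes to its own copy of the value, so the hot path needs no locks or atomics, and userspace sums the per-CPU slots when it reads the map. A minimal sketch, with illustrative names:
// Per-CPU packet counter: each CPU increments its own slot, so the hot
// path needs no atomic operations or locks. Names are illustrative.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int xdp_count(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        (*count)++;  // no contention: this slot is private to the current CPU
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";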
Real World Usage
- Meta uses Katran, an XDP-based L4 load balancer, to distribute traffic across their entire fleet — handling billions of packets per second without dedicated load balancer hardware.
- Google selected Cilium as the default CNI for GKE (Google Kubernetes Engine), replacing iptables-based kube-proxy with eBPF for all new clusters.
- Cloudflare uses XDP programs to mitigate DDoS attacks at the edge, dropping malicious packets at the NIC level before they consume kernel resources.
- Netflix uses eBPF-based tools like bpftrace for network performance analysis, tracing TCP retransmissions and connection latency in production.
- Datadog uses eBPF for their network monitoring agent, capturing per-connection metrics without packet capture overhead.