BPF Maps & Ring Buffer
Mental Model
A factory floor with different types of storage. Hash maps are filing cabinets -- look up any document by name in constant time, but someone must explicitly remove old files. Arrays are numbered shelves -- slot 0 through slot N always exist, instant access by number. Per-CPU maps give every worker their own private clipboard -- no waiting in line to write. LRU maps are filing cabinets with a janitor who throws out the least-touched documents when space runs low. The ring buffer is a conveyor belt from the factory floor to the loading dock -- items placed on the belt arrive in order, and the belt keeps moving whether or not someone is picking up items at the other end.
The Problem
An eBPF-based security monitoring tool running on production hosts starts losing events when system load exceeds 50,000 syscalls per second. The tool uses perf_event_output to send events from kernel space to a userspace daemon. At low load, everything works. Under sustained pressure, the per-CPU perf buffers overflow: each CPU has its own buffer, the userspace reader cannot poll all of them fast enough, and events are silently discarded. Switching to the BPF ring buffer -- a single buffer shared by all CPUs with a multi-producer reserve/commit protocol -- eliminates the per-CPU fragmentation, reduces memory usage by 60%, and handles 200,000 events per second with zero drops.
Architecture
An eBPF program runs inside the kernel, observing every packet or syscall. It discovers something important -- a connection, an anomaly, a counter that needs incrementing. Now what? The program cannot call printf. It cannot write to a file. It cannot allocate heap memory. It lives in a sandbox where the verifier has stripped away almost every freedom a normal program enjoys.
BPF maps are the answer. They are the data structures that bridge the gap between eBPF programs running in kernel context and userspace applications that need the data. Every serious eBPF deployment -- Cilium, Falco, bpftrace, Katran -- depends entirely on choosing the right map type for the job.
The Map Abstraction
A BPF map is a kernel-resident data structure created via the bpf() syscall with the BPF_MAP_CREATE command. It has a type, a fixed key size, a fixed value size, and a maximum number of entries. Once created, both BPF programs (in kernel space) and userspace applications can read and write to it through well-defined interfaces.
From the BPF program side, map access uses helper functions:
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key);
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, __u64 flags);
long bpf_map_delete_elem(struct bpf_map *map, const void *key);
From userspace, the same operations go through the bpf() syscall:
bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
The verifier checks every map access at program load time. It ensures the key and value sizes match, that lookup results are null-checked before dereferencing, and that map operations happen only on maps the program is authorized to access.
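A minimal kernel-side sketch of what this enforcement looks like in practice, assuming a flow map and value struct like the conntrack example in the next section (the packets field is illustrative):

struct conn_value *val;

val = bpf_map_lookup_elem(&conntrack, &key);
if (!val)
    return 0;   /* no entry for this key; without this check the verifier rejects the program */

/* Pointer is proven non-NULL. The map is shared across CPUs, so use an atomic add. */
__sync_fetch_and_add(&val->packets, 1);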
Hash Maps
BPF_MAP_TYPE_HASH is the general-purpose key-value store. Keys can be any fixed-size blob -- a 4-byte IP address, a 13-byte connection 5-tuple, a 32-byte struct. The kernel implements it as a hash table with per-bucket spin locks.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 262144);
    __type(key, struct conn_key);     /* 13 bytes: IPs + ports + proto */
    __type(value, struct conn_value); /* 32 bytes: counters + state */
} conntrack SEC(".maps");
Lookup is O(1) average case. Insert allocates memory on the first write to a key and fails with -E2BIG when max_entries is reached. Delete frees the entry. The per-bucket spin locks mean that concurrent access to different buckets proceeds in parallel, but two CPUs hitting the same bucket serialize.
For Cilium's connection tracking, this is the workhorse. Every TCP and UDP flow gets an entry keyed by 5-tuple. NAT decisions, policy verdicts, and byte counters live in the value. With 256K max_entries, the map handles a busy Kubernetes node comfortably.
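A hedged sketch of the insert-or-update path from the BPF side. The flag semantics and error behavior are real; the struct fields and the pkt_len variable are illustrative:

struct conn_value *val = bpf_map_lookup_elem(&conntrack, &key);
if (!val) {
    struct conn_value init = {};   /* zeroed counters and state */
    /* BPF_NOEXIST: fail instead of overwriting if another CPU won the race */
    if (bpf_map_update_elem(&conntrack, &key, &init, BPF_NOEXIST) < 0)
        return 0;                  /* map full (-E2BIG) or lost the insert race */
    val = bpf_map_lookup_elem(&conntrack, &key);
    if (!val)
        return 0;
}
__sync_fetch_and_add(&val->bytes, pkt_len);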
Array Maps
BPF_MAP_TYPE_ARRAY is a fixed-size array where keys are integers from 0 to max_entries - 1. All entries are pre-allocated at map creation time, so lookups never fail for valid indices (the pointer returned is always non-NULL).
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, struct config_entry);
} config SEC(".maps");
Arrays are ideal for configuration data pushed from userspace (index 0 = sampling rate, index 1 = log level) and for lookup tables where the key is a small integer. They cannot have entries deleted -- once allocated, all slots exist for the map's lifetime.
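A sketch of the typical read path in a BPF program, assuming slot 0 holds a sampling rate (the config_entry field name is illustrative):

__u32 idx = 0;   /* slot 0 = sampling rate, by convention */
struct config_entry *cfg = bpf_map_lookup_elem(&config, &idx);
if (!cfg)
    return 0;    /* cannot happen for idx < max_entries, but the verifier still requires the check */
if (bpf_get_prandom_u32() % 100 >= cfg->sample_rate)
    return 0;    /* sampled out: skip this event */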
Per-CPU Maps
The per-CPU variants -- BPF_MAP_TYPE_PERCPU_HASH and BPF_MAP_TYPE_PERCPU_ARRAY -- give each CPU core its own private copy of every value. This is not an optimization. For high-frequency counters, it is a requirement.
Consider a packet counter updated on every received packet. On a 64-core machine processing 10 million packets per second, a shared counter means 10 million atomic increments per second, all bouncing the same cache line across 64 cores. The cache coherency protocol (MESI/MOESI) turns this into a serialization point where most of the time is spent waiting for cache line ownership.
Per-CPU maps eliminate all of this. Each core increments its local copy with a simple store instruction. No atomics. No cache-line bouncing. No spin locks. The cost is that userspace reads back NR_CPUS copies and must sum them:
int ncpus = libbpf_num_possible_cpus();
struct vip_stats values[ncpus];            /* one slot per possible CPU */

bpf_map_lookup_elem(map_fd, &key, values); /* kernel fills every CPU's copy */

__u64 total = 0;
for (int i = 0; i < ncpus; i++)
    total += values[i].packets;
Katran uses per-CPU arrays for exactly this pattern. Each VIP index holds packet and byte counts that each core updates independently. Userspace polls every second and aggregates.
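The kernel-side half of that pattern might look like the sketch below. The map name and sizes mirror the bpftool output shown later in this section; the vip_stats fields and the vip_idx/pkt_len variables are illustrative:

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 4096);
    __type(key, __u32);
    __type(value, struct vip_stats);
} vip_counters SEC(".maps");

/* In the packet path: each core sees only its own copy of the value. */
struct vip_stats *st = bpf_map_lookup_elem(&vip_counters, &vip_idx);
if (st) {
    st->packets += 1;   /* plain store -- no atomics, no cache-line bouncing */
    st->bytes += pkt_len;
}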
LRU Hash Maps
BPF_MAP_TYPE_LRU_HASH solves the stale entry problem. A regular hash map that hits max_entries rejects all further inserts. For a connection tracking table, this means new connections are dropped once the table is full, even if most entries are for connections that closed hours ago.
LRU maps automatically evict the least recently accessed entry when space runs out. The implementation uses per-CPU LRU lists to reduce contention, with a global list as a fallback. Eviction is approximate -- under extreme churn, a recently accessed entry might get evicted if it lands on a CPU's local list that happens to be under pressure.
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 100000);
    __type(key, struct rate_key);
    __type(value, struct rate_value);
} rate_limiter SEC(".maps");
Size LRU maps at 2-3x the expected steady-state working set. At exactly the working set size, every burst causes eviction storms that remove entries still in active use.
The Ring Buffer
Before Linux 5.8, streaming events from BPF programs to userspace meant BPF_MAP_TYPE_PERF_EVENT_ARRAY. This creates one ring buffer per CPU. Each BPF program calls bpf_perf_event_output() to push data into the calling CPU's buffer. Userspace must open and poll all N buffers independently.
The problems with this design surface under production load:
- Memory waste. Each per-CPU buffer must be sized for that CPU's peak event rate. On a 128-core machine, sizing each buffer at 64 KB uses 8 MB total. But if events are bursty and concentrated on a few CPUs, most of that 8 MB sits unused while the hot CPUs overflow.
- Event loss under asymmetric load. CPU 47 handles an interrupt storm, fills its buffer, and drops events. CPUs 48 through 127 sit idle with empty buffers. The total system has plenty of buffer capacity, but it is stranded on the wrong CPUs.
- Userspace complexity. The consumer must epoll_wait() on N file descriptors and handle events from each CPU independently, often with ordering challenges.
BPF_MAP_TYPE_RINGBUF (Linux 5.8+) replaces all of this with a single shared ring buffer:
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);  /* 256 KB shared across all CPUs */
} events SEC(".maps");
The ring buffer uses a reserve-commit protocol. A BPF program reserves space in the buffer with bpf_ringbuf_reserve(), writes data into the reserved region, and commits it with bpf_ringbuf_submit(). Multiple CPUs can reserve and commit concurrently; the kernel serializes only the brief update of the producer position, while filling the reserved region and committing proceed in parallel.
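A minimal producer sketch (the event struct and its fields are illustrative; the helpers and the mandatory NULL check are real):

struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e)
    return 0;                /* buffer full: the event is dropped */
e->pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(e->comm, sizeof(e->comm));
bpf_ringbuf_submit(e, 0);    /* or bpf_ringbuf_discard(e, 0) to abort */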
Userspace consumes events through a single file descriptor:
struct ring_buffer *rb = ring_buffer__new(map_fd, handle_event, NULL, NULL);
while (ring_buffer__poll(rb, 100 /* ms timeout */) >= 0)
    ;   /* handle_event() is invoked once per committed record */
Events are consumed in the order they were reserved (not grouped by CPU), and the single buffer means total capacity is shared across all CPUs. A 256 KB ring buffer handles sustained throughput that would require 8 MB of per-CPU perf buffers on the same machine.
Map Pinning and Lifetime
By default, a BPF map exists as long as at least one BPF program or file descriptor references it. When the last reference is dropped, the map and all its data are destroyed.
For maps that hold persistent state -- connection tracking, counters, configuration -- this is unacceptable. Restarting the Cilium agent would wipe all connection tracking entries, causing thousands of active connections to reset.
Map pinning solves this. Pinning a map to bpffs creates a filesystem entry at /sys/fs/bpf/<name> that holds a reference to the map:
# Pin a map
bpftool map pin id 42 /sys/fs/bpf/conntrack_map
# Later, retrieve the pinned map
bpftool map show pinned /sys/fs/bpf/conntrack_map
From code:
// Pin
bpf_obj_pin(map_fd, "/sys/fs/bpf/conntrack_map");
// Retrieve
int fd = bpf_obj_get("/sys/fs/bpf/conntrack_map");
The pinned map survives program unload and reload. A new version of the BPF program can attach to the existing map and pick up right where the old version left off. Cilium relies on this for zero-downtime datapath upgrades -- the agent restarts, reloads programs, and reconnects to pinned maps without losing a single connection tracking entry.
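With libbpf, pinning can also be declared directly in the map definition. A sketch assuming libbpf's LIBBPF_PIN_BY_NAME support, which pins under /sys/fs/bpf on first load and transparently reuses an existing compatible pinned map on reload:

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 262144);
    __type(key, struct conn_key);
    __type(value, struct conn_value);
    __uint(pinning, LIBBPF_PIN_BY_NAME);  /* reuses /sys/fs/bpf/conntrack if present */
} conntrack SEC(".maps");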
Inspecting Maps at Runtime
bpftool is the primary tool for inspecting BPF maps on a running system:
# List all maps
$ bpftool map list
42: hash name conntrack flags 0x0
key 16B value 32B max_entries 262144 memlock 12582912B
pinned /sys/fs/bpf/conntrack_map
57: percpu_array name vip_counters flags 0x0
key 4B value 16B max_entries 4096 memlock 33554432B
63: ringbuf name events flags 0x0
max_entries 262144 memlock 266240B
# Dump a hash map
$ bpftool map dump id 42
key: 0a 00 01 05 0a 00 02 0a 1f 90 c3 50 06 00 00 00
value: 00 00 00 00 00 00 03 e8 00 00 00 00 00 18 6a 00 ...
# Look up an entry in a per-CPU map (bpftool prints each CPU's copy)
$ bpftool map lookup id 57 key 0 0 0 0
key: 00 00 00 00
value (CPU 00): 00 00 00 00 00 00 41 a3 00 00 00 00 00 3c f2 80
value (CPU 01): 00 00 00 00 00 00 3e 22 00 00 00 00 00 38 b1 40
value (CPU 02): 00 00 00 00 00 00 42 f1 00 00 00 00 00 3f 01 c0
...
For debugging event loss, check the ring buffer's consumer position against the producer position. If the consumer is falling behind, events will be dropped when the buffer wraps around. The fix is either a larger buffer or a faster consumer (typically by moving slow I/O out of the consumption callback).
Common Questions
When should a hash map be used instead of an array map?
Use hash maps when entries are created and destroyed dynamically (connection tracking, process monitoring) or when keys are not small integers (IP addresses, 5-tuples, strings). Use array maps when the key space is a contiguous range of integers and all entries should exist for the map's entire lifetime (configuration tables, histograms with fixed bucket counts, program state).
How much memory does a per-CPU map actually use?
NR_CPUS * max_entries * value_size for arrays, plus overhead. On a 128-core machine, a per-CPU array with 10,000 entries of 16 bytes each uses 128 * 10,000 * 16 = ~20 MB. This is often acceptable given the contention it eliminates, but it can surprise operators who see memlock values much larger than expected in bpftool map list.
Can BPF maps be shared between multiple BPF programs?
Yes. Multiple BPF programs can reference the same map if they are loaded with the same map file descriptor or if they attach to a pinned map. This is how Cilium coordinates between its XDP, tc, and cgroup BPF programs -- they all share the same connection tracking and policy maps.
What happens when a ring buffer is full and a BPF program tries to write?
bpf_ringbuf_reserve() returns NULL. The BPF program must check for this and handle it -- typically by incrementing a drop counter in a separate map and returning. Events are lost. The fix is a larger ring buffer or a faster consumer, not a retry loop (BPF programs cannot block).
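A sketch of that drop-counter pattern (map and struct names are illustrative):

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} ringbuf_drops SEC(".maps");

struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
    __u32 zero = 0;
    __u64 *drops = bpf_map_lookup_elem(&ringbuf_drops, &zero);
    if (drops)
        (*drops)++;          /* per-CPU copy: plain increment is safe */
    return 0;                /* no retry: BPF programs cannot block */
}
/* ... fill the event, then bpf_ringbuf_submit(e, 0) ... */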
How Technologies Use This
A Docker Swarm cluster running 3,000 containers across 50 nodes enforces network policies that specify which containers can communicate with each other. Using iptables, each node maintains 12,000 rules that the kernel evaluates linearly for every incoming and outgoing packet. At 200,000 packets per second per node, iptables rule evaluation consumes 35% of CPU time, and adding 500 more containers requires 2,000 additional rules that further degrade packet processing throughput.
Cilium, deployed as the container networking plugin, replaces iptables-based policy enforcement with BPF hash maps pinned to the bpffs virtual filesystem. Each network policy rule becomes a key-value entry in a BPF hash map, where the key is a combination of source and destination security identity (derived from container labels) and the value stores the allow/deny verdict. When a packet arrives at a container's veth interface, a BPF program attached to the TC (traffic control) hook performs a single O(1) hash map lookup to determine whether the flow is permitted. The lookup cost is constant regardless of whether the cluster has 100 or 10,000 policy rules.
On a node handling 500,000 concurrent connections across 60 containers, the Cilium BPF hash maps consume approximately 80 MB of pinned kernel memory. Each connection's NAT state, policy verdict, and byte counters are stored in a per-CPU hash map variant, which eliminates lock contention because each CPU core writes exclusively to its own copy of the map. The CPU overhead for policy enforcement drops from 35% under iptables to under 5% with BPF maps, freeing 30% of node CPU capacity for application workloads.
A Kubernetes cluster with 4,000 ClusterIP services runs kube-proxy in iptables mode on each node. Every service creates approximately 8 iptables rules (pre-routing, output, service chain, endpoint chains), resulting in 32,000 rules per node. When a pod sends a packet to a ClusterIP, the kernel walks these rules sequentially in the nat table to find the matching service and select a backend endpoint. At scale, this linear search adds 1.5 ms of latency per connection setup and consumes 20% of node CPU.
Replacing kube-proxy with Cilium's BPF-based service implementation stores the entire service-to-endpoint mapping in a BPF hash map. The map key is the service ClusterIP and port; the value contains the list of backend pod IPs and the load-balancing state. A BPF program attached at the socket layer (sock_ops and connect hooks) intercepts outgoing connections and performs a single hash map lookup to resolve the service to a backend. For established connections, subsequent packets bypass the lookup entirely because the BPF program rewrites the destination at connect time rather than on every packet.
On a node with 4,000 services and 12,000 backend endpoints, the BPF service map occupies roughly 15 MB of kernel memory. Connection setup latency drops from 1.5 ms (iptables linear walk) to under 10 microseconds (single hash lookup). The CPU savings are proportional to service count: at 4,000 services, the BPF approach uses 95% less CPU for service resolution than iptables mode, and scaling to 10,000 services adds negligible overhead because hash map lookup time remains constant.
An Nginx reverse proxy handles 2 million HTTP requests per second across 800 upstream servers. During traffic spikes, certain clients send 50,000 requests per second, overwhelming individual upstream servers. Traditional rate limiting using Nginx's limit_req module operates at Layer 7 after the kernel has already performed TCP handshake processing, socket allocation, and HTTP parsing for every request, wasting CPU cycles on traffic that will ultimately be rejected.
XDP (eXpress Data Path) programs attached to the NIC driver hook process packets before they enter the kernel networking stack. A BPF program at the XDP layer maintains a per-CPU hash map keyed by source IP address, where each value stores a token bucket counter with a last-updated timestamp. When a packet arrives, the XDP program performs a hash map lookup, decrements the token count, and either passes the packet up the stack (XDP_PASS) or drops it immediately (XDP_DROP). Dropped packets never allocate an sk_buff, never enter the TCP state machine, and never consume Nginx worker CPU time. The per-CPU map variant ensures that each core updates its own copy of the rate counter without atomic operations or cache-line contention.
At 2 million requests per second on a 32-core server, the XDP rate limiter processes each packet in approximately 50 nanoseconds, compared to 5 microseconds for the same decision made at Layer 7 inside Nginx. During an attack generating 10 million packets per second from 5,000 source IPs, the XDP program drops 8 million packets per second at line rate while allowing legitimate traffic through. The per-CPU hash map holding 5,000 entries across 32 cores uses 32 * 5,000 * 64 bytes, approximately 10 MB of memory, a negligible cost relative to the CPU savings from not processing millions of packets through the full kernel networking stack.
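A condensed, hypothetical version of such an XDP token bucket. Constants, map sizing, and struct layout are illustrative, and with a per-CPU map each core enforces its own share of the rate rather than a global limit:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define RATE_PER_SEC 1000ULL   /* tokens refilled per second, per CPU */
#define BURST        2000ULL   /* bucket capacity */

struct bucket {
    __u64 tokens;
    __u64 last_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 16384);
    __type(key, __u32);        /* source IPv4 address */
    __type(value, struct bucket);
} rl SEC(".maps");

SEC("xdp")
int rate_limit(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr *ip;

    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    __u32 saddr = ip->saddr;
    __u64 now = bpf_ktime_get_ns();

    struct bucket *b = bpf_map_lookup_elem(&rl, &saddr);
    if (!b) {
        struct bucket init = { .tokens = BURST, .last_ns = now };
        bpf_map_update_elem(&rl, &saddr, &init, BPF_ANY);
        return XDP_PASS;
    }

    /* Refill proportionally to elapsed time, capped at the burst size. */
    __u64 refill = (now - b->last_ns) * RATE_PER_SEC / 1000000000ULL;
    b->tokens = b->tokens + refill > BURST ? BURST : b->tokens + refill;
    b->last_ns = now;

    if (!b->tokens)
        return XDP_DROP;       /* no sk_buff allocated, no stack traversal */
    b->tokens--;
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";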
Same Concept Across Tech
| Technology | How it uses BPF maps | Key gotcha |
|---|---|---|
| Cilium | Pinned hash maps for conntrack, per-CPU arrays for packet counters, LRU maps for NAT entries | Unpinned maps lose all connection state on agent restart. Always pin to /sys/fs/bpf/ |
| Falco | Ring buffer (previously perf_event_array) for streaming security events to userspace | perf_event_array drops events under asymmetric CPU load. Ring buffer eliminates this |
| bpftrace | Per-CPU hash maps for @aggregations, arrays for histograms, printf via perf_event_output | Large aggregation maps can hit the map-key limit. Raise it with the BPFTRACE_MAP_KEYS_MAX environment variable |
| BCC tools | Hash maps for per-process/per-file statistics, arrays for configuration, perf buffers for events | Python bindings add overhead to map reads. For high-frequency polling, use libbpf directly |
| Katran | Per-CPU arrays for VIP packet/byte counters, hash maps for consistent hashing state | Per-CPU arrays on 128-core machines use 128x memory. Size accordingly |
Stack layer mapping (BPF map debugging):
| Layer | What to check | Tool |
|---|---|---|
| Application | Which map type is being used, and is it the right choice for the access pattern? | bpftool map list, source code review |
| BPF program | Are map operations succeeding, or returning -ENOENT / -E2BIG? | bpf_trace_printk() on error paths |
| Map subsystem | Is the map full? Is LRU evicting too aggressively? | bpftool map show id (check max_entries vs current entries) |
| Ring buffer | Is the consumer keeping up, or is the buffer filling? | bpftool map show, check consumer lag |
| Kernel | Is bpffs mounted? Are maps pinned correctly? | mount (look for a bpf entry), ls /sys/fs/bpf/ |
| Hardware | Is per-CPU memory pressure causing allocation failures? | dmesg for BPF allocation errors |
Design Rationale
eBPF programs run in kernel context where they cannot call arbitrary kernel functions or allocate memory freely. Maps provide the controlled, verifier-checked interface for data storage and communication. The variety of map types exists because no single data structure serves all access patterns. Hash maps handle dynamic key-value lookups. Arrays handle fixed-index access with zero allocation overhead. Per-CPU variants eliminate synchronization for high-frequency writes. The ring buffer solves the event streaming problem that perf_event_array handled poorly -- one shared buffer instead of N independent per-CPU buffers, with a reserve/commit protocol that lets multiple CPUs write concurrently with minimal coordination.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| Events dropped under high load | perf_event_output per-CPU buffer overflow on hot CPUs | Switch to BPF_MAP_TYPE_RINGBUF, or increase per-CPU buffer size |
| CPU usage spikes on map-heavy BPF program | Shared (non-per-CPU) map with high write contention | Switch to per-CPU map variant, check bpftool map show for type |
| Map insert returns -E2BIG | Map reached max_entries limit | Increase max_entries or switch to LRU map for auto-eviction |
| Connection state lost after program restart | Maps not pinned to bpffs | Pin maps with bpf(BPF_OBJ_PIN) to /sys/fs/bpf/, verify with ls |
| LRU map evicts entries that are still active | max_entries too close to working set size, eviction too aggressive | Size LRU map at 2-3x expected working set |
| Userspace reads stale counter values | Reading per-CPU map but not summing all CPU copies | Use bpftool map lookup (one value per CPU is printed), sum all CPU values in application |
| Ring buffer consumer falls behind | Synchronous I/O in consumer callback blocks event processing | Consume into in-memory queue, drain asynchronously |
When to Use / Avoid
Relevant when:
- Building or debugging eBPF programs that communicate state between kernel and userspace
- Diagnosing event loss in eBPF-based monitoring tools (Falco, Cilium Hubble, custom probes)
- Choosing between perf_event_output and BPF ring buffer for event streaming
- Optimizing high-frequency counters in XDP or tc programs (per-CPU vs shared maps)
- Understanding why a BPF map runs out of space or why entries disappear (LRU eviction)
Watch out for:
- perf_event_output drops events under asymmetric CPU load because per-CPU buffers overflow independently
- Shared (non-per-CPU) maps under high write rates cause spin lock contention that dominates CPU usage
- Unpinned maps are destroyed on program unload, losing all accumulated state
- LRU eviction is approximate and can evict hot entries under sustained pressure near max_entries
Try It Yourself
# List all BPF maps currently loaded in the kernel
bpftool map list

# Show detailed info for a specific map
bpftool map show id 42

# Dump all entries in a hash map
bpftool map dump id 42

# Look up a specific key in a map (key in hex bytes)
bpftool map lookup id 42 key 0x0a 0x00 0x01 0x01

# Look up a per-CPU map entry (bpftool prints the value on each CPU)
bpftool map lookup id 42 key 0x00 0x00 0x00 0x01

# Pin a map to bpffs so it survives program restart
bpftool map pin id 42 /sys/fs/bpf/my_conntrack_map

# List pinned objects on bpffs
ls -la /sys/fs/bpf/

# Create a hash map from the command line (for testing)
bpftool map create /sys/fs/bpf/test_map type hash key 4 value 8 entries 1024 name test_map

# Delete a specific entry from a map
bpftool map delete id 42 key 0x0a 0x00 0x01 0x01

# Show all BPF programs and their map references
bpftool prog show

# Check if bpffs is mounted
mount | grep bpf

# Monitor BPF-related kernel messages
dmesg | grep -i bpf

Debug Checklist
1. List all BPF maps and check types and sizes: bpftool map list
2. Dump map contents to verify entries: bpftool map dump id <map_id>
3. Check for pinned maps: ls -la /sys/fs/bpf/
4. Monitor ring buffer usage: bpftool map show id <ring_buf_id>
5. Check for dropped perf-buffer events: register a lost-sample callback (libbpf's perf_buffer lost_cb) in the consumer
6. Verify per-CPU map values: bpftool map lookup id <map_id> key <hex_key> (one value per CPU is printed)
Key Takeaways
- ✓BPF ring buffer (BPF_MAP_TYPE_RINGBUF) is strictly superior to perf_event_array for event streaming. It uses a single shared buffer instead of per-CPU buffers, which means better memory efficiency (one buffer sized to aggregate throughput, not N buffers each sized for peak per-CPU throughput) and simpler userspace consumption (one fd to poll instead of N).
- ✓Per-CPU maps are not optional for high-frequency counters. A shared hash map with 10 million updates per second across 64 cores spends more time on spin lock contention than on actual work. Per-CPU variants eliminate all synchronization from the write path. The cost is NR_CPUS copies of each value in memory and a userspace aggregation step on read.
- ✓LRU hash maps solve the stale entry problem that plagues long-running BPF programs. A connection tracking map without eviction grows until it hits max_entries and then fails all inserts. LRU maps evict cold entries automatically, but the eviction is approximate -- under heavy churn, hot entries can be evicted if the LRU lists are not perfectly maintained. Size the map at 2-3x expected steady-state entries.
- ✓Map pinning to bpffs (/sys/fs/bpf/) decouples map lifetime from program lifetime. A pinned map survives program restart, allowing a new version of a BPF program to attach to existing state without losing connection tracking entries or counters. Cilium relies on this for seamless datapath upgrades.
- ✓The bpf() syscall is the single entry point for all map operations from userspace: BPF_MAP_CREATE, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, BPF_MAP_GET_NEXT_KEY. From BPF program context, maps are accessed via helper functions like bpf_map_lookup_elem() that the verifier validates at load time.
Common Pitfalls
- ✗Using perf_event_output when BPF ring buffer is available. perf event arrays allocate one buffer per CPU, each sized for worst-case throughput. On a 128-core machine with 64 KB per-CPU buffers, that is 8 MB of ring buffer memory fragmented across 128 independent buffers. The BPF ring buffer achieves the same throughput with a single 256 KB buffer and avoids the stranded-capacity drops that occur under asymmetric load, where some CPUs overflow while others sit idle.
- ✗Forgetting to use per-CPU maps for frequently updated counters. A regular BPF_MAP_TYPE_HASH protects each bucket with a spin lock. At 1 million updates per second on a 64-core machine, lock contention dominates. The fix is BPF_MAP_TYPE_PERCPU_HASH, which eliminates all locking. The tradeoff: reads require summing NR_CPUS values in userspace.
- ✗Setting max_entries too low on LRU hash maps. When the map is full and churn is high, the LRU eviction runs on the hot path of every insert. If max_entries matches the expected steady state exactly, brief traffic spikes cause eviction storms that remove entries still in active use. Size LRU maps at 2-3x the expected working set.
- ✗Not pinning maps that should survive program restarts. Without pinning, a BPF map is destroyed when the last program referencing it is unloaded. Restarting a Cilium agent without pinned maps drops all connection tracking state, causing thousands of connections to reset. Always pin maps that hold persistent state to /sys/fs/bpf/.
- ✗Blocking in the userspace ring buffer consumer. The BPF ring buffer delivers events in order with a callback or epoll interface. If the consumer blocks on slow I/O (writing events to disk synchronously, making network calls), the ring buffer fills and events are lost. Consume into an in-memory queue first, then drain the queue asynchronously.
Reference
In One Line
BPF maps are the shared memory between eBPF programs and userspace -- pick the wrong type and a monitoring tool either drops events, burns CPU on lock contention, or silently evicts the entries it needs most.