Grafana Beyla
eBPF-based auto-instrumentation for zero-code observability
Why It Exists
The biggest barrier to observability adoption is instrumentation. Teams deploy services with zero monitoring because adding OTel SDK instrumentation means modifying code, updating dependencies, testing, and deploying across every service in every language. For a platform with 500+ services in Go, Java, Python, and Node.js, that's months of work before a single dashboard lights up.
Grafana Beyla, an open-source project built by Grafana Labs (the company behind Grafana, Loki, Tempo, and Mimir), solves this cold-start problem. It attaches eBPF programs to the Linux kernel and automatically captures HTTP, gRPC, and SQL traffic from any process on the node. No code changes, no container image modifications, no service restarts. Deploy Beyla as a DaemonSet and every service on the cluster immediately produces RED metrics and basic trace spans.
This is not a new idea. Service meshes (Istio, Linkerd) have provided network-level telemetry for years. But service meshes add a sidecar proxy to every pod, increasing latency, memory, and operational complexity. Beyla achieves similar network-level visibility at a fraction of the cost by operating at the kernel level instead of the network proxy level.
How It Works
eBPF Probes: Beyla attaches two types of eBPF programs to the kernel. Uprobes hook into user-space function entry and exit points (for example, Go's net/http.(*conn).serve or Java's javax.servlet.http.HttpServlet.service). These capture application-level protocol details like HTTP method, URL path, and status code. Kprobes hook into kernel-space functions like tcp_sendmsg and tcp_recvmsg to capture network-level events. The combination of uprobes and kprobes gives Beyla visibility into both the application protocol and the network transport.
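A minimal sketch of these attachment mechanics, written against the cilium/ebpf Go library (the same library Beyla builds on). The object file probes.o, the program names, and the target binary path are hypothetical placeholders rather than Beyla's actual artifacts:

package main

import (
    "log"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
)

func main() {
    // Load pre-compiled eBPF programs (hypothetical object file, built
    // separately with clang).
    spec, err := ebpf.LoadCollectionSpec("probes.o")
    if err != nil {
        log.Fatal(err)
    }
    coll, err := ebpf.NewCollection(spec)
    if err != nil {
        log.Fatal(err)
    }
    defer coll.Close()

    // Kprobe: kernel-space hook on tcp_recvmsg, for network-level events.
    kp, err := link.Kprobe("tcp_recvmsg", coll.Programs["trace_tcp_recvmsg"], nil)
    if err != nil {
        log.Fatal(err)
    }
    defer kp.Close()

    // Uprobe: user-space hook at the entry of Go's HTTP handler inside the
    // target binary, for application-level protocol details.
    ex, err := link.OpenExecutable("/usr/local/bin/checkout-service")
    if err != nil {
        log.Fatal(err)
    }
    up, err := ex.Uprobe("net/http.(*conn).serve", coll.Programs["trace_http_serve"], nil)
    if err != nil {
        log.Fatal(err)
    }
    defer up.Close()

    select {} // keep the probes attached while the agent runs
}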
Map Ring Buffers: When an eBPF probe fires, it writes event data (timestamp, connection metadata, protocol fields) into a fixed-size eBPF ring buffer map. The user-space Beyla agent reads events from this buffer, correlates request/response pairs, and computes duration. Ring buffers have a fixed memory allocation, which is why Beyla's memory consumption is predictable regardless of traffic volume.
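A sketch of the user-space read loop, using cilium/ebpf's ringbuf package. The map name (events) and the event struct are illustrative assumptions about what the kernel-side programs write, and the function expects a map loaded as in the previous sketch:

package main

import (
    "bytes"
    "encoding/binary"
    "log"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/ringbuf"
)

// httpEvent mirrors a hypothetical C struct submitted by the probes.
type httpEvent struct {
    TimestampNs uint64
    Pid         uint32
    StatusCode  uint16
    _           [2]byte // padding to match C struct alignment
}

func readLoop(events *ebpf.Map) {
    rd, err := ringbuf.NewReader(events)
    if err != nil {
        log.Fatal(err)
    }
    defer rd.Close()

    for {
        rec, err := rd.Read() // blocks until a probe submits an event
        if err != nil {
            return // reader was closed
        }
        var ev httpEvent
        if err := binary.Read(bytes.NewReader(rec.RawSample), binary.LittleEndian, &ev); err != nil {
            continue // malformed sample; skip it
        }
        log.Printf("pid=%d ts=%dns status=%d", ev.Pid, ev.TimestampNs, ev.StatusCode)
    }
}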
Metric Generation: From correlated request/response pairs, Beyla generates three metric families: http_server_request_duration_seconds (histogram), http_server_request_total (counter), and http_server_request_error_total (counter by status code class). For gRPC, equivalent metrics are generated with method and status labels. These metrics follow OpenTelemetry semantic conventions, so they're compatible with standard Grafana dashboards and Prometheus alert rules.
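For comparison, this is roughly how a single correlated pair maps onto one histogram observation when expressed through the OpenTelemetry Go metrics API -- an illustration of the data model, not Beyla's internal code:

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/metric"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    ctx := context.Background()
    // A real setup would attach an OTLP exporter via a periodic reader.
    mp := sdkmetric.NewMeterProvider()
    defer mp.Shutdown(ctx)

    hist, err := mp.Meter("sketch").Float64Histogram(
        "http_server_request_duration_seconds",
        metric.WithUnit("s"),
    )
    if err != nil {
        log.Fatal(err)
    }

    // One observation per correlated request/response pair, carrying the
    // fixed label set described above.
    hist.Record(ctx, 0.047435,
        metric.WithAttributes(
            attribute.String("http_method", "POST"),
            attribute.String("http_target", "/checkout"),
            attribute.Int("http_status_code", 200),
        ),
    )
}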
Trace Span Creation: Beyla reads incoming traceparent headers from HTTP requests at the eBPF level. If a traceparent exists, Beyla creates a child span with the correct parent span ID. If no traceparent exists, Beyla creates a new root span. Spans include service name, HTTP method, URL path, status code, and duration. These spans are exported via OTLP alongside the SDK-generated spans, creating a unified trace that includes both eBPF-detected and SDK-instrumented service calls.
End-to-End Example: A Single REST Call
A concrete walkthrough of what happens when a POST /checkout request hits a Go service that has no OTel SDK -- only Beyla running on the node.
The setup: Service A (API gateway, OTel SDK instrumented) calls Service B (checkout, Beyla only), which calls Service C (payment, OTel SDK instrumented). All three run on Kubernetes. Beyla runs as a DaemonSet on every node. The OTel Collector runs as a separate DaemonSet on every node.
Step 1: The incoming HTTP request
Service A's OTel SDK creates a span and injects a traceparent header into the outgoing request to Service B:
POST /checkout HTTP/1.1
Host: checkout-service:8080
Content-Type: application/json
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             └┘ └──────────────────────────────┘ └──────────────┘ └┘
          version        trace_id (32 hex)        parent_span_id   sampled flag
                                                 (16 hex, Service A's span)
{"cart_id": "cart_9x8z", "items": 3, "total": 149.99}
Step 2: Beyla captures the request at the kernel level
The moment this TCP packet arrives on the node, two things fire:
- kprobe on tcp_recvmsg -- captures the timestamp and raw socket buffer bytes. Beyla now has the arrival time.
- uprobe on Go's net/http.(*conn).serve -- fires when the Go HTTP server starts handling the request. Beyla reads the socket buffer and extracts:
Beyla eBPF ring buffer event (request):
timestamp: 1709827200.123456
pid: 48291 (checkout-service process)
method: POST
path: /checkout
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
source_ip: 10.0.3.42:52918 (Service A pod)
dest_ip: 10.0.7.15:8080 (this pod)
This event is written to the ring buffer. The user-space Beyla agent reads it and holds it in memory, waiting for the matching response.
Step 3: The checkout service calls the payment service
During request handling, Service B makes an outgoing HTTP call to Service C. Beyla intercepts this at the kprobe on tcp_sendmsg. It reads the outgoing socket buffer and injects a new traceparent header:
POST /process-payment HTTP/1.1
Host: payment-service:8080
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
                └──────────────────────────────┘ └──────────────┘
                   same trace_id (preserved)      NEW span_id (Beyla
                                                  generated this)
The trace_id stays the same (4bf92f...). The parent_span_id is now Beyla's own span_id (b7ad6b...), not Service A's. Service C's OTel SDK will read this header and create a child span under Beyla's span.
Step 4: Response captured
Service B returns 200 OK. The uprobe fires at handler exit, and the kprobe on tcp_sendmsg captures the response bytes:
Beyla eBPF ring buffer event (response):
timestamp: 1709827200.170891
pid: 48291
status_code: 200
content_len: 47
Beyla correlates the request and response events by PID and connection. Duration = 1709827200.170891 - 1709827200.123456 = 0.047435 s = 47.435 ms.
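That correlation step, sketched in Go with hypothetical types (Beyla's real keying is more detailed, but the shape is the same): requests wait in a map keyed by process and connection until the matching response arrives:

package main

// reqKey identifies a request/response pair: same process, same TCP connection.
type reqKey struct {
    pid      uint32
    srcAddr  string // e.g. "10.0.3.42:52918"
    destAddr string // e.g. "10.0.7.15:8080"
}

type pendingReq struct {
    startNs uint64
    method  string
    path    string
}

var inflight = map[reqKey]pendingReq{}

func onRequest(k reqKey, p pendingReq) {
    inflight[k] = p // hold until the response event shows up
}

func onResponse(k reqKey, endNs uint64) (durationSec float64, ok bool) {
    p, found := inflight[k]
    if !found {
        return 0, false // response without a captured request; drop it
    }
    delete(inflight, k)
    // For the walkthrough event: (170891 - 123456) µs = 47.435 ms.
    return float64(endNs-p.startNs) / 1e9, true
}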
Step 5: Beyla generates metrics
From this single request/response pair, Beyla produces these Prometheus-format metrics:
# Histogram: request duration (distributed across buckets)
http_server_request_duration_seconds_bucket{
service_name="checkout-service",
http_method="POST",
http_target="/checkout",
http_status_code="200",
le="0.05"
} 1 ← 47ms falls in the 50ms bucket
http_server_request_duration_seconds_sum{...} 0.047435
http_server_request_duration_seconds_count{...} 1
# Counter: total requests
http_server_request_total{
service_name="checkout-service",
http_method="POST",
http_target="/checkout",
http_status_code="200"
} 1
Notice the labels are fixed: service_name, http_method, http_target, http_status_code. Beyla cannot add custom labels like cart_id or customer_tier -- it only sees the HTTP wire protocol, not application internals.
Step 6: Beyla generates a trace span
From the same request/response pair, Beyla produces this OTLP span:
{
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spanId": "b7ad6b7169203331",
"parentSpanId": "00f067aa0ba902b7",
"name": "POST /checkout",
"kind": "SPAN_KIND_SERVER",
"startTimeUnixNano": 1709827200123456000,
"endTimeUnixNano": 1709827200170891000,
"attributes": {
"service.name": "checkout-service",
"http.method": "POST",
"http.target": "/checkout",
"http.status_code": 200,
"net.peer.ip": "10.0.3.42",
"net.host.port": 8080
},
"status": { "code": "STATUS_CODE_UNSET" }
}
Key fields: traceId was extracted from the incoming traceparent header. parentSpanId points to Service A's span. spanId is freshly generated by Beyla -- this is the same ID that was injected into the outgoing traceparent header to Service C.
Step 7: Data flows to the OTel Collector
Beyla batches the metrics and span and sends them via OTLP gRPC to localhost:4317 -- the OTel Collector DaemonSet running on the same node. This is a local loopback call, no network hop.
Service B (checkout)
│ │
│ syscalls │ OTLP gRPC (if OTel SDK is added)
│ (tcp_recvmsg, │
│ tcp_sendmsg) │
▼ │
Beyla Agent (DaemonSet) │
│ │
│ OTLP gRPC │
└──────────┐ ┌──────────┘
▼ ▼
OTel Collector (DaemonSet, same node) ← localhost:4317
│ Processors: memory_limiter → dedup → filter → enrich → batch
│
├──── metrics ────▶ Kafka (metrics-ingestion) ──▶ VictoriaMetrics ──▶ Grafana
│
└──── traces ─────▶ Kafka (traces-raw) ─────────▶ Grafana Tempo ────▶ Grafana
Both Beyla and the OTel SDK (if present) push to the same OTel Collector on localhost:4317. The dedup processor in the Collector handles the overlap -- see Step 8 below.
The OTel Collector does not know or care that the metrics and span came from Beyla rather than an OTel SDK. The data format is identical (OTLP protobuf). It processes Beyla's telemetry through the exact same pipeline: memory_limiter → dedup → filter (drop health checks) → enrich (add k8s metadata) → batch → export to Kafka.
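To see why the Collector can't tell the sources apart, here is a minimal Go producer pushing a span over OTLP gRPC to the same loopback endpoint, using the standard OpenTelemetry Go SDK. From the Collector's perspective this is indistinguishable from Beyla's output:

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func main() {
    ctx := context.Background()

    // OTLP gRPC over loopback to the node-local Collector, the same
    // endpoint Beyla targets.
    exp, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("localhost:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exp),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName("checkout-service"),
        )),
    )
    defer tp.Shutdown(ctx)

    // One server-side span; the Collector pipeline treats it exactly like
    // a span generated by Beyla.
    _, span := tp.Tracer("sketch").Start(ctx, "POST /checkout")
    span.End()
}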
The complete trace in Grafana
Opening this trace in Grafana Tempo, the waterfall shows three spans:
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
├── api-gateway (Service A) 0ms ─────────────── 85ms [OTel SDK span]
│ span_id: 00f067aa0ba902b7
│ attributes: user_id=u_123, cart_size=3, feature_flag=new_checkout
│
├── POST /checkout (Service B) 5ms ────────── 52ms [Beyla eBPF span]
│ span_id: b7ad6b7169203331
│ attributes: http.method=POST, http.status_code=200
│ (no custom attributes -- eBPF can't see application internals)
│
└── POST /process-payment (Service C) 20ms ── 48ms [OTel SDK span]
span_id: c8f4e2a1d3b56789
attributes: payment_method=credit_card, amount=149.99, provider=stripe
Service A and C have rich custom attributes (user_id, cart_size, payment_method) because they use the OTel SDK. Service B's span has only HTTP-level attributes because Beyla can only see what's on the wire. But the trace is complete -- all three services are connected with correct parent-child relationships, even though Service B has zero instrumentation code.
Step 8: What happens when both Beyla AND OTel SDK run on Service B
Now suppose the checkout team adds OTel SDK instrumentation to Service B. Beyla is still running on the node (it's a DaemonSet, it doesn't know or care about individual services). For the same POST /checkout request, the OTel Collector now receives two spans from two different sources:
Span from Beyla (eBPF): Span from OTel SDK (application):
┌────────────────────────────────┐ ┌────────────────────────────────────┐
│ name: POST /checkout │ │ name: POST /checkout │
│ service: checkout-service │ │ service: checkout-service │
│ http.method: POST │ │ http.method: POST │
│ http.target: /checkout │ │ http.target: /checkout │
│ http.status_code: 200 │ │ http.status_code: 200 │
│ duration: 47.4ms │ │ duration: 47.2ms │
│ │ │ cart_id: cart_9x8z │
│ (6 attributes, HTTP-level only)│ │ user_id: u_123 │
│ │ │ items_count: 3 │
│ │ │ payment_method: credit_card │
│ │ │ (10+ attributes, business context) │
└────────────────────────────────┘ └────────────────────────────────────┘
Without deduplication, the trace waterfall would show two identical spans for Service B. The Collector's dedup processor prevents this.
How the dedup processor resolves it: It groups incoming spans by {service_name, http.method, http.target} within a 5-second window. When two spans match these fields and overlap in time, it keeps the SDK version and drops the Beyla version. The SDK span wins because it carries richer attributes (cart_id, user_id, items_count) that Beyla's kernel-level view cannot produce.
The same logic applies to metrics. Both sources produce http_server_request_duration_seconds for the same endpoint. The Collector keeps the SDK histogram (which has custom bucket boundaries and additional labels) and drops Beyla's version.
Collector config for dedup:
processors:
  dedup:
    # Group spans by these fields to detect duplicates
    match_keys: ["service.name", "http.method", "http.target"]
    # Time window to look for overlapping spans
    window: 5s
    # When duplicates found, keep the span with more attributes
    prefer: "richer"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, dedup, filter, batch]  # dedup early in the chain
      exporters: [otlp/tempo]
The dedup processor is configured in YAML just like the filter or batch processors. One caveat: span deduplication is not shipped with the core OTel Collector distribution, so the processor must come from a contrib or custom Collector build -- verify it is present in the Collector image before relying on this pattern.
The partial coverage case is the real win. If the checkout team instruments their three most critical endpoints with the OTel SDK but leaves twenty other endpoints untouched, the Collector deduplicates only where both sources overlap. The three critical endpoints get rich SDK spans. The twenty remaining endpoints keep Beyla's eBPF spans. The result is the best of both worlds without having to instrument everything at once.
Architecture Deep Dive
Deployment Model: Beyla runs as a Kubernetes DaemonSet with hostPID: true to access process information and privileged: true (or specific eBPF-related capabilities) to load eBPF programs. Each Beyla pod discovers services on its node via process scanning, attaches uprobes to detected HTTP/gRPC handlers, and begins capturing traffic. Discovery is automatic, with no per-service configuration needed.
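The discovery side can be pictured as a /proc walk. A toy sketch follows (real discovery also matches listening ports and inspects binary symbols before attaching probes; hostPID: true is what makes other pods' processes visible here):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strconv"
    "strings"
)

func main() {
    entries, err := os.ReadDir("/proc")
    if err != nil {
        panic(err)
    }
    for _, e := range entries {
        pid, err := strconv.Atoi(e.Name())
        if err != nil {
            continue // not a PID directory (e.g. /proc/meminfo)
        }
        comm, err := os.ReadFile(filepath.Join("/proc", e.Name(), "comm"))
        if err != nil {
            continue // process exited or access denied
        }
        fmt.Printf("pid=%d comm=%s\n", pid, strings.TrimSpace(string(comm)))
    }
}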
Language Support: Beyla's uprobe targets are language-specific. For Go, it hooks into net/http, gorilla/mux, gin, and grpc-go. For Java, it hooks into servlet containers (Tomcat, Jetty) and gRPC-java. For Python, it hooks into WSGI/ASGI handlers. For Node.js, it hooks into the HTTP module. For languages without specific uprobe targets, Beyla falls back to kprobe-only mode, which still captures TCP-level request/response pairs but with less protocol detail.
The OTel Collector Integration: Beyla exports metrics and traces via OTLP to the node-local OTel Collector (the same DaemonSet that receives SDK telemetry). This is critical: it means eBPF-generated and SDK-generated telemetry flow through the same pipeline, with the same value-based routing, same metadata enrichment, same deduplication. From the storage layer's perspective, there is no difference between a metric produced by Beyla and one produced by the OTel SDK.
Trace Stitching: How eBPF Spans Join Distributed Traces
The hardest thing to understand about Beyla is how a kernel-level agent produces spans that fit cleanly into an application-level distributed trace. The answer comes down to one HTTP header: traceparent.
Reading traceparent at the kernel level. When a request hits a service, Beyla's uprobe fires at the HTTP handler entry point. While still in kernel space, Beyla reads the request headers from the socket buffer and looks for traceparent. The W3C Trace Context format is 00-{trace_id}-{parent_span_id}-{trace_flags}. If Beyla finds this header, it extracts the trace_id and parent_span_id. It then generates a new span_id for itself and builds a span where parent_span_id points to the incoming value. The trace_id stays the same, so this span belongs to the same distributed trace.
If there is no traceparent header (the request came from outside the instrumented system), Beyla generates both a fresh trace_id and span_id, creating a root span.
Outgoing context propagation. When the service makes downstream HTTP calls, Beyla intercepts the outgoing request at the tcp_sendmsg kprobe and injects a new traceparent header containing the span_id it created. The downstream service (whether instrumented by Beyla or by OTel SDK) receives this header and continues the trace chain. This is how a request flowing through Service A (SDK instrumented) to Service B (Beyla only) to Service C (SDK instrumented) produces a complete trace. Service B's eBPF-generated span sits between A and C in the trace waterfall with correct parent-child relationships. In Grafana, the trace looks the same regardless of whether a span came from eBPF or SDK.
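A compact Go sketch of this header logic -- parse-or-create on the way in, inject on the way out. It illustrates the W3C Trace Context rules the text describes, not Beyla's actual implementation:

package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "strings"
)

type spanContext struct {
    traceID  string // 32 hex chars
    spanID   string // 16 hex chars
    parentID string // empty for a root span
}

func randomHex(n int) string {
    b := make([]byte, n)
    _, _ = rand.Read(b)
    return hex.EncodeToString(b)
}

// stitch decides whether the new span is a child (traceparent present)
// or a root (traceparent absent or malformed).
func stitch(traceparent string) spanContext {
    parts := strings.Split(traceparent, "-")
    if len(parts) == 4 && len(parts[1]) == 32 && len(parts[2]) == 16 {
        // Child span: keep the trace_id; the incoming span_id becomes our parent.
        return spanContext{traceID: parts[1], spanID: randomHex(8), parentID: parts[2]}
    }
    // Root span: fresh trace_id and span_id.
    return spanContext{traceID: randomHex(16), spanID: randomHex(8)}
}

// outgoing renders the header injected into downstream calls: the span_id
// generated above becomes the downstream service's parent.
func outgoing(sc spanContext) string {
    return fmt.Sprintf("00-%s-%s-01", sc.traceID, sc.spanID)
}

func main() {
    in := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    sc := stitch(in)
    fmt.Println("parent:", sc.parentID)      // Service A's span_id
    fmt.Println("downstream:", outgoing(sc)) // same trace_id, new span_id
}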
What eBPF cannot stitch. There are real limits to what kernel-level tracing can do:
- Internal function calls. Beyla sees the HTTP handler entry and exit, but nothing in between. If handleCheckout() calls validateCart() which calls regexp.Compile() 10,000 times, Beyla reports one span for the entire HTTP request. It has no visibility into the internal call tree. That's what OTel SDK spans and continuous profiling cover.
- Asynchronous processing. If a service receives an HTTP request, queues work to a background goroutine, and responds immediately, Beyla captures only the HTTP round-trip (fast). The background work is invisible because it doesn't produce new network calls that Beyla can trace back to the original request.
- Non-HTTP transports. Kafka messages, SQS events, custom binary RPC, and other non-HTTP transports don't carry traceparent headers in a way Beyla can read. These require manual context injection via the OTel SDK. Beyla only stitches traces for HTTP/1.1, HTTP/2, and gRPC.
Deduplication: When Both eBPF and SDK Instrument the Same Service
When a team adds OTel SDK instrumentation to a service that Beyla already covers, the OTel Collector receives two signals for every HTTP request: one eBPF-generated span from Beyla and one SDK-generated span from the application. Without deduplication, every trace would show duplicate spans.
How matching works. The OTel Collector's dedup processor groups incoming spans by {service_name, http.method, http.target} within a short time window (typically 5 seconds). When it finds two spans from different sources that match these fields and overlap in time, it keeps the SDK version and drops the eBPF version. The SDK span wins because it carries richer attributes (custom labels like user_id, cart_size, span events, and detailed status information) that Beyla's kernel-level view can't produce.
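The matching rule, sketched in Go. The sketch treats one batch as a single time window, so two spans with the same key count as the same request seen twice; the real processor additionally checks that the spans overlap in time:

package main

import "fmt"

// dedupKey is the match tuple from the text: {service, method, target}.
type dedupKey struct {
    service, method, target string
}

type spanRecord struct {
    key       dedupKey
    source    string // "beyla" or "sdk", for illustration only
    attrCount int    // proxy for "richer": SDK spans carry more attributes
}

// dedupWindow keeps one span per key, preferring the one with more
// attributes, so the SDK span wins over the eBPF span.
func dedupWindow(spans []spanRecord) []spanRecord {
    best := map[dedupKey]spanRecord{}
    for _, s := range spans {
        if cur, seen := best[s.key]; !seen || s.attrCount > cur.attrCount {
            best[s.key] = s
        }
    }
    out := make([]spanRecord, 0, len(best))
    for _, s := range best {
        out = append(out, s)
    }
    return out
}

func main() {
    k := dedupKey{"checkout-service", "POST", "/checkout"}
    kept := dedupWindow([]spanRecord{
        {key: k, source: "beyla", attrCount: 6},
        {key: k, source: "sdk", attrCount: 10},
    })
    fmt.Println(kept[0].source) // sdk
}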
Metrics dedup works the same way. If both Beyla and the SDK produce http_server_request_duration_seconds for the same service and endpoint, the collector drops Beyla's version. The SDK histogram has custom buckets and additional labels that Beyla can't attach.
Partial SDK coverage. Here's the useful edge case: if a team adds SDK instrumentation to their three most critical endpoints but leaves the other twenty untouched, Beyla's spans survive for those twenty endpoints. The collector only deduplicates where both sources report the same {service, method, target} tuple. Endpoints with SDK coverage get the richer SDK spans. Endpoints without SDK coverage keep Beyla's eBPF spans. The result is the best of both worlds without having to instrument everything at once.
Two-Tier Instrumentation Model
The real value of Beyla is not as a standalone tool but as the baseline tier in a two-tier instrumentation strategy:
Tier 1, eBPF Baseline (Beyla): Every service gets RED metrics and basic trace spans automatically. Zero effort from application teams. This tier answers: "Is the service healthy? What's the error rate? What's the latency distribution?" It covers 80% of incident detection needs.
Tier 2, OTel SDK (Deep Instrumentation): Teams that need more add OTel SDK instrumentation incrementally. Custom business metrics (orders_placed, payment_amount), detailed span attributes (user_id, cart_size, feature_flag), baggage propagation, and continuous profiling. This tier answers: "Why is this specific request slow? What business impact does this error have?"
The two tiers are additive, not exclusive. A service instrumented with both Beyla and OTel SDK produces eBPF-captured RED metrics plus SDK-captured custom metrics. The OTel Collector deduplicates overlapping signals (same HTTP request captured by both eBPF and SDK) using the dedup processor.
This model eliminates the all-or-nothing adoption problem. Day one: deploy Beyla, every service has baseline observability. Week one: critical services add OTel SDK for business metrics. Month one: the long tail of services still runs on Beyla-only, and that's fine. They have RED metrics, they show up in traces, they trigger alerts on error rate spikes.
Best Practices
Always deploy Beyla alongside the OTel Collector DaemonSet, not instead of it. Beyla handles auto-instrumentation. The OTel Collector handles pipeline processing (batching, routing, enrichment, export). Trying to make Beyla do both leads to missed features.
Set explicit resource limits on the Beyla DaemonSet: 256 MB memory limit, 100m CPU limit. Beyla is lightweight, but unbounded pods are a reliability risk on shared nodes.
Test kernel compatibility in staging before rolling out. Run bpftool btf dump file /sys/kernel/btf/vmlinux | head to verify BTF support. If BTF is missing, Beyla will fail to load its eBPF programs.
Monitor Beyla's own metrics (beyla_ebpf_tracer_flushes_total, beyla_internal_errors_total) in the meta-monitoring stack. If Beyla silently stops producing data for a node, baseline visibility is lost for every service on that node.
Plan for SDK adoption from day one. Beyla is the bootstrap, not the destination. Build dashboards and alerts on Beyla's RED metrics immediately, but track which services have added SDK instrumentation and which still run on eBPF-only. The goal is Tier 2 coverage for all critical services within the first quarter.
Pros
- Zero code changes required. Attach to a running process and get RED metrics and basic trace spans immediately
- Kernel-level visibility via eBPF uprobes and kprobes. Sees HTTP/gRPC/SQL calls that application-level instrumentation might miss
- Low overhead: ~200 MB RAM per node. Runs as a DaemonSet alongside OTel Collectors without competing for resources
- Complements OTel SDK instrumentation. Beyla provides the baseline, SDK adds depth. The two-tier model means every service has observability even before teams write instrumentation code
- Native OTLP export. Feeds directly into the OTel Collector pipeline so metrics and traces flow through the same processing, routing, and storage path as SDK-generated telemetry
Cons
- Linux-only. Requires kernel 5.8+ with BTF (BPF Type Format) support. No Windows, no macOS, no older kernels
- Limited to network-level signals. Cannot capture custom business metrics, application-specific span attributes, or baggage propagation. For those you still need the OTel SDK
- eBPF verifier constraints limit program complexity. Some edge cases in protocol parsing (non-standard HTTP framing, custom binary protocols) may not be detected
- No custom metric dimensions. The labels Beyla produces are fixed (service name, HTTP method, status code, URL path). You cannot add business context like customer_id or feature_flag
- Kernel upgrades can break eBPF programs. BTF relocations handle most cases, but major kernel version jumps require testing
When to use
- Bootstrapping observability for existing services that have zero instrumentation today
- Providing baseline RED metrics and trace spans before teams adopt the OTel SDK
- Polyglot environments where maintaining SDK instrumentation across 5+ languages is impractical
- Validating that services are correctly instrumented by comparing eBPF-captured metrics against SDK-reported metrics
When NOT to use
- Custom business metrics like orders_placed or payment_amount. Use the OTel SDK with custom metric instruments
- Windows or macOS hosts. eBPF is a Linux kernel feature with no equivalent on other operating systems
- Kernels older than 5.8. Without BTF support, Beyla cannot attach its eBPF programs
- Deep trace instrumentation with custom span attributes, baggage propagation, or manual context injection. Use OTel SDK
Key Points
- Beyla uses eBPF uprobes to hook into user-space functions (Go net/http, Java servlet, Python Flask) and kprobes to hook into kernel-space network calls. This dual approach captures HTTP, gRPC, and SQL traffic regardless of the application language or framework
- RED metrics are generated automatically: request rate (counter), error rate (counter by status code), and duration distribution (histogram with configurable buckets). These three signals cover 80% of what is needed to detect service degradation
- Trace context propagation works by reading incoming W3C traceparent headers from HTTP requests at the kernel level. Beyla creates child spans and propagates context to downstream calls, enabling distributed traces without SDK involvement
- The two-tier instrumentation model pairs Beyla with OTel SDK: Beyla provides baseline telemetry for every service automatically, while teams add OTel SDK instrumentation incrementally for custom business metrics, detailed span attributes, and profiling. This eliminates the all-or-nothing adoption problem
- Resource consumption is predictable: ~200 MB RSS per node regardless of how many services run on that node, because eBPF programs execute in kernel space with fixed memory maps. CPU overhead is typically under 1% of a single core
Common Mistakes
- ✗ Using Beyla as a complete replacement for the OTel SDK. Beyla provides RED metrics and basic spans. It cannot produce custom business metrics (revenue, conversions), detailed span attributes (user_id, cart_size), or profiling data. Plan for SDK adoption from the start and use Beyla as the baseline layer
- ✗ Ignoring kernel version requirements. Beyla needs Linux 5.8+ with BTF enabled. Running it on older kernels silently fails or produces partial data. Check kernel version and BTF support before deploying
- ✗ Not setting resource limits on the Beyla DaemonSet. While Beyla is lightweight (~200 MB RAM), without limits a misconfiguration or kernel bug could cause unbounded memory growth. Always set memory limits and requests in the pod spec
- ✗ Expecting Beyla to instrument custom binary protocols or non-standard HTTP framing. Beyla parses standard HTTP/1.1, HTTP/2, gRPC, and common SQL wire protocols. Proprietary protocols will not be detected
- ✗ Deploying Beyla without the OTel Collector pipeline. Beyla should export to the same OTel Collector fleet that handles SDK telemetry so that value-based routing, metadata enrichment, and deduplication apply uniformly to all signals