OTel Collector
The vendor-neutral telemetry pipeline for receiving, processing, and exporting observability data
Why It Exists
Without the Collector, every application needs to know where its telemetry goes. The Go service exports to VictoriaMetrics, the Java service exports to Tempo, the Python service exports to VictoriaLogs. Change a backend and every application gets redeployed. Add a new backend and every SDK config needs updating. With 500 services, that's 500 deploys for one infrastructure change.
The Collector breaks this coupling. Applications export to localhost:4317 and forget about it. The Collector handles format conversion, filtering, enrichment, batching, retry, and routing. The backend could be VictoriaMetrics today and Datadog tomorrow -- the application code never changes.
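On the application side, that amounts to a single standard environment variable. A minimal sketch (Kubernetes-style env block; the endpoint value is illustrative):

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT    # standard OTel SDK variable
    value: "http://localhost:4317"       # the local Collector; backend routing is the Collector's concern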
This is the N×M problem. Without a Collector, N SDK languages times M backends = N×M integration paths. With a Collector, the team maintains N SDK-to-OTLP paths and M Collector-to-backend paths = N+M. At scale, this is the difference between manageable and impossible.
The Collector started as the OpenCensus "agent" in 2018. When OpenCensus and OpenTracing merged into OpenTelemetry in 2019, the agent became the OTel Collector. It's a CNCF project written in Go, maintained by the same community that maintains the OTel SDKs. The platform/infrastructure team deploys and operates it -- application teams never touch it (see the OpenTelemetry page for the full ownership model).
Internal Architecture: How Data Flows at Runtime
The Collector is a pipeline engine. When it starts, it reads the YAML config, instantiates the configured components, and wires them together into pipelines. Each pipeline is a chain: receiver → processor₁ → processor₂ → ... → exporter.
The pdata internal representation. Telemetry arrives at a receiver as protobuf bytes (OTLP), Prometheus text, or JSON. The receiver deserializes it into pdata -- the Collector's internal data model. pdata is not protobuf. It's an optimized Go struct designed for zero-copy processing. Processors operate on pdata objects, and exporters serialize them back to the target format. This means format conversion (OTLP in, Prometheus remote write out) happens naturally: the receiver and exporter handle serialization, processors work on a common format.
Fan-in and fan-out. A single pipeline can have multiple receivers (e.g., both otlp and prometheus feed the metrics pipeline) and multiple exporters (e.g., metrics go to both VictoriaMetrics and a Kafka topic). The Collector copies data when fanning out to multiple exporters, so one slow exporter doesn't block another.
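A sketch of how that looks in the service.pipelines block (the receiver and exporter names mirror the examples in this paragraph; the processor list is abbreviated):

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]                 # fan-in: both feed one pipeline
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite, kafka]     # fan-out: each exporter gets its own copy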
Concurrency model. Each receiver runs in its own goroutine(s). When data arrives, the receiver calls the first processor's Consume method. Processors are chained: each processor calls the next. The last processor calls the exporter. For stateful processors (batch, tail_sampling), internal goroutines handle buffering and flushing. The batch processor, for example, accumulates data in a buffer and flushes either when send_batch_size is reached or when timeout expires, whichever comes first.
Incoming telemetry
│
▼
┌─────────────┐
│ Receiver │ Deserializes wire format → pdata
│ (otlp) │ Runs in its own goroutine
└──────┬──────┘
│ pdata.Metrics / pdata.Traces / pdata.Logs
▼
┌─────────────┐
│ Processor 1 │ memory_limiter: checks RSS, drops if over limit
└──────┬──────┘
│
▼
┌─────────────┐
│ Processor 2 │ filter: drops matching metrics/spans
└──────┬──────┘
│
▼
┌─────────────┐
│ Processor 3 │ batch: accumulates, flushes on size or timeout
└──────┬──────┘
│
▼
┌─────────────┐
│ Exporter │ Serializes pdata → wire format, sends to backend
│ (promrw) │ Sending queue + retry + backpressure
└─────────────┘
The Plugin Architecture
The Collector is not a monolithic binary. It's assembled from plugins at build time.
Two official distributions:
| Distribution | Components | Maintained By | Use Case |
|---|---|---|---|
| otelcol-core | ~20 (basic receivers, processors, exporters) | OTel core team | Testing, minimal deployments |
| otelcol-contrib | 100+ (vendor exporters, advanced processors) | Community contributors | Production (most teams use this) |
Each component (receiver, processor, exporter) implements a Go interface. For a processor, it looks like this:
// Simplified -- the real interfaces are in go.opentelemetry.io/collector/consumer
type TracesProcessor interface {
    ConsumeTraces(ctx context.Context, td ptrace.Traces) error
    Capabilities() consumer.Capabilities
    Start(ctx context.Context, host component.Host) error
    Shutdown(ctx context.Context) error
}
When the Collector starts, it reads the YAML config, looks up each component by name in its registry ("filter" → filterprocessor.NewFactory()), and calls CreateTracesProcessor() or CreateMetricsProcessor() with the config block for that component. The components are chained in the order listed under service.pipelines.
Building a custom distribution. For a component that's not in contrib (e.g., a proprietary exporter for an internal system), use the OpenTelemetry Collector Builder (ocb):
# builder-config.yaml
dist:
  name: my-otelcol
  output_path: ./build
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.96.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.96.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.96.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.96.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.96.0
  - gomod: github.com/your-org/internal-exporter v1.0.0   # custom component
ocb --config builder-config.yaml
# Produces a single Go binary with exactly the components you listed
This is how vendor distributions are built too. Datadog's OTel Collector distribution and AWS's ADOT Collector are custom builds with vendor-specific components included, and Grafana Alloy embeds the same Collector components inside its own binary.
Deep Dive: Key Processors
memory_limiter
The safety net. Must be the first processor in every pipeline.
processors:
  memory_limiter:
    check_interval: 1s       # How often to check RSS
    limit_mib: 1500          # Hard limit (total Collector memory)
    spike_limit_mib: 500     # Soft limit = limit_mib - spike_limit_mib = 1000 MiB
How it works internally: Every check_interval, the processor reads the process RSS from /proc/self/status. If RSS exceeds limit_mib - spike_limit_mib (the soft limit, 1000 MiB in this example), it triggers a Go runtime.GC() to reclaim memory. If RSS still exceeds limit_mib (the hard limit, 1500 MiB), the processor starts refusing data -- it returns an error to the receiver, which applies backpressure to the SDK. The SDK's BatchSpanProcessor drops spans when its own queue fills. This is controlled data loss: better to lose some telemetry than to OOM the Collector and lose everything.
batch
Groups individual data points into batches for efficient export. Without batching, the Collector makes one HTTP request per span or metric -- at 100K metrics/sec, that's 100K HTTP calls/sec to the backend.
processors:
  batch:
    send_batch_size: 8192      # Flush when batch reaches this many items
    timeout: 200ms             # Flush after this duration, even if batch isn't full
    send_batch_max_size: 0     # 0 = no upper limit (send_batch_size is the target, not a cap)
The tradeoff: Larger batches = fewer HTTP calls = lower overhead. But larger batches = higher latency (data sits in the buffer longer). A timeout of 200ms means worst-case 200ms added to ingestion-to-visibility latency. For most observability use cases, this is acceptable. For alerting pipelines that need sub-second latency, reduce timeout to 50-100ms and accept the higher export overhead.
filter
Drops telemetry that matches specified patterns. This is where value-based routing starts -- dropping noise before it reaches the backend.
processors:
  filter:
    error_mode: ignore       # Don't fail the pipeline on regex errors
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - "kube_pod_status_ready"   # Redundant with kube-state-metrics
          - ".*health.*"              # Health check endpoints
          - ".*readiness.*"           # Readiness probe endpoints
    traces:
      span:
        exclude:
          match_type: regexp
          attributes:
            - key: http.target
              value: "/(healthz|readyz|livez)"
This processor evaluates every incoming metric name and span attribute against the configured patterns. Matching items are silently dropped. At a 20% noise rate (health checks, readiness probes, debug endpoints), the filter processor reduces backend storage costs by 20% with zero impact on useful data.
attributes
Adds, modifies, or deletes attributes on spans and metrics. The primary use case is enriching telemetry with context that the SDK doesn't have -- Kubernetes metadata, tenant IDs, deployment versions.
processors:
  attributes:
    actions:
      - key: k8s.namespace.name
        action: upsert
        from_context: k8s.namespace.name
      - key: deployment.environment
        action: insert
        value: production
      - key: internal.debug_tag       # Remove internal attributes before export
        action: delete
The k8sattributes processor is a specialized version that queries the Kubernetes API to resolve pod IP → pod name, namespace, node, labels, and annotations. It caches the mappings locally to avoid hitting the API on every request.
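A minimal sketch of the k8sattributes processor, assuming the Collector's service account is allowed to read pods; the extracted metadata list is illustrative:

processors:
  k8sattributes:
    auth_type: serviceAccount     # how the processor authenticates to the Kubernetes API
    extract:
      metadata:                   # which pod fields to attach as resource attributes
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.node.name
    pod_association:              # how incoming telemetry is matched to a pod
      - sources:
          - from: connection      # fall back to the source IP of the OTLP connection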
tail_sampling
The most complex processor. Buffers all spans for a configurable duration (typically 60 seconds), then makes a keep/drop decision based on policies.
processors:
  tail_sampling:
    decision_wait: 60s                   # Buffer spans for 60 seconds
    num_traces: 100000                   # Max traces in buffer
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors-always
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 500 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 0.1 }
Why it needs Gateway mode: A single trace might have spans from 8 different services on 8 different nodes. DaemonSet collectors receive spans from local pods only -- no single DaemonSet collector has all spans for one trace. The tail sampler needs the complete trace to decide (it can't know if there's an error span it hasn't seen yet). Solution: use the loadbalancing exporter on DaemonSet collectors to route all spans with the same trace_id to the same Gateway collector instance. The Gateway fleet runs the tail_sampling processor.
DaemonSet Collectors (3,000 nodes)
│ loadbalancing exporter (route by trace_id hash)
▼
Gateway Collectors (50 instances, tail_sampling processor)
│ keep/drop decision after 60s buffer
▼
Kafka → Tempo
transform (OTTL)
The OpenTelemetry Transformation Language allows complex transformations that go beyond what attributes or filter can do.
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          # Rename a metric
          - set(metric.name, "http.server.duration") where metric.name == "http_server_request_duration_seconds"
          # Drop a specific label to reduce cardinality
          - delete_key(attributes, "instance_id")
    log_statements:
      - context: log
        statements:
          # Extract structured fields from a log body
          - set(attributes["user_id"], ParseJSON(body)["user_id"]) where IsMatch(body, ".*user_id.*")
OTTL is powerful but adds CPU overhead per statement. Use it for transformations that filter and attributes can't handle, not as a replacement for them.
routing
Routes telemetry to different exporters based on attributes. This enables multi-tier storage: high-value data goes to fast storage, low-value data goes to cheap storage or gets dropped.
processors:
  routing:
    from_attribute: slo_tier
    table:
      - value: gold
        exporters: [otlp/tempo-hot]        # Fast NVMe-backed Tempo
      - value: silver
        exporters: [otlp/tempo-standard]   # Standard S3-backed Tempo
      - value: bronze
        exporters: [otlp/tempo-cold]       # Heavily sampled, cheap storage
    default_exporters: [otlp/tempo-standard]
Deep Dive: Key Receivers
otlp -- The default receiver. Accepts OTLP over gRPC (:4317) and HTTP (:4318). Every OTel SDK and Beyla instance exports here. Supports TLS, mTLS, and compression (gzip, zstd).
prometheus -- Scrapes Prometheus /metrics endpoints. Supports Kubernetes service discovery (kubernetes_sd_configs) so it automatically discovers pods with the prometheus.io/scrape: "true" annotation. This is how the Collector replaces a standalone Prometheus server for metric scraping (a config sketch follows below).
filelog -- Tails log files from disk. Configured with include paths like /var/log/pods/*/*/*.log. Parses container runtime log formats (CRI, Docker JSON). Adds filename, namespace, and pod name as attributes. This is the standard way to collect Kubernetes pod logs without requiring applications to push logs via OTLP.
kafka -- Consumes from Kafka topics. Used when Kafka sits between DaemonSet collectors and backend-facing collectors. The DaemonSet collectors produce to Kafka (fast, fire-and-forget), and a separate Collector fleet consumes from Kafka and writes to backends (with retry and backpressure handling).
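A hedged sketch of the prometheus receiver mentioned above -- it embeds a standard Prometheus scrape config under config:, and the annotation-based keep rule shown here is one common convention rather than a requirement:

receivers:
  prometheus:
    config:                    # standard Prometheus scrape config, embedded as-is
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # keep only pods annotated prometheus.io/scrape: "true"
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep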
Deep Dive: Key Exporters
otlp -- Sends OTLP to another Collector instance or an OTLP-compatible backend (Tempo, Pyroscope, Grafana Cloud). Supports gRPC and HTTP, with configurable compression and TLS.
prometheusremotewrite -- Writes metrics to any Prometheus remote write compatible endpoint: VictoriaMetrics (vminsert), Grafana Mimir, Thanos, or Cortex. Converts pdata.Metrics to the Prometheus remote write protobuf format.
kafka -- Produces to Kafka topics. Used as a durable buffer between collection and storage. The DaemonSet Collector exports to Kafka, which decouples collection speed from storage write speed. If the backend is temporarily slow, Kafka absorbs the burst.
loadbalancing -- Distributes spans across a pool of downstream Collector instances based on a routing key (typically trace_id). This ensures all spans from one trace land on the same instance, which is required for tail-based sampling. The exporter resolves backend instances via DNS or a static list and uses consistent hashing for stable routing.
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-sampling-gateway   # headless K8s Service
        port: 4317
Backpressure and Flow Control
What happens when the backend can't keep up:
SDK → Receiver → Processors → Exporter → [sending_queue] → Backend (slow)
                                                │
                                          Queue fills up
                                                │
                               ┌────────────────┴────────────────┐
                               │                                 │
                       Retry with backoff              Queue full: drop data
                      (5s, 10s, 30s, 60s)      otelcol_exporter_send_failed_spans++
                               │
                Backend recovers → queue drains
The sending queue sits inside each exporter. By default it's in-memory with a capacity of 256 batches. When the backend returns errors or times out, batches accumulate in the queue. The exporter retries with exponential backoff (configurable: initial_interval, max_interval, max_elapsed_time).
exporters:
  otlp/tempo:
    endpoint: tempo-distributor:4317
    sending_queue:
      enabled: true
      num_consumers: 10        # Parallel export goroutines
      queue_size: 1000         # Max batches in queue
      storage: file_storage    # Persistent queue (survives crashes)
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 60s
      max_elapsed_time: 300s   # Give up after 5 minutes
Persistent queue: By default, the sending queue is in-memory. If the Collector crashes, queued data is lost. Enabling storage: file_storage writes queued batches to a local WAL (write-ahead log) on disk. On restart, the Collector replays the WAL. The trade-off is disk I/O overhead.
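The storage: file_storage reference above points at an extension (shipped in contrib) that must be declared and enabled; a sketch, with the directory path as an assumption:

extensions:
  file_storage:
    directory: /var/lib/otelcol/queue   # node-local path for the WAL (illustrative)
service:
  extensions: [file_storage]            # the exporter's sending_queue.storage refers to this by name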
When the sending queue is completely full and the backend is still unresponsive, the exporter blocks. Backpressure propagates backward through the processor chain to the receiver. The receiver applies TCP backpressure to the SDK. The SDK's BatchSpanProcessor starts dropping spans once its own in-memory buffer fills. This is the correct behavior: controlled data loss at the source is better than OOM anywhere in the pipeline.
End-to-End Example: A Metric Through the Pipeline
A Go service records a custom metric using the OTel SDK:
orderValue.Record(ctx, 149.99, metric.WithAttributes(
    attribute.String("currency", "USD"),
    attribute.String("region", "us-east-1"),
))
Here's what happens at each stage of the Collector pipeline:
1. SDK exports via OTLP. The SDK's PeriodicReader flushes accumulated metrics every 60 seconds. It serializes order_value_dollars with its histogram buckets, sum, count, and attributes into an OTLP protobuf message and sends it to localhost:4317 via gRPC.
2. OTLP receiver deserializes. The Collector's otlp receiver accepts the gRPC call, deserializes the protobuf into a pdata.Metrics object -- the internal representation. The metric is now a Go struct in memory, not bytes.
3. memory_limiter checks. The processor reads /proc/self/status. RSS is 800 MiB, well under the 1500 MiB limit. Data passes through unchanged.
4. filter evaluates. The processor checks the metric name order_value_dollars against the exclude patterns (.*health.*, .*readiness.*). No match. Data passes through.
5. attributes enriches. The processor adds k8s.namespace.name=checkout and deployment.environment=production to the metric's attribute set. The metric now carries Kubernetes context that the SDK didn't have.
6. batch accumulates. The processor adds this metric to its in-memory buffer. The buffer now has 4,231 metrics. It hasn't reached send_batch_size (8,192) yet, and timeout (200ms) hasn't expired. The metric waits.
... 150ms later, more metrics arrive, pushing the buffer to 8,192.
7. batch flushes. The buffer hits send_batch_size. The batch processor sends all 8,192 metrics to the exporter in one call.
8. prometheusremotewrite serializes. The exporter converts the 8,192 pdata.Metrics into a single Prometheus remote write protobuf request. It compresses with Snappy and sends an HTTP POST to http://vminsert:8480/insert/0/prometheus/api/v1/write.
9. vminsert acknowledges. VictoriaMetrics returns 200 OK. The metric is now durably stored and queryable within 2 seconds.
Total time from SDK Record() to Grafana-visible: SDK flush interval (60s) + batch timeout (up to 200ms) + network + storage write = ~60-62 seconds. The SDK flush interval dominates. For lower latency, reduce the SDK's PeriodicReader interval (at the cost of more frequent exports).
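Putting the walkthrough together, a sketch of the metrics pipeline it assumes (settings abbreviated to the ones discussed above; the vminsert endpoint mirrors step 8):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 500
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names: [".*health.*", ".*readiness.*"]
  attributes:
    actions:
      - key: deployment.environment
        action: insert
        value: production
  batch:
    send_batch_size: 8192
    timeout: 200ms
exporters:
  prometheusremotewrite:
    endpoint: http://vminsert:8480/insert/0/prometheus/api/v1/write
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter, attributes, batch]
      exporters: [prometheusremotewrite]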
Monitoring the Collector Itself
The Collector exposes Prometheus metrics on :8888/metrics. These are the critical ones to alert on:
otelcol_receiver_accepted_spans # Spans successfully received
otelcol_receiver_refused_spans # Spans rejected (backpressure)
otelcol_processor_dropped_spans # Spans dropped by processors (filter, memory_limiter)
otelcol_exporter_sent_spans # Spans successfully exported
otelcol_exporter_send_failed_spans # Spans that failed to export (backend errors)
otelcol_exporter_queue_size # Current items in the sending queue
otelcol_exporter_queue_capacity # Max queue size
otelcol_process_memory_rss # Collector process RSS in bytes
Alert rules:
# Collector is dropping data
- alert: OTelCollectorDataLoss
  expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
  for: 2m
  labels:
    severity: warning
# Memory pressure -- memory_limiter is kicking in
- alert: OTelCollectorMemoryPressure
  expr: rate(otelcol_processor_dropped_spans[5m]) > 0
  for: 1m
  labels:
    severity: critical
# Sending queue filling up -- backend can't keep up
- alert: OTelCollectorQueueSaturation
  expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
  for: 5m
  labels:
    severity: warning
These alerts belong in the meta-monitoring stack (the separate watchdog Prometheus) described in the observability platform blog post, not in the Collector's own pipeline. Monitoring the monitoring system through itself is a recipe for blind spots.
Configuration Management at Scale
With 3,000 DaemonSet collectors and 50 Gateway collectors, managing configs manually is impossible.
GitOps pattern: The Collector config lives in a Git repository. A CI/CD pipeline validates the YAML (the Collector has a validate subcommand), generates a Kubernetes ConfigMap, and applies it. The Collector's --config flag points to the ConfigMap mount. To update: push to Git → CI validates → merge → ArgoCD/Flux applies the new ConfigMap → Collector pods restart with the new config.
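A sketch of that validation step (GitHub-Actions-style syntax as an illustration; the binary name and repo path are assumptions that depend on your distribution and layout):

# CI step (illustrative)
- name: Validate Collector config
  run: otelcol-contrib validate --config=collector/config.yaml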
OpAMP (Open Agent Management Protocol): An emerging standard for remote configuration of OTel Collectors. An OpAMP server pushes config updates to connected Collectors without restarting them. This enables feature gates (gradually roll out a new processor to 10% of collectors, then 50%, then 100%) and emergency config changes (add a filter to drop a cardinality bomb in real-time). OpAMP is still maturing but is the future direction for fleet management.
Feature gates: The Collector supports feature gates that enable/disable specific behaviors without config changes. Gates are set via the --feature-gates CLI flag. This is how the OTel project rolls out breaking changes: new behavior behind a gate in version N, gate enabled by default in version N+1, gate removed in version N+2.
otelcol --config config.yaml --feature-gates=+exporter.otlp.useConfiguredEndpoint,-processor.batch.enabled
Pros
- • Vendor-neutral pipeline. The same Collector instance exports to VictoriaMetrics, Tempo, Datadog, or any OTLP-compatible backend. Switch backends without touching application code
- • Plugin architecture with 100+ pre-built components. Receivers, processors, and exporters are assembled into a single binary at build time and configured entirely through YAML
- • Backpressure-aware. When a backend is slow, the sending queue buffers data, retries with exponential backoff, and only drops data as a last resort. The memory_limiter processor prevents OOM
- • Runs anywhere. DaemonSet on Kubernetes, sidecar, gateway, bare-metal binary, Docker container. Same binary, same config format
- • Self-monitoring built in. The Collector exposes its own metrics (accepted/dropped/exported counts, queue depth, memory usage) on a Prometheus endpoint for meta-monitoring
Cons
- • Configuration complexity. 100+ components with different config schemas. Getting the processor chain right (order matters) requires operational experience
- • Single-binary plugin model means you can't add a custom component at runtime. Adding a new receiver or exporter requires rebuilding the binary with the Collector Builder (ocb)
- • Memory overhead for stateful processors. Tail-based sampling buffers all spans for 60 seconds. At high throughput, this consumes several GB of RAM per instance
- • No built-in persistent buffering in the default setup. The in-memory sending queue loses data on crash. Persistent queue (disk WAL) is available but adds I/O overhead
- • Debug difficulty. When data disappears between SDK and backend, tracing which processor dropped it requires checking multiple internal metrics
When to use
- • Any production observability pipeline. The Collector should sit between your applications and your storage backends
- • When you need to process telemetry before storage: filter noise, enrich with metadata, sample traces, route by tenant
- • Multi-backend setups where metrics go to VictoriaMetrics, traces to Tempo, and logs to VictoriaLogs from the same collection layer
- • When you want to change backends without redeploying applications
When NOT to use
- • Simple single-app setups where the SDK can export directly to the backend with no processing needed
- • When latency of even a single network hop is unacceptable (though DaemonSet mode uses localhost, so overhead is minimal)
- • As a long-term storage buffer. The Collector is a processing pipeline, not a message queue. Use Kafka for durable buffering
Key Points
- • The Collector is a pipeline engine. Data flows through receiver → processor chain → exporter. Each signal (metrics, traces, logs) has its own pipeline defined in the service.pipelines YAML block. Pipelines are independent -- a trace pipeline failure doesn't affect metrics
- • Processors execute in order. The sequence matters: memory_limiter first (prevents OOM), then filter (drop noise), then enrich (add metadata), then batch (group for efficient export). Reversing filter and batch wastes CPU batching data destined to be dropped
- • Two distributions exist: otelcol-core (minimal, ~20 components maintained by the OTel project) and otelcol-contrib (community, 100+ components including vendor-specific exporters). Most production deployments use contrib or a custom build via the Collector Builder
- • Backpressure propagates backward through the pipeline. When an exporter's sending queue is full, it blocks the processor chain, which blocks the receiver, which applies TCP backpressure to the SDK. The SDK's BatchSpanProcessor drops spans when its queue fills. This is intentional -- controlled data loss is better than OOM
- • The Collector exposes self-monitoring metrics on :8888/metrics. Key metrics: otelcol_receiver_accepted_spans, otelcol_processor_dropped_spans, otelcol_exporter_send_failed_spans, otelcol_exporter_queue_size. Alert on these to catch pipeline issues before data loss becomes visible in dashboards
Common Mistakes
- ✗ Not putting memory_limiter as the first processor. Without it, a traffic spike fills the processor buffers until the Collector OOMs. The memory_limiter checks RSS every check_interval and triggers GC or drops data to stay under the limit
- ✗ Wrong processor order. Putting batch before filter means you batch everything, including data destined to be dropped. Correct: memory_limiter → filter → enrich → batch → export
- ✗ Using the core distribution in production. otelcol-core lacks most useful processors (filter, attributes, transform, tail_sampling) and exporters (prometheusremotewrite, kafka). Use otelcol-contrib or build a custom distribution
- ✗ No sending queue configuration on exporters. The default in-memory queue is small (256 items). For production, increase queue_size and enable the persistent queue (storage: file_storage) for crash resilience
- ✗ Running tail-based sampling on DaemonSet collectors. Tail sampling needs all spans from one trace on one instance, but a DaemonSet collector only sees spans from pods on its own node, so no single instance holds a complete trace. Use a dedicated Gateway fleet with the loadbalancing exporter routing by trace_id
- ✗ Not monitoring the Collector itself. If otelcol_exporter_send_failed_spans is climbing, data is being lost. If otelcol_processor_dropped_spans spikes, the memory_limiter is kicking in. These must be in the meta-monitoring dashboard