Prometheus
The pull-based metrics system that actually works for containers and Kubernetes
Why It Exists
Anyone who has run Nagios or Zabbix knows those tools were built for a world where servers had fixed hostnames and IP addresses that stuck around. That world is gone. Containers spin up and die in seconds, services auto-scale based on load, and yesterday's IP means nothing today.
Prometheus was built at SoundCloud in 2012 specifically for this reality. The team drew heavy inspiration from Google's internal Borgmon system. It graduated from the CNCF in 2018, only the second project to do so after Kubernetes, and at this point it is the default monitoring choice for cloud-native infrastructure. Its data model, PromQL, and exposition format have shaped every monitoring tool that came after it.
How It Works
Data Model: Every metric is a name plus a set of key-value labels. http_requests_total{method="GET", handler="/api/users", status="200"} is a time series, which is just a sequence of (timestamp, value) pairs. Labels enable slicing data across multiple dimensions: by method, by handler, by status, or any combination. This is far more flexible than the old hierarchical naming style (like servers.web01.http.get.200.count), and once teams adopt it, the old style feels unworkable.
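To make the model concrete, here is a minimal sketch using the official Go client, client_golang (the metric and label names mirror the example above; promauto registers the metric with the default registry):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Each distinct combination of label values becomes its own time series,
// e.g. http_requests_total{method="GET", handler="/api/users", status="200"}.
var httpRequestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests served.",
	},
	[]string{"method", "handler", "status"},
)

func main() {
	// Two increments, two different label sets: two separate series.
	httpRequestsTotal.WithLabelValues("GET", "/api/users", "200").Inc()
	httpRequestsTotal.WithLabelValues("POST", "/api/users", "500").Inc()
}
```

At query time, PromQL can sum or slice these series along any subset of those labels.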
Four Metric Types: Counter is a monotonically increasing value, good for total requests or total errors. Gauge goes up and down freely, useful for things like temperature or concurrent connections. Histogram buckets observations into configurable ranges and tracks sum and count, which is the right choice for request duration. Summary is similar but calculates quantiles on the client side. In practice, histograms are almost always the better choice: client-side summary quantiles cannot be aggregated across instances, while histogram buckets can be summed across services and fed to histogram_quantile() at query time.
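A sketch of the gauge and histogram types with the same Go client; the metric names and bucket choice here are illustrative, not prescriptive:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Gauge: a value that moves in both directions, read at scrape time.
	activeConnections = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "app_active_connections",
		Help: "Current number of open connections.",
	})

	// Histogram: each observation is counted into cumulative buckets, and
	// _sum and _count come along for free, so histogram_quantile() works
	// at query time.
	requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency in seconds.",
		Buckets: prometheus.DefBuckets, // defaults span 5ms to 10s
	})
)

func main() {
	activeConnections.Inc()
	defer activeConnections.Dec()

	// Time a unit of work and observe the duration into the histogram.
	timer := prometheus.NewTimer(requestDuration)
	defer timer.ObserveDuration()
}
```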
Scraping: Prometheus pulls metrics by hitting each target's /metrics endpoint via HTTP GET at a configured interval (15 seconds is the common convention; the shipped default is 1 minute). Targets expose metrics in the Prometheus exposition format or OpenMetrics format. Service discovery finds targets automatically from Kubernetes, Consul, DNS, EC2, Azure, or plain file-based config.
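Serving that endpoint takes only a few lines with the Go client's promhttp package; the port is an arbitrary choice for this sketch:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() renders the default registry in the Prometheus
	// exposition format (content negotiation handles OpenMetrics clients).
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

From there, a static scrape config or a service-discovery rule pointing at port 8080 is all Prometheus needs to start collecting.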
Architecture Deep Dive
TSDB (Time Series Database): Prometheus ships with a custom TSDB built for write-heavy, append-mostly workloads. Incoming data lands in an in-memory "head block" backed by a write-ahead log (WAL) for crash safety. Every 2 hours, the head block gets compacted into an immutable on-disk block containing compressed chunks, an inverted index mapping label sets to series, and metadata. Older blocks get merged into larger ones over time. The compression is genuinely impressive, typically 1-2 bytes per sample.
Query Engine: PromQL evaluation is lazy and iterator-based. The engine builds an execution tree from the query AST where each node is an iterator. Range vector selectors pull data from the TSDB, functions like rate() and increase() transform it, and aggregation operators like sum() and avg() combine series across label dimensions. This design handles queries touching millions of samples without choking.
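The same engine is exposed over Prometheus's HTTP API. A minimal sketch with the Go client's api/prometheus/v1 package, assuming a server at localhost:9090 and the counter from earlier:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// rate() turns the raw counter into per-second throughput; sum by
	// (handler) aggregates every other label dimension away.
	result, warnings, err := promAPI.Query(ctx,
		`sum by (handler) (rate(http_requests_total[5m]))`, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```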
Alertmanager: Alert rules are defined as PromQL expressions that get evaluated on a regular interval. When an expression returns a non-empty result, the alert fires. Alertmanager then takes over and handles routing (critical alerts go to PagerDuty, warnings go to Slack), grouping (batch related alerts into one notification), inhibition (suppress lower-severity alerts when a higher-severity one is active), and silencing (manually mute alerts during maintenance windows). In HA setups with multiple Prometheus instances, Alertmanager deduplicates so nobody gets paged twice for the same problem.
Scaling with Thanos: A single Prometheus instance handles 5-10 million active time series, which is a lot. But to go beyond that, or to get visibility across multiple clusters, Thanos is the usual answer. Thanos Sidecar uploads Prometheus blocks to object storage like S3 or GCS, providing practically unlimited retention at low cost. Thanos Query provides a single PromQL endpoint that fans out to all Prometheus instances and deduplicates results. Thanos Compactor downsamples old data (to 5-minute and then 1-hour resolution as blocks age) so queries over long time ranges stay fast.
GitLab runs its entire SaaS platform on Prometheus, scraping over 25 million time series across thousands of services, with Thanos layered on top for long-term storage and cross-cluster querying. That is real-world proof the stack works at serious scale.
Instrumentation Best Practices
Use the official client libraries (Go, Java, Python, Ruby, .NET) to instrument application code. At minimum, expose RED metrics for every service: http_requests_total (counter) for Rate, http_request_duration_seconds (histogram) for Duration, and http_requests_errors_total (counter) for Errors. Then add USE metrics for resources: Utilization, Saturation, and Errors for CPU, memory, disk, and network. Keep label names consistent across services: if one service calls it http_method and another calls it method, cross-service queries become painful.
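As one way to wire this up, here is a sketch of an HTTP middleware using client_golang; the metric names follow the RED convention above, and the handler name and port are placeholders:

```go
package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests (Rate).",
	}, []string{"method", "handler", "status"})

	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency (Duration).",
		Buckets: prometheus.DefBuckets,
	}, []string{"method", "handler"})
)

// statusRecorder captures the response code for the "status" label.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps any handler so every request is counted and timed
// with the same label names, keeping queries consistent across services.
func instrument(name string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)

		requestsTotal.WithLabelValues(r.Method, name, strconv.Itoa(rec.status)).Inc()
		requestDuration.WithLabelValues(r.Method, name).Observe(time.Since(start).Seconds())
	})
}

func main() {
	http.Handle("/api/users", instrument("/api/users", http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.Write([]byte("ok")) })))
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

This sketch derives errors at query time from the status label (e.g. status=~"5..") rather than keeping a separate errors counter; either approach works as long as it is consistent across services.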
Pros
- • Pull model makes service discovery and health detection straightforward
- • PromQL is hands-down the best metrics query language available
- • Kubernetes integration with automatic service discovery works out of the box
- • Huge exporter ecosystem with 500+ integrations
- • CNCF graduated project, widely adopted, battle-tested
Cons
- • Single-node by default. You need Thanos or Cortex to scale horizontally
- • Local storage is not durable. A disk failure wipes your metrics
- • High cardinality labels will blow up memory and kill query performance
- • Pull model means Prometheus needs network access to every target
- • No built-in dashboards. You will need Grafana
When to use
- • Cloud-native environments, especially anything running on Kubernetes
- • You want a proven, standards-based monitoring stack
- • Your team practices SRE and needs SLI/SLO tracking
- • You need multi-dimensional metrics with label-based querying
When NOT to use
- • Log aggregation or distributed tracing (reach for Loki or Jaeger instead)
- • Billing or accounting metrics where 100% accuracy matters (Prometheus can drop data)
- • Environments where Prometheus cannot reach targets (use a push-based alternative)
- • Very long retention (years) without adding Thanos or Cortex for remote storage
Key Points
- • The pull model means Prometheus scrapes targets at configured intervals. If a target is down, Prometheus knows immediately because the scrape fails. Push-based systems cannot tell the difference between 'no data' and 'target is healthy but idle.' That distinction matters more than people realize.
- • PromQL supports rate(), histogram_quantile(), and aggregation across label dimensions. For example, rate(http_requests_total{status='500'}[5m]) / rate(http_requests_total[5m]) produces the error rate over 5 minutes. This is the bread and butter of SLI calculation.
- • The local TSDB uses a compressed, append-only block format that achieves 1-2 bytes per sample. A Prometheus instance watching 1 million time series at a 15-second scrape interval uses roughly 6GB of RAM and 30GB of disk per day.
- • Thanos provides a global query view across clusters, unlimited retention through object storage (S3/GCS), and downsampling. It runs a sidecar on each Prometheus that uploads blocks, plus a query component that deduplicates and merges results across instances.
- • Recording rules pre-compute expensive expressions and store results as new time series. This matters for dashboards. A dashboard querying rate() over 30 days of raw data can take minutes. A recording rule that reduces to 5-minute aggregates makes it instant.
Common Mistakes
- ✗ Unbounded label cardinality. Adding user_id, request_id, or IP as labels creates millions of time series and causes OOM. Labels need bounded, low-cardinality values: method, status_code, and endpoint are fine; user_id is not (see the sketch after this list).
- ✗ Skipping recording rules for dashboard queries. Every Grafana panel re-evaluates its PromQL on every refresh. Complex queries over large time ranges will bring Prometheus to its knees. Pre-compute with recording rules.
- ✗ Scraping too often. 15-second intervals are standard for a reason. Running 1-second scrapes on 10,000 targets that each expose around 60 series generates 600K samples per second, and that will overwhelm the TSDB. Match the scrape interval to the granularity actually needed.
- ✗ Inconsistent metric naming. Metrics should follow the pattern namespace_subsystem_name_unit (e.g., http_server_request_duration_seconds). Sloppy naming makes queries and alerts a nightmare to maintain.
- ✗ Running Prometheus without Alertmanager. Prometheus can evaluate alert rules, but it cannot route, deduplicate, silence, or group notifications. Every production deployment needs Alertmanager. No exceptions.
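To illustrate the cardinality mistake, here is a sketch of unbounded versus bounded labels with client_golang; the metric names are hypothetical:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Anti-pattern: user_id is unbounded, so every distinct user mints a new
// time series and memory grows without limit.
var requestsByUser = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "api_requests_by_user_total",
	Help: "Do not do this: one series per user.",
}, []string{"user_id"})

// Bounded alternative: every label value comes from a small fixed set, so
// the series count stays flat regardless of traffic.
var requests = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "api_requests_total",
	Help: "Requests by method and status.",
}, []string{"method", "status"})

func main() {
	requestsByUser.WithLabelValues("user-8675309").Inc() // new series per user
	requests.WithLabelValues("GET", "200").Inc()         // one of a few series
}
```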