PostgreSQL
The relational database that keeps earning its spot in production
Use Cases
PostgreSQL is the database most teams should start with, and many never need to leave. Apple, Instagram, Spotify, and Reddit all run it in production, and that is not a coincidence. After 35+ years of development, Postgres handles everything from basic CRUD apps to analytical workloads on petabytes of data. It is not the fastest option for every access pattern, but the combination of ACID compliance, extensibility, and SQL standards support makes it the safest default for a primary datastore.
How It Works Internally
PostgreSQL's concurrency model, MVCC (Multi-Version Concurrency Control), stores multiple physical versions of each row right in the heap. Every row carries xmin (the transaction ID that created it) and xmax (the transaction ID that deleted or updated it) as system columns. When a row is updated, Postgres inserts a new version and stamps the old one with xmax, building a version chain. Readers only see versions visible to their snapshot, which is determined by which transactions had committed when the snapshot was taken, checked against the commit log (CLOG). The upside: reads never block writes. The downside: dead tuples accumulate, and VACUUM has to clean them up.
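You can watch this happen by selecting the system columns directly. A minimal sketch, using a hypothetical accounts table:

```sql
-- Hypothetical table to illustrate row versioning.
CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);
INSERT INTO accounts VALUES (1, 100);

-- xmin is the XID that created this version; xmax = 0 means no
-- transaction has deleted or superseded it yet.
SELECT xmin, xmax, id, balance FROM accounts;

UPDATE accounts SET balance = 90 WHERE id = 1;

-- The visible row now has a new xmin (the updating transaction's XID).
-- The old version still sits in the heap as a dead tuple until VACUUM
-- reclaims it; it just isn't visible to our snapshot.
SELECT xmin, xmax, id, balance FROM accounts;
```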
This is the defining detail of Postgres internals. Unlike Oracle or MySQL/InnoDB, which use a separate undo log, Postgres leaves old row versions in place. That design choice shapes everything from the autovacuum configuration to how long-running transactions are handled.
The Write-Ahead Log (WAL) is how Postgres delivers durability and replication. Every data modification goes to sequential WAL segments (16MB each by default) before the transaction commits. WAL records are binary representations of page-level changes. During crash recovery, Postgres replays WAL from the last checkpoint forward. Streaming replication ships these WAL records to replicas in near-real-time. With synchronous replication enabled, the primary waits for at least one replica to confirm before acknowledging the commit. The result is zero committed-data loss on primary failure, at the cost of higher commit latency.
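A minimal primary-side sketch of that synchronous setup; the standby name replica1 is an assumption, not a convention:

```ini
# postgresql.conf (primary) -- sketch; standby name is illustrative
wal_level = replica                                # emit enough WAL for streaming replicas
synchronous_standby_names = 'FIRST 1 (replica1)'   # wait for one named standby
synchronous_commit = on                            # commit returns only after the
                                                   # standby confirms the WAL flush
```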
Postgres supports multiple index types, and picking the right one matters more than most engineers realize. B-tree indexes handle equality and range queries on scalar types. GIN (Generalized Inverted Index) indexes are the right choice for full-text search, JSONB containment queries, and array operations. GiST (Generalized Search Tree) indexes cover geometric data, range types, and nearest-neighbor queries. BRIN (Block Range Index) indexes store min/max summaries per block range, using orders of magnitude less space for naturally ordered data like timestamps. A 1TB time-series table can have a BRIN index under 1MB versus a 20GB B-tree. That is not a typo.
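As a sketch of what each looks like in practice (table and column names are hypothetical; the GiST example assumes a point-typed column):

```sql
-- B-tree (the default): equality and range queries on scalars
CREATE INDEX idx_orders_created ON orders (created_at);

-- GIN: JSONB containment (@>); jsonb_path_ops trades flexibility for size
CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);

-- GiST: geometric data and nearest-neighbor (ORDER BY location <-> point)
CREATE INDEX idx_stores_location ON stores USING gist (location);

-- BRIN: tiny min/max summaries per block range, for naturally ordered data
CREATE INDEX idx_metrics_ts ON metrics USING brin (recorded_at);
```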
Production Architecture
A production Postgres deployment revolves around a primary instance with streaming replicas. Put PgBouncer in front of the primary in transaction pooling mode. This collapses thousands of application connections down to hundreds of actual Postgres backends. For financial or compliance workloads, use synchronous replication to at least one replica. For read scaling and disaster recovery, async replicas do the job.
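A sketch of the corresponding PgBouncer configuration; the host and pool sizes are illustrative, not recommendations:

```ini
; pgbouncer.ini -- transaction pooling sketch
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction      ; server connection is released at transaction end
max_client_conn = 5000       ; application-side connections PgBouncer accepts
default_pool_size = 100      ; actual Postgres backends per database/user pair
```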
Configure replication slots so the primary does not discard WAL segments that replicas have not consumed yet. Keep an eye on pg_stat_replication for lag. If lag exceeds a few seconds, that points to network issues or an overloaded replica. Use pg_basebackup for initial replica provisioning and point-in-time recovery (PITR) backups. For automated failover, Patroni (used by GitLab and Zalando) works well with etcd or ZooKeeper for consensus.
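Checking lag from the primary is one query against pg_stat_replication:

```sql
-- Run on the primary: per-standby replication lag in bytes.
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```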
One thing I will be direct about: Postgres replication is not as turnkey as managed databases make it look. Expect to spend time configuring slots, monitoring lag, and planning failover. But once the pieces are understood, the operational model is straightforward and well-documented.
Capacity Planning
A single Postgres instance on a db.r6g.4xlarge (128GB RAM, 16 vCPUs, io2 storage) can push 10,000-50,000 transactions per second depending on query complexity. Set shared_buffers to 25% of RAM (32GB), effective_cache_size to 75% (96GB), and work_mem to 128-256MB for analytical queries. Watch max_connections carefully. Each backend costs 5-10MB, so 500 connections burn 2.5-5GB just on process overhead. This is why PgBouncer is not optional.
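Expressed as configuration, with the values from the sizing above (tune for your own workload):

```ini
# postgresql.conf for a 128GB RAM instance -- values mirror the sizing above
shared_buffers = 32GB          # ~25% of RAM
effective_cache_size = 96GB    # ~75% of RAM; a planner hint, not an allocation
work_mem = 128MB               # per sort/hash node, per query -- multiplies fast
max_connections = 200          # keep low and pool with PgBouncer instead
```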
Monitor tup_deleted versus tup_inserted ratios in pg_stat_database. If deletes consistently outpace inserts, table bloat is building up. Track pg_stat_user_tables.n_dead_tup and make sure autovacuum runs often enough. On modern SSDs, set autovacuum_vacuum_cost_delay to 2ms (the default on PostgreSQL 12 and later; older versions default to 20ms) so VACUUM can actually keep up.
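A quick check for tables where autovacuum is losing ground:

```sql
-- Tables with the most dead tuples; a high dead:live ratio plus a stale
-- last_autovacuum timestamp means autovacuum is not keeping up.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```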
Failure Scenarios
Scenario 1: XID wraparound shutdown. This one is scary because it can bring down the database with no obvious warning unless you monitor the right metric. PostgreSQL uses 32-bit transaction IDs that wrap around at roughly 4.2 billion. When a table's relfrozenxid falls too far behind the current XID, Postgres enters "emergency autovacuum" mode. If that fails (usually because of a bloated table or I/O saturation), Postgres shuts down with: "database is not accepting commands to avoid wraparound data loss." Mailchimp's Mandrill hit this in 2019 and lost hours of availability. Detection: monitor age(relfrozenxid) per table and alert when anything exceeds 500 million transactions. Prevention: tune autovacuum_freeze_max_age and give autovacuum workers enough I/O bandwidth to do their job.
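The detection query is short enough to wire straight into monitoring:

```sql
-- Oldest unfrozen XID per table; alert when xid_age crosses ~500 million.
SELECT c.relname, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```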
Scenario 2: Replication slot causing disk exhaustion. A paused or disconnected replica with an active replication slot stops the primary from deleting old WAL segments. WAL piles up on the primary until the disk fills, at which point Postgres can no longer write WAL and PANICs, taking down all writes. GitLab hit a variant of this during their 2017 database incident. Detection: monitor pg_replication_slots for inactive slots and use pg_wal_lsn_diff() to check the gap between current WAL position and each slot's restart LSN. Starting with PostgreSQL 13, set max_slot_wal_keep_size to cap WAL retention per slot. This automatically invalidates slots that fall too far behind, which is better than running out of disk.
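A sketch of that detection plus the PostgreSQL 13+ safety valve (the size cap below is illustrative):

```sql
-- How much WAL each slot is forcing the primary to keep around;
-- an inactive slot with a growing gap is the danger sign.
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;

-- And the PostgreSQL 13+ safety valve in postgresql.conf (cap is illustrative):
--   max_slot_wal_keep_size = 200GB
```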
Pros
- Full ACID compliance with MVCC
- Wildly extensible (custom types, functions, extensions)
- Best-in-class SQL standard compliance
- Rich indexing: B-tree, GIN, GiST, BRIN
- Strong community and ecosystem
Cons
- Vertical scaling hits a wall eventually
- Write-heavy workloads need careful tuning
- Replication is async by default
- Sharding requires extensions like Citus
- VACUUM overhead bites you during long-running transactions
When to use
- You need strong consistency and ACID transactions
- Complex relational data with joins
- Mixed workloads (relational + JSON + full-text search)
- Geospatial queries
When NOT to use
- You need automatic horizontal sharding at massive scale
- Your access pattern is purely key-value lookups
- Ultra-low latency requirements (sub-millisecond)
- Append-only time-series at millions of events per second
Key Points
- MVCC stores old row versions (dead tuples) directly in the heap, not in a separate undo log. Reads never block writes, but dead tuples pile up and VACUUM has to clean them.
- WAL (Write-Ahead Log) is what provides durability: every transaction hits sequential WAL segments before the commit returns. This is also how point-in-time recovery and streaming replication work.
- Extensibility is what sets Postgres apart. PostGIS, TimescaleDB, pgvector, pg_cron, Citus: they all run as extensions, turning one engine into geospatial, time-series, vector, and distributed databases.
- Connection pooling through PgBouncer is not optional. Each PostgreSQL backend eats 5-10MB of RAM. 500 direct connections blow up to 5GB just on connection overhead.
- VACUUM overhead gets dangerous with transaction ID wraparound risk. If autovacuum falls behind, PostgreSQL will refuse all writes to prevent XID wraparound. This is a catastrophic failure mode.
- Index selection matters more than most people think. B-tree for equality and range. GIN for full-text and JSONB containment. GiST for geometric and nearest-neighbor. BRIN for naturally ordered large tables.
Common Mistakes
- ✗ Skipping connection pooling (PgBouncer). 5000 direct connections will eat all the memory and tank performance because of the process-per-connection model. Use PgBouncer in transaction mode.
- ✗ Ignoring table bloat and VACUUM. Dead tuples stack up without aggressive autovacuum tuning, and the tables end up 10x larger than the actual data. Sequential scans slow to a crawl.
- ✗ Running with default shared_buffers (128MB) and work_mem (4MB). Set shared_buffers to 25% of RAM and work_mem to 50-256MB per sort/hash operation, scaled to max_connections and available memory.
- ✗ Using SELECT * everywhere. It pulls unnecessary columns, kills index-only scans, wastes I/O, and makes query plans unpredictable as the schema evolves.
- ✗ Letting long-running transactions block VACUUM. A single open transaction stops VACUUM from reclaiming any dead tuples created after it started, and table bloat accelerates across the entire database. The query after this list shows how to spot them.
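A quick way to find the offending transactions, assuming access to pg_stat_activity:

```sql
-- Transactions that have been open the longest; any of these pins the
-- xmin horizon and blocks VACUUM from reclaiming newer dead tuples.
SELECT pid, now() - xact_start AS xact_age, state,
       left(query, 60) AS query_head
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```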