Connection Pool
A pool of pre-established connections (database, HTTP, gRPC) reused across requests. Building a connection per request is too slow (TCP setup, TLS handshake, auth). A pool provides bounded resource usage, faster requests, and back-pressure when the limit is reached.
What it is
A connection pool maintains a set of pre-established connections to some downstream (database, HTTP service, gRPC service, message broker). When the application needs to make a call, it borrows a connection from the pool, uses it, returns it. The pool reuses connections across many requests.
The reason pools exist: connection establishment is expensive. TCP handshake (1 round trip), TLS handshake (1-2 round trips), auth (1+ round trips). For a database, that's 10-50ms before any query runs. For HTTPS to a remote API, the cost is similar. Building a connection per request slows every request and ties up CPU on the server.
A pool gives near-zero connection overhead, bounded resource usage (no surprise spike of 10K connections to the database), and back-pressure when the limit is reached.
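To make the borrow/use/return cycle concrete, here is a minimal sketch of a fixed-size pool built on a blocking queue. Conn and SimplePool are hypothetical names for illustration; real pools layer health checks, idle timeouts, and recycling on top of this core.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

interface Conn extends AutoCloseable {} // stand-in for a real driver connection

final class SimplePool {
    private final BlockingQueue<Conn> idle;

    SimplePool(int size, Supplier<Conn> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get()); // pre-establish
    }

    // Borrow: wait up to the acquire timeout, then fail fast (back-pressure).
    Conn borrow(long timeoutMs) throws InterruptedException {
        Conn c = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (c == null) throw new IllegalStateException("pool exhausted");
        return c;
    }

    // Return: make the connection available to the next caller.
    void giveBack(Conn c) {
        idle.add(c);
    }
}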
What goes in a pool
Pool size limits: max connections, min idle. Cap how many can exist; keep some warm for instant use.
Acquire timeout: how long to wait for a connection when the pool is full. Fail fast rather than block forever.
Idle and lifetime timeouts: close connections that have been idle too long (server may have closed them); recycle connections periodically (DNS changes, load balancer rotation).
Health check on borrow: validate the connection works before handing it out. Cheap (a SELECT 1 query or a driver-level ping); catches dead connections before they fail user requests (sketch after this list).
Leak detection: log a stack trace when a connection has been held longer than expected. Catches forgotten close()/return() at development time.
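A sketch of the health-check-on-borrow step, using JDBC's standard Connection.isValid() (drivers typically implement it as a lightweight ping). checkOnBorrow is a hypothetical helper name, not a library API.

import java.sql.Connection;
import java.sql.SQLException;

// Returns the candidate if it still works, else null so the pool can
// discard it and try another instead of failing the caller's request.
static Connection checkOnBorrow(Connection candidate) {
    try {
        return candidate.isValid(1) ? candidate : null; // 1-second validation timeout
    } catch (SQLException e) {
        return null; // treat a validation failure as a dead connection
    }
}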
Sizing
Pool sizing is the most-asked, most-misunderstood question. The answer is Little's Law:
pool_size = throughput × average_latency
An app that runs 100 database queries per second at 50ms average latency has 5 queries in flight at a time (100 × 0.05). Add 2x headroom for bursts: a pool of 10.
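The same arithmetic as a short sketch, with the values from the example above:

double throughputPerSec = 100.0; // queries per second
double avgLatencySec = 0.050;    // 50ms average query latency
double inFlight = throughputPerSec * avgLatencySec; // = 5 busy on average
int poolSize = (int) Math.ceil(inFlight * 2.0);     // 2x burst headroom -> 10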
The wrong way to size: "we have 200 threads, set pool size to 200." This wildly oversizes the pool and overwhelms the downstream. Most threads at any moment are not in the database; a connection per thread isn't needed.
The other constraint: per-app-instance pool size × number of instances must fit within the downstream's connection limit. Postgres default max_connections is 100; with 10 instances and pool of 20 each, that totals 200 → over the limit. Either tune the database or put a connection multiplexer (PgBouncer for Postgres) in front.
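A back-of-envelope sketch of that budget. The reserved headroom for admin sessions and migrations is an assumed value, not a Postgres default:

int maxConnections = 100; // Postgres default max_connections
int reserved = 10;        // assumed headroom for admin sessions / migrations
int instances = 10;
int perInstancePool = (maxConnections - reserved) / instances; // = 9 per instance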
What goes wrong
Pool exhaustion. Pool is at max, every request blocks waiting for a connection, eventually times out. Causes: pool too small, OR connection leaks (acquired but never released). Diagnose with metrics (in-use count) and leak detection.
Connection leaks. Code that opens a connection without closing on the error path. The fix is try-with-resources / context manager / defer close(). Linters catch many of these.
Stale connections. Server-side closed the connection; pool still thinks it's good; next borrow fails. Health check on borrow plus a max-lifetime fix this.
Long-held connections. One slow query holds a connection for minutes; pool exhausted by that one request. Set query timeouts; long-running operations should use a separate pool or move out of the request path.
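A sketch of bounding query time with standard JDBC: Statement.setQueryTimeout makes the driver cancel the query when the deadline passes, so one slow query cannot hold a pooled connection indefinitely. The query text and the 5-second limit are illustrative.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

static void boundedQuery(Connection c) throws SQLException {
    try (Statement s = c.createStatement()) {
        s.setQueryTimeout(5); // seconds; driver cancels the query at the deadline
        try (ResultSet rs = s.executeQuery("SELECT 1")) {
            while (rs.next()) {
                // ... consume rows ...
            }
        }
    }
}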
When a pool is unnecessary
Standalone CLI scripts that do one batch of work. Test fixtures. Single-instance services with very low traffic. For everything else, pool.
Beyond databases
The same pattern applies to HTTP clients (gRPC channels, OkHttp, axios), message brokers (Kafka producers, NATS clients), even AWS SDK clients. Most modern client libraries pool connections internally; some don't. Check the docs and configure the pool size; defaults are often as low as one connection, which is wrong for any real load.
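For example, OkHttp exposes its internal pool through ConnectionPool. A sketch with illustrative values; note the first argument caps idle connections kept warm, not total connections:

import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

OkHttpClient client = new OkHttpClient.Builder()
        .connectionPool(new ConnectionPool(
                20,                  // max idle connections kept warm
                5, TimeUnit.MINUTES  // idle keep-alive before close
        ))
        .build();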
Implementations
HikariCP is the standard JDBC pool. Most config is sane by default. Two settings are mandatory: maximumPoolSize (sized to the database's connection limit divided by app instances) and connectionTimeout (how long to wait for a connection before failing).
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db:5432/app");
config.setUsername(user);
config.setPassword(pass);
config.setMaximumPoolSize(20);            // most important setting
config.setMinimumIdle(5);
config.setConnectionTimeout(2_000);       // ms; fail fast if pool full
config.setIdleTimeout(60_000);
config.setMaxLifetime(30 * 60_000);       // recreate connections periodically
config.setLeakDetectionThreshold(10_000); // log if held >10s

HikariDataSource ds = new HikariDataSource(config);

try (Connection c = ds.getConnection()) { // borrow
    // ... use c ...
} // auto-returns to pool

Pool exhausted exception. Two causes: pool too small for traffic, OR connections leaked (acquired but never released). HikariCP's leakDetectionThreshold logs a stack trace when a connection is held longer than the threshold, pointing at the leak.
// Symptom: SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out

// Diagnosis steps:
// 1. Check pool metrics: in-use connections vs max. If always at max, pool too small.
// 2. Enable leak detection (10s threshold). Check logs for stack traces.
// 3. Look for try-with-resources omissions or async paths that don't return connections.

// Common bug:
Connection c = ds.getConnection(); // BAD: no try-with-resources
doStuff(c);                        // exception here -> connection leaked
c.close();                         // never reached

// Fix:
try (Connection c = ds.getConnection()) { // GOOD: auto-close
    doStuff(c);
}

Key points
- Connection establishment is expensive: TCP handshake, TLS handshake, auth. A pool amortises this.
- Pool size ≈ concurrent in-flight requests (throughput × latency), capped by the downstream's connection limit.
- Acquire blocks (or fails) when the pool is empty; this is the back-pressure mechanism.
- Health check: validate the connection on borrow; broken connections must be discarded, not handed out.
- Idle timeout: connections that have been idle too long are closed (the server may have closed them already).
Follow-up questions
- How big should the connection pool be?
- What is the relationship between pool size and database max_connections?
- Should there be separate pools for read replicas and writers?
- Why recycle connections (maxLifetime)?
Gotchas
- Pool size larger than the database can handle: every request fails once the DB's max_connections is hit
- Forgetting to release a connection (no try-with-resources): pool leak, eventual exhaustion
- No idle timeout: server-closed connections stay in the pool and fail on the next borrow
- A long-running query holds a connection: the pool is exhausted by one slow request
- Same pool for reads and writes: a write spike starves reads, or vice versa