Database Selection Framework
The Decision Matrix
Picking a database is one of the hardest-to-reverse decisions in a system. The right framework evaluates five dimensions: consistency requirements, query patterns, scale trajectory, operational maturity, and team expertise.
Build a simple matrix. List your top 10 queries in rows and the candidate databases in columns. Score each combination on a 1-5 scale for natural fit. This exercise alone eliminates most bad choices.
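To make the matrix concrete, here is a minimal sketch in Python; the queries, candidate databases, and scores are illustrative placeholders, not recommendations:

```python
# Query-fit matrix: rows are your application's top queries, columns are
# candidate databases, each cell a 1-5 judgment of natural fit.
from statistics import mean

scores = {
    "fetch user profile by id":         {"postgresql": 5, "mongodb": 5, "dynamodb": 5},
    "orders joined with line items":    {"postgresql": 5, "mongodb": 2, "dynamodb": 1},
    "full-text search over products":   {"postgresql": 4, "mongodb": 3, "dynamodb": 1},
    "rolling 30-day revenue by region": {"postgresql": 4, "mongodb": 2, "dynamodb": 1},
}

for db in ["postgresql", "mongodb", "dynamodb"]:
    avg = mean(row[db] for row in scores.values())
    worst = min(row[db] for row in scores.values())
    # The worst case matters as much as the average: one unworkable
    # query can disqualify a database outright.
    print(f"{db:<11} avg={avg:.1f} worst={worst}")
```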
Relational Databases
PostgreSQL and MySQL cover 80% of use cases. PostgreSQL has stronger support for complex queries, JSON columns, and extensions like PostGIS and pg_trgm. MySQL has a simpler operational model and better replication ergonomics at scale. If you need ACID transactions across multiple tables, relational databases are still the best answer.
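As a hedged illustration of the two strengths mentioned above, JSONB querying and multi-table ACID transactions, here is a sketch using the psycopg2 driver; the DSN, table, and column names are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
with conn:  # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # JSONB containment: find events whose payload includes this key/value.
        cur.execute(
            "SELECT id FROM events WHERE payload @> %s::jsonb",
            ('{"type": "signup"}',),
        )
        event_ids = [row[0] for row in cur.fetchall()]

        # ACID across tables: both updates commit together or not at all.
        cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = %s", (1,))
        cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = %s", (2,))
```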
At scale, look at CockroachDB or TiDB for distributed SQL that maintains PostgreSQL or MySQL compatibility. Expect 2-3x latency overhead compared to a single-node setup, but you gain horizontal scalability without application changes.
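Because CockroachDB speaks the PostgreSQL wire protocol, "without application changes" mostly means repointing the connection string. A sketch, with placeholder host and credentials:

```python
import psycopg2

# The same PostgreSQL driver, aimed at a hypothetical CockroachDB cluster.
conn = psycopg2.connect(
    host="crdb.internal.example.com",  # placeholder cluster address
    port=26257,                        # CockroachDB's default port
    dbname="app",
    user="app",
    sslmode="require",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version()")    # reports a CockroachDB version string
    print(cur.fetchone()[0])
```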
Document Stores
MongoDB works well when your data is naturally hierarchical and your access pattern is "fetch one document by ID." It struggles when you need joins across collections or strong consistency across documents. DynamoDB is a better fit for pure key-value and key-range patterns where you can design your access patterns upfront. DynamoDB charges per read/write unit, so unpredictable query patterns get expensive fast.
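A sketch of the access pattern DynamoDB rewards: a query by partition key plus a bounded sort-key range, using boto3. The table name and key schema are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table

# Efficient: one partition, bounded range, billed for items actually read.
response = table.query(
    KeyConditionExpression=(
        Key("customer_id").eq("cust-123")
        & Key("order_date").between("2024-01-01", "2024-03-31")
    )
)
for item in response["Items"]:
    print(item["order_date"], item.get("total"))
# By contrast, an ad-hoc filter on a non-key attribute forces a Scan,
# which reads (and bills for) the entire table.
```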
Specialized Databases
Time-series: TimescaleDB (PostgreSQL extension) or InfluxDB for metrics, IoT, and financial tick data. Compression and retention policies are built in.
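A minimal sketch of the TimescaleDB model, assuming the extension is installed and using a hypothetical readings table: a plain PostgreSQL table becomes a hypertable and is downsampled with time_bucket():

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics user=app")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            time   TIMESTAMPTZ NOT NULL,
            device TEXT        NOT NULL,
            value  DOUBLE PRECISION
        )
    """)
    # Partition by time; TimescaleDB manages the chunks automatically.
    cur.execute("SELECT create_hypertable('readings', 'time', if_not_exists => TRUE)")
    # Downsample to 5-minute averages per device.
    cur.execute("""
        SELECT time_bucket('5 minutes', time) AS bucket, device, avg(value)
        FROM readings GROUP BY bucket, device ORDER BY bucket
    """)
```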
Graph: Neo4j for relationship-heavy queries like social networks, fraud detection, or recommendation engines. If you are doing more than 3-hop traversals, a graph database will outperform SQL self-joins by orders of magnitude.
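A sketch of a variable-length traversal in Cypher via the official Python driver; the URI, credentials, labels, and relationship type are hypothetical:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Follow suggestions up to 3 hops out, excluding the starting user.
# The equivalent SQL needs a self-join per hop; Cypher states the depth once.
query = """
MATCH (me:User {id: $user_id})-[:FOLLOWS*1..3]->(other:User)
WHERE other <> me
RETURN DISTINCT other.id AS suggestion
LIMIT 25
"""
with driver.session() as session:
    for record in session.run(query, user_id="u-42"):
        print(record["suggestion"])
driver.close()
```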
Wide-column: Cassandra or ScyllaDB for write-heavy workloads at massive scale. Netflix runs Cassandra for over a trillion rows. The trade-off is eventual consistency and a rigid query model where you design tables around specific queries.
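A sketch of that query-first modeling with the Python cassandra-driver; the keyspace, table, and query are illustrative:

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("telemetry")  # hypothetical keyspace

# This table exists to answer exactly one question: "recent events for a
# device". The partition key and clustering order are chosen for that query.
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_device (
        device_id TEXT,
        ts        TIMESTAMP,
        payload   TEXT,
        PRIMARY KEY ((device_id), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# The query the table was designed for; any other access pattern
# generally means creating another table shaped for it.
rows = session.execute(
    "SELECT ts, payload FROM events_by_device WHERE device_id = %s LIMIT 100",
    ("dev-7",),
)
```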
Key-value cache: Redis for sub-millisecond reads on hot data. Treat it as ephemeral. Teams that use Redis as a primary data store eventually regret it when they hit memory limits or lose data on restart.
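A cache-aside sketch that follows this advice: every key carries a TTL, and a miss falls through to the source of truth. The function and key names are hypothetical; requires the redis package:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_profile(user_id, loader, ttl_seconds=300):
    """Return a cached profile, falling back to the primary database."""
    key = f"profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = loader(user_id)                       # hits the primary store
    r.setex(key, ttl_seconds, json.dumps(profile))  # expires automatically
    return profile
```

Because every entry expires, a flushed or restarted Redis costs only latency, never data.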
The Polyglot Persistence Trap
Using multiple databases is sometimes necessary but always expensive operationally. Each database is another system to monitor, back up, upgrade, and page on. Stripe famously ran on just MongoDB and Redis for years before adding more databases. Start with as few databases as you can justify and add specialists only when PostgreSQL demonstrably cannot meet a specific requirement.
Key Points
- Start with query patterns, not the database. List the top 10 queries your application will run and let that drive the selection
- PostgreSQL is the safe default for most applications. It handles JSON, full-text search, geospatial, and time-series reasonably well before you need a specialist
- Operational complexity matters more than benchmarks. A database your team can operate confidently at 2 AM is better than one that wins synthetic benchmarks
- Multi-model databases reduce operational burden but sacrifice peak performance. DynamoDB is fast for key-value but painful for ad-hoc analytics
- Data gravity is real. Once you have terabytes in one system, migration cost dominates all other considerations
Common Mistakes
- ✗ Choosing a database based on hype or conference talks instead of matching it to actual access patterns
- ✗ Running benchmarks on empty datasets. Performance characteristics change dramatically at 100GB, 1TB, and 10TB
- ✗ Ignoring the backup, restore, and disaster recovery story. A database you cannot reliably restore is a liability