Data Architecture & Schema Evolution
Schema Changes Are the Scariest Kind of Migration
Code deploys are reversible. You push a bad build, you roll back. The data is still there, untouched, waiting for the next attempt. Schema changes are different. Once you alter a column, rename a table, or change a data type, the old shape is gone. Every consumer that depended on that shape breaks. And unlike a bad deploy, you cannot undo a schema change by clicking "rollback" in your CI pipeline.
At Staff level, interviewers ask about schema evolution to see how you think about irreversibility. The right answer is never "we just run the migration." It is always about phases, safety nets, and coordination.
A real example: you need to split a monolithic address text field into structured components (street, city, state, zip). Forty services read that field. Some parse it with regex. Some pass it through to a UI as-is. Some feed it into a geocoding service. Each of those consumption patterns needs a different migration path. The engineer who says "we will just add the new columns and deprecate the old one" has not thought about the team running regex parsing in a batch job that only deploys monthly.
The Expand-Contract Pattern in Practice
Expand-contract is not complicated in theory. You add new columns (expand), migrate data and consumers (transition), then remove old columns (contract). The hard part is the transition phase, because it can last months.
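In terms of the address example above, the two DDL phases bookend the long middle. A minimal sketch, with hypothetical table and column names:

```python
# Illustrative DDL for the expand and contract phases of the address split.
EXPAND = [
    "ALTER TABLE customers ADD COLUMN street TEXT",
    "ALTER TABLE customers ADD COLUMN city   TEXT",
    "ALTER TABLE customers ADD COLUMN state  TEXT",
    "ALTER TABLE customers ADD COLUMN zip    TEXT",
]

# Transition (months): dual-write both shapes, backfill history, migrate all
# consumers to the new columns, then verify nothing still reads the old field.

CONTRACT = [
    "ALTER TABLE customers DROP COLUMN address",  # only after consumer attestation
]
```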
During the transition, you dual-write to both old and new columns on every insert and update. Your application code needs to read from the new columns but fall back to old columns for rows that have not been backfilled yet. You run a backfill job that processes historical data in batches, with checkpointing so it can resume after failures.
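A sketch of that backfill under some assumptions: Postgres via psycopg2, a hypothetical address_backfill_checkpoint table, and a placeholder parser standing in for the real address-splitting logic.

```python
# A minimal backfill sketch with checkpointing; psycopg2, the checkpoint
# table, and parse_address() are all assumptions for illustration.
import time
import psycopg2

BATCH_SIZE = 5000

def parse_address(raw: str) -> tuple[str, str, str, str]:
    # Placeholder for the real parsing logic (street, city, state, zip).
    street, city, state_zip = (raw.split(",") + ["", ""])[:3]
    state, _, zip_code = state_zip.strip().partition(" ")
    return street.strip(), city.strip(), state, zip_code

def run_backfill(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        # Resume from the last committed checkpoint instead of row zero.
        cur.execute("SELECT coalesce(max(last_id), 0) FROM address_backfill_checkpoint")
        last_id = cur.fetchone()[0]

    while True:
        with conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id, address FROM customers"
                " WHERE id > %s AND street IS NULL"
                " ORDER BY id LIMIT %s",
                (last_id, BATCH_SIZE),
            )
            rows = cur.fetchall()
            if not rows:
                break  # caught up; dual-writes cover every row from here on
            for row_id, raw in rows:
                street, city, state, zip_code = parse_address(raw or "")
                cur.execute(
                    "UPDATE customers SET street=%s, city=%s, state=%s, zip=%s WHERE id=%s",
                    (street, city, state, zip_code, row_id),
                )
            last_id = rows[-1][0]
            # Checkpoint commits in the same transaction as the batch it covers.
            cur.execute(
                "INSERT INTO address_backfill_checkpoint (last_id) VALUES (%s)",
                (last_id,),
            )
        time.sleep(0.1)  # crude throttle so the backfill does not starve OLTP traffic
    conn.close()
```

Committing the checkpoint in the same transaction as the batch it covers is what makes a crash recoverable without reprocessing or skipping rows.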
Here is where candidates stumble: they forget the verification step. Before you drop the old column, you need proof that zero consumers still read it. That means query logging, dependency scanning, and explicit sign-off from each consuming team. At LinkedIn, schema migrations include a "consumer attestation" step where each team confirms their code no longer references the deprecated column. Only after every team has attested do you schedule the drop.
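One concrete verification signal, sketched here against Postgres's pg_stat_statements view (an assumption; the naive substring match is a first pass, not a substitute for static dependency scanning or attestation):

```python
# Flags any recorded query shape that still mentions the deprecated column.
# Assumes Postgres with the pg_stat_statements extension enabled.
import psycopg2

DEPRECATED_COLUMN = "address"

def queries_still_reading(dsn: str) -> list[str]:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Naive substring match; a stricter check would parse the query text.
        cur.execute(
            "SELECT query FROM pg_stat_statements WHERE query ILIKE %s",
            (f"%{DEPRECATED_COLUMN}%",),
        )
        return [row[0] for row in cur.fetchall()]

if __name__ == "__main__":
    offenders = queries_still_reading("postgresql://localhost/app")
    if offenders:
        raise SystemExit(f"{len(offenders)} query shapes still reference {DEPRECATED_COLUMN!r}")
```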
The timeline for a major schema change at a company with 40 consumers is typically 3 to 6 months. Interviewers who hear "two weeks" know you have not done this at scale.
Data Contracts Between Services
A data contract is an explicit agreement about the shape, semantics, and freshness of data that one team provides and another consumes. Without contracts, you get implicit dependencies: team B reads team A's database directly, assumes a column is never null, and breaks when team A ships a migration that makes the column nullable.
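To make the agreement concrete, here is an illustrative contract entry sketched as a Python structure; every name, field, and value is hypothetical, and real contract repos often express this as YAML, Avro, or Protobuf files instead.

```python
# Illustrative contract entry; all names and values are hypothetical.
ORDER_EVENTS_CONTRACT = {
    "dataset": "orders.order_placed",
    "owner": "checkout-team",
    "version": 3,
    "schema": {
        "order_id":    {"type": "string", "nullable": False},
        "placed_at":   {"type": "timestamp", "nullable": False},
        "total_cents": {"type": "long", "nullable": False,
                        "semantics": "grand total after discounts, before tax"},
    },
    "freshness_slo": "events visible to consumers within 5 minutes",
    "compatibility": "BACKWARD",  # consumers on the previous version must keep working
    "consumers": ["analytics-team", "fraud-team", "notifications-team"],
}
```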
Schema registries (like Confluent Schema Registry for Kafka) and IDL-based contracts (like Protobuf definitions for gRPC) enforce contracts at the infrastructure level. When a producer tries to register a schema change that breaks backward compatibility, the registry rejects it. The producer has to either make the change backward-compatible or negotiate a new contract version with consumers.
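A CI gate against a registry can be small. This sketch assumes Confluent Schema Registry's REST compatibility endpoint, with an illustrative registry URL, subject name, and schema file path:

```python
# Rejects the build if the candidate schema is not backward-compatible with
# the latest registered version. URL, subject, and file path are illustrative.
import json
import requests

REGISTRY = "http://schema-registry:8081"
SUBJECT = "orders.order_placed-value"

def is_backward_compatible(candidate_schema: dict) -> bool:
    resp = requests.post(
        f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        json={"schema": json.dumps(candidate_schema)},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["is_compatible"]

if __name__ == "__main__":
    with open("schemas/order_placed.avsc") as f:
        if not is_backward_compatible(json.load(f)):
            raise SystemExit("Breaks backward compatibility: fix the change or negotiate a new contract version.")
```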
The ownership model matters as much as the tooling. The producing team owns the schema definition. Consumers declare which version they depend on. Version negotiation happens through pull requests to a shared contract repository, not through Slack messages. When the contract repo is the source of truth, you can audit every data dependency in the organization with a single search.
When Denormalization Is and Is Not the Answer
Denormalization gets proposed every time a query is slow. Sometimes it is the right call. Often it is not.
You should denormalize when your read-to-write ratio is heavily skewed toward reads, the join cost is measurable and consistent (not just slow on one outlier query), and you have a clear plan for keeping the denormalized data in sync. An e-commerce product listing page that joins products, prices, inventory, and reviews on every request is a classic case. Materializing that data into a read-optimized store cuts latency from 200ms to 15ms.
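A sketch of that materialization under some assumptions (Postgres via psycopg2, a product_listing_read table keyed by product_id, and an event consumer that calls this on every relevant change; all names are illustrative):

```python
# Materializes one product's listing into a read-optimized table.
import psycopg2

REFRESH_ONE = """
INSERT INTO product_listing_read
       (product_id, title, price_cents, in_stock, avg_rating, refreshed_at)
SELECT p.id,
       p.title,
       pr.amount_cents,
       i.quantity > 0,
       coalesce((SELECT avg(r.rating) FROM reviews r WHERE r.product_id = p.id), 0),
       now()
  FROM products p
  JOIN prices pr   ON pr.product_id = p.id AND pr.is_current
  JOIN inventory i ON i.product_id = p.id
 WHERE p.id = %s
ON CONFLICT (product_id) DO UPDATE SET
       title        = EXCLUDED.title,
       price_cents  = EXCLUDED.price_cents,
       in_stock     = EXCLUDED.in_stock,
       avg_rating   = EXCLUDED.avg_rating,
       refreshed_at = EXCLUDED.refreshed_at;
"""

def refresh_listing(conn, product_id: int) -> None:
    # Called from an event consumer whenever a product, price, inventory, or
    # review change arrives. Recomputing from the normalized tables keeps the
    # handler idempotent regardless of event ordering or duplicate delivery.
    with conn, conn.cursor() as cur:
        cur.execute(REFRESH_ONE, (product_id,))
```

Recomputing the whole document from the source tables, rather than patching fields from event payloads, is what makes the handler safe to replay.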
You should not denormalize when the "slow query" is actually missing an index, when write volume is high enough that maintaining consistency across two stores becomes its own bottleneck, or when the data changes frequently enough that your denormalized copy is stale before it is read. A reporting dashboard that runs hourly does not need denormalized tables; it needs a proper analytics pipeline.
The interview answer interviewers want: "I would profile the query first, check for missing indexes, and consider caching before denormalizing. If denormalization is the right call, I would use an async event-driven approach to keep the denormalized store updated, accept a consistency window of X seconds, and build a reconciliation job that runs nightly to catch drift."
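The reconciliation job in that answer does not need to be clever. A sketch, reusing the illustrative tables and the refresh_listing function from the previous example:

```python
# A hedged sketch of the nightly reconciliation pass: find listings that have
# drifted from the normalized sources of truth and re-materialize them.
import psycopg2

FIND_DRIFT = """
SELECT p.id
  FROM products p
  JOIN prices pr   ON pr.product_id = p.id AND pr.is_current
  JOIN inventory i ON i.product_id = p.id
  LEFT JOIN product_listing_read lr ON lr.product_id = p.id
 WHERE lr.product_id IS NULL                                -- missing entirely
    OR lr.price_cents IS DISTINCT FROM pr.amount_cents      -- stale price
    OR lr.in_stock    IS DISTINCT FROM (i.quantity > 0);    -- stale availability
"""

def reconcile(dsn: str) -> int:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        cur.execute(FIND_DRIFT)
        drifted = [row[0] for row in cur.fetchall()]
    for product_id in drifted:
        refresh_listing(conn, product_id)  # reuses the refresh from the sketch above
    conn.close()
    return len(drifted)  # report as a metric: sustained drift means the event path is dropping updates
```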
Sample Questions
Your service's primary table needs a non-backward-compatible schema change. 40 consumers depend on it. Walk me through your approach.
This tests whether you can plan a staged migration without blowing up production. Interviewers want to hear about dual-write strategies, expand-contract patterns, and how you coordinate with dozens of consumer teams who all have different release schedules.
How do you design data contracts between teams to prevent breaking changes?
The interviewer is probing for schema registries, versioned contracts, and clear ownership boundaries. They want to see that you treat data interfaces with the same rigor as API interfaces.
When do you denormalize, and how do you manage the consistency tradeoff?
Read performance versus write complexity is the core tension here. Strong answers explain specific scenarios where denormalization is justified, and they are honest about the ongoing cost of keeping denormalized data consistent.
Evaluation Criteria
- Lays out a phased migration plan with concrete rollback points rather than a single cutover
- Thinks about backward compatibility as a default constraint, not an afterthought
- Describes data contracts with enough specificity to show they have actually implemented one
- Understands when normalization tradeoffs flip based on read/write ratios and team structure
- Addresses cross-team data dependencies as a coordination problem, not just a technical one
Key Points
- Expand-contract is the only safe pattern for schema changes at scale. Add the new column, backfill, migrate consumers, then drop the old column. Skipping any step is how you get 3 AM pages.
- Schema registries and versioned contracts turn implicit data dependencies into explicit ones. If teams consume your data through a registered schema with compatibility checks, breaking changes get caught at CI time instead of production time.
- Denormalization is a loan against your future write complexity. Every denormalized field is a field you now maintain in two places. Write amplification, eventual consistency windows, and reconciliation jobs are the interest payments.
- Data ownership boundaries should mirror team boundaries. When two teams co-own a table, neither team feels responsible for its schema health, and migration planning becomes a committee exercise.
- Treat schema changes like API changes. Version them, test them against production-scale data, announce deprecation windows, and give consumers migration tooling.
Common Mistakes
- Planning a "big bang" migration over a weekend, assuming nothing will go wrong with 40 consumers reading from the same table during a rename
- Assuming all consumers can update simultaneously when in reality some teams ship weekly and others are mid-sprint with a code freeze
- Denormalizing prematurely because a query is "kind of slow" without profiling the actual bottleneck or considering an index first
- Not testing schema changes against real production data volumes, then discovering that a backfill takes 9 hours instead of the estimated 20 minutes