Data Architecture & Schema Evolution
Schema Changes Are the Scariest Kind of Migration
Code deploys are reversible. You push a bad build, you roll back. The data is still there, untouched, waiting for the next attempt. Schema changes are different. Once you alter a column, rename a table, or change a data type, the old shape is gone. Every consumer that depended on that shape breaks. And unlike a bad deploy, you cannot undo a schema change by clicking "rollback" in your CI pipeline.
At Staff level, interviewers ask about schema evolution to see how you think about irreversibility. The right answer is never "we just run the migration." It is always about phases, safety nets, and coordination.
A real example: you need to split a monolithic address text field into structured components (street, city, state, zip). Forty services read that field. Some parse it with regex. Some pass it through to a UI as-is. Some feed it into a geocoding service. Each of those consumption patterns needs a different migration path. The engineer who says "we will just add the new columns and deprecate the old one" has not thought about the team running regex parsing in a batch job that only deploys monthly.
The Expand-Contract Pattern in Practice
Expand-contract is not complicated in theory. You add new columns (expand), migrate data and consumers (transition), then remove old columns (contract). The hard part is the transition phase, because it can last months.
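In terms of the address example above, the two DDL phases bookend the long middle. A minimal sketch, with hypothetical table and column names:

```python
# Illustrative DDL for the expand and contract phases of the address split.
EXPAND = [
    "ALTER TABLE customers ADD COLUMN street TEXT",
    "ALTER TABLE customers ADD COLUMN city   TEXT",
    "ALTER TABLE customers ADD COLUMN state  TEXT",
    "ALTER TABLE customers ADD COLUMN zip    TEXT",
]

# Transition (months): dual-write both shapes, backfill history, migrate all
# consumers to the new columns, then verify nothing still reads the old field.

CONTRACT = [
    "ALTER TABLE customers DROP COLUMN address",  # only after consumer attestation
]
```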
During the transition, you dual-write to both old and new columns on every insert and update. Your application code needs to read from the new columns but fall back to old columns for rows that have not been backfilled yet. You run a backfill job that processes historical data in batches, with checkpointing so it can resume after failures.
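A sketch of that backfill under some assumptions: Postgres via psycopg2, a hypothetical address_backfill_checkpoint table, and a placeholder parser standing in for the real address-splitting logic.

```python
# A minimal backfill sketch with checkpointing; psycopg2, the checkpoint
# table, and parse_address() are all assumptions for illustration.
import time
import psycopg2

BATCH_SIZE = 5000

def parse_address(raw: str) -> tuple[str, str, str, str]:
    # Placeholder for the real parsing logic (street, city, state, zip).
    street, city, state_zip = (raw.split(",") + ["", ""])[:3]
    state, _, zip_code = state_zip.strip().partition(" ")
    return street.strip(), city.strip(), state, zip_code

def run_backfill(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        # Resume from the last committed checkpoint instead of row zero.
        cur.execute("SELECT coalesce(max(last_id), 0) FROM address_backfill_checkpoint")
        last_id = cur.fetchone()[0]

    while True:
        with conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id, address FROM customers"
                " WHERE id > %s AND street IS NULL"
                " ORDER BY id LIMIT %s",
                (last_id, BATCH_SIZE),
            )
            rows = cur.fetchall()
            if not rows:
                break  # caught up; dual-writes cover every row from here on
            for row_id, raw in rows:
                street, city, state, zip_code = parse_address(raw or "")
                cur.execute(
                    "UPDATE customers SET street=%s, city=%s, state=%s, zip=%s WHERE id=%s",
                    (street, city, state, zip_code, row_id),
                )
            last_id = rows[-1][0]
            # Checkpoint commits in the same transaction as the batch it covers.
            cur.execute(
                "INSERT INTO address_backfill_checkpoint (last_id) VALUES (%s)",
                (last_id,),
            )
        time.sleep(0.1)  # crude throttle so the backfill does not starve OLTP traffic
    conn.close()
```

Committing the checkpoint in the same transaction as the batch it covers is what makes a crash recoverable without reprocessing or skipping rows.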
Here is where candidates stumble: they forget the verification step. Before you drop the old column, you need proof that zero consumers still read it. That means query logging, dependency scanning, and explicit sign-off from each consuming team. At LinkedIn, schema migrations include a "consumer attestation" step where each team confirms their code no longer references the deprecated column. Only after every team has attested do you schedule the drop.
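One concrete verification signal, sketched here against Postgres's pg_stat_statements view (an assumption; the naive substring match is a first pass, not a substitute for static dependency scanning or attestation):

```python
# Flags any recorded query shape that still mentions the deprecated column.
# Assumes Postgres with the pg_stat_statements extension enabled.
import psycopg2

DEPRECATED_COLUMN = "address"

def queries_still_reading(dsn: str) -> list[str]:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Naive substring match; a stricter check would parse the query text.
        cur.execute(
            "SELECT query FROM pg_stat_statements WHERE query ILIKE %s",
            (f"%{DEPRECATED_COLUMN}%",),
        )
        return [row[0] for row in cur.fetchall()]

if __name__ == "__main__":
    offenders = queries_still_reading("postgresql://localhost/app")
    if offenders:
        raise SystemExit(f"{len(offenders)} query shapes still reference {DEPRECATED_COLUMN!r}")
```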
The timeline for a major schema change at a company with 40 consumers is typically 3 to 6 months. Interviewers who hear "two weeks" know you have not done this at scale.
Data Contracts Between Services
A data contract is an explicit agreement about the shape, semantics, and freshness of data that one team provides and another consumes. Without contracts, you get implicit dependencies: team B reads team A's database directly, assumes a column is never null, and breaks when team A ships a migration that makes the column nullable.
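To make the agreement concrete, here is an illustrative contract entry sketched as a Python structure; every name, field, and value is hypothetical, and real contract repos often express this as YAML, Avro, or Protobuf files instead.

```python
# Illustrative contract entry; all names and values are hypothetical.
ORDER_EVENTS_CONTRACT = {
    "dataset": "orders.order_placed",
    "owner": "checkout-team",
    "version": 3,
    "schema": {
        "order_id":    {"type": "string", "nullable": False},
        "placed_at":   {"type": "timestamp", "nullable": False},
        "total_cents": {"type": "long", "nullable": False,
                        "semantics": "grand total after discounts, before tax"},
    },
    "freshness_slo": "events visible to consumers within 5 minutes",
    "compatibility": "BACKWARD",  # consumers on the previous version must keep working
    "consumers": ["analytics-team", "fraud-team", "notifications-team"],
}
```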
Schema registries (like Confluent Schema Registry for Kafka) and IDL-based contracts (like Protobuf definitions for gRPC) enforce contracts at the infrastructure level. When a producer tries to register a schema change that breaks backward compatibility, the registry rejects it. The producer has to either make the change backward-compatible or negotiate a new contract version with consumers.
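A CI gate against a registry can be small. This sketch assumes Confluent Schema Registry's REST compatibility endpoint, with an illustrative registry URL, subject name, and schema file path:

```python
# Rejects the build if the candidate schema is not backward-compatible with
# the latest registered version. URL, subject, and file path are illustrative.
import json
import requests

REGISTRY = "http://schema-registry:8081"
SUBJECT = "orders.order_placed-value"

def is_backward_compatible(candidate_schema: dict) -> bool:
    resp = requests.post(
        f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        json={"schema": json.dumps(candidate_schema)},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["is_compatible"]

if __name__ == "__main__":
    with open("schemas/order_placed.avsc") as f:
        if not is_backward_compatible(json.load(f)):
            raise SystemExit("Breaks backward compatibility: fix the change or negotiate a new contract version.")
```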
The ownership model matters as much as the tooling. The producing team owns the schema definition. Consumers declare which version they depend on. Version negotiation happens through pull requests to a shared contract repository, not through Slack messages. When the contract repo is the source of truth, you can audit every data dependency in the organization with a single search.
When Denormalization Is and Is Not the Answer
Denormalization gets proposed every time a query is slow. Sometimes it is the right call. Often it is not.
You should denormalize when your read-to-write ratio is heavily skewed toward reads, the join cost is measurable and consistent (not just slow on one outlier query), and you have a clear plan for keeping the denormalized data in sync. An e-commerce product listing page that joins products, prices, inventory, and reviews on every request is a classic case. Materializing that data into a read-optimized store cuts latency from 200ms to 15ms.
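A sketch of that materialization under some assumptions (Postgres via psycopg2, a product_listing_read table keyed by product_id, and an event consumer that calls this on every relevant change; all names are illustrative):

```python
# Materializes one product's listing into a read-optimized table.
import psycopg2

REFRESH_ONE = """
INSERT INTO product_listing_read
       (product_id, title, price_cents, in_stock, avg_rating, refreshed_at)
SELECT p.id,
       p.title,
       pr.amount_cents,
       i.quantity > 0,
       coalesce((SELECT avg(r.rating) FROM reviews r WHERE r.product_id = p.id), 0),
       now()
  FROM products p
  JOIN prices pr   ON pr.product_id = p.id AND pr.is_current
  JOIN inventory i ON i.product_id = p.id
 WHERE p.id = %s
ON CONFLICT (product_id) DO UPDATE SET
       title        = EXCLUDED.title,
       price_cents  = EXCLUDED.price_cents,
       in_stock     = EXCLUDED.in_stock,
       avg_rating   = EXCLUDED.avg_rating,
       refreshed_at = EXCLUDED.refreshed_at;
"""

def refresh_listing(conn, product_id: int) -> None:
    # Called from an event consumer whenever a product, price, inventory, or
    # review change arrives. Recomputing from the normalized tables keeps the
    # handler idempotent regardless of event ordering or duplicate delivery.
    with conn, conn.cursor() as cur:
        cur.execute(REFRESH_ONE, (product_id,))
```

Recomputing the whole document from the source tables, rather than patching fields from event payloads, is what makes the handler safe to replay.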
You should not denormalize when the "slow query" is actually missing an index, when write volume is high enough that maintaining consistency across two stores becomes its own bottleneck, or when the data changes frequently enough that your denormalized copy is stale before it is read. A reporting dashboard that runs hourly does not need denormalized tables; it needs a proper analytics pipeline.
The interview answer interviewers want: "I would profile the query first, check for missing indexes, and consider caching before denormalizing. If denormalization is the right call, I would use an async event-driven approach to keep the denormalized store updated, accept a consistency window of X seconds, and build a reconciliation job that runs nightly to catch drift."
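The reconciliation job in that answer does not need to be clever. A sketch, reusing the illustrative tables and the refresh_listing function from the previous example:

```python
# A hedged sketch of the nightly reconciliation pass: find listings that have
# drifted from the normalized sources of truth and re-materialize them.
import psycopg2

FIND_DRIFT = """
SELECT p.id
  FROM products p
  JOIN prices pr   ON pr.product_id = p.id AND pr.is_current
  JOIN inventory i ON i.product_id = p.id
  LEFT JOIN product_listing_read lr ON lr.product_id = p.id
 WHERE lr.product_id IS NULL                                -- missing entirely
    OR lr.price_cents IS DISTINCT FROM pr.amount_cents      -- stale price
    OR lr.in_stock    IS DISTINCT FROM (i.quantity > 0);    -- stale availability
"""

def reconcile(dsn: str) -> int:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        cur.execute(FIND_DRIFT)
        drifted = [row[0] for row in cur.fetchall()]
    for product_id in drifted:
        refresh_listing(conn, product_id)  # reuses the refresh from the sketch above
    conn.close()
    return len(drifted)  # report as a metric: sustained drift means the event path is dropping updates
```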
Sample Questions
Your service's primary table needs a non-backward-compatible schema change. 40 consumers depend on it. Walk me through your approach.
This tests whether you can plan a staged migration without blowing up production. Interviewers want to hear about dual-write strategies, expand-contract patterns, and how you coordinate with dozens of consumer teams who all have different release schedules.
How do you design data contracts between teams to prevent breaking changes?
The interviewer is probing for schema registries, versioned contracts, and clear ownership boundaries. They want to see that you treat data interfaces with the same rigor as API interfaces.
When do you denormalize, and how do you manage the consistency tradeoff?
Read performance versus write complexity is the core tension here. Strong answers explain specific scenarios where denormalization is justified, and they are honest about the ongoing cost of keeping denormalized data consistent.
Evaluation Criteria
- Lays out a phased migration plan with concrete rollback points rather than a single cutover
- Thinks about backward compatibility as a default constraint, not an afterthought
- Describes data contracts with enough specificity to show they have actually implemented one
- Understands when normalization tradeoffs flip based on read/write ratios and team structure
- Addresses cross-team data dependencies as a coordination problem, not just a technical one
Key Points
- Expand-contract is the only safe pattern for schema changes at scale. Add the new column, backfill, migrate consumers, then drop the old column. Skipping any step is how you get 3 AM pages.
- Schema registries and versioned contracts turn implicit data dependencies into explicit ones. If teams consume your data through a registered schema with compatibility checks, breaking changes get caught at CI time instead of production time.
- Denormalization is a loan against your future write complexity. Every denormalized field is a field you now maintain in two places. Write amplification, eventual consistency windows, and reconciliation jobs are the interest payments.
- Data ownership boundaries should mirror team boundaries. When two teams co-own a table, neither team feels responsible for its schema health, and migration planning becomes a committee exercise.
- Treat schema changes like API changes. Version them, test them against production-scale data, announce deprecation windows, and give consumers migration tooling.
Common Mistakes
- Planning a "big bang" migration over a weekend, assuming nothing will go wrong with 40 consumers reading from the same table during a rename
- Assuming all consumers can update simultaneously when in reality some teams ship weekly and others are mid-sprint with a code freeze
- Denormalizing prematurely because a query is "kind of slow" without profiling the actual bottleneck or considering an index first
- Not testing schema changes against real production data volumes, then discovering that a backfill takes 9 hours instead of the estimated 20 minutes