Schema Evolution Governance
The Breaking Change That Nobody Saw
A platform team adds a required region field to a UserUpdated Kafka event. They test their producer, everything serializes correctly, they deploy. Within 20 minutes, three downstream services start throwing deserialization errors. The analytics pipeline silently drops 40% of events: it uses lenient JSON parsing, and events written before the change lack the field, triggering null pointer exceptions deep in the transformation logic. The notification service crashes entirely. The fraud detection system keeps running but produces incorrect risk scores because it falls back to a default value for the missing field.
This is not a hypothetical. Variations of this story play out weekly at companies running event-driven architectures without schema governance. The root cause is never technical. It is organizational: no one enforced a review process for schema changes to shared contracts.
Protobuf vs Avro: An Honest Comparison
Protobuf dominates at Google, Square, and most gRPC-native shops. Field numbering makes evolution explicit: you never reuse a field number, and removed numbers are reserved so they cannot be reclaimed with a different meaning. Required fields were removed in proto3 precisely because they make evolution dangerous. Tooling is excellent: buf provides linting, breaking-change detection, and a schema registry in one CLI. Serialization and deserialization run roughly 2-5x faster than Avro.
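A minimal sketch of what explicit evolution looks like in practice (message and field names are illustrative, not from any real system):

```protobuf
syntax = "proto3";

package events.v1;

message UserUpdated {
  string user_id = 1;

  // Numbers 2 and 3 once belonged to fields that were deleted. Reserving
  // them (and their names) prevents anyone from reusing them with a
  // different meaning, which would silently corrupt old data.
  reserved 2, 3;
  reserved "email", "legacy_region";

  // New fields take fresh numbers. In proto3 every field is optional on
  // the wire, so old readers ignore this field and new readers see ""
  // when it is absent.
  string region = 4;
}
```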
Avro is the default in the Kafka ecosystem. Reader-writer schema resolution means a consumer can read data written with a different (compatible) schema version without any code changes. The schema travels with the data (or is looked up from a registry by schema ID). LinkedIn built Avro into their event infrastructure from the beginning, and it handles thousands of schema versions in production. The downside: Avro's dynamic typing means less compile-time safety, and tooling outside the JVM ecosystem is weaker.
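For contrast, a sketch of the same event as an Avro schema (record and namespace names are illustrative). Because region carries a default, a consumer compiled against this version can still read records written before the field existed; reader-writer resolution fills in "unknown":

```json
{
  "type": "record",
  "name": "UserUpdated",
  "namespace": "com.example.events",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "region", "type": "string", "default": "unknown"}
  ]
}
```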
If you are a Kafka-heavy shop with mostly JVM services, Avro with Confluent Schema Registry is the path of least resistance. If you are polyglot with gRPC or Connect, Protobuf with buf gives you stronger guarantees and better developer ergonomics.
Schema Registry and Compatibility Modes
Confluent Schema Registry (or its open-source alternatives like Karapace and Apicurio) enforces compatibility rules at write time. When a producer tries to register a new schema version, the registry checks it against previous versions.
Backward compatible: the new schema can read data written with the old schema. Safe to deploy consumers first. Allows deleting fields and adding fields that have defaults.
Forward compatible: the old schema can read data written with the new schema. Safe to deploy producers first. Allows adding fields and deleting fields that have defaults.
Full compatible: both directions. Deploy in any order. This is the strictest mode, and the one most teams should default to for shared topics.
Set the compatibility mode per subject (in practice, per topic), not globally. Internal service-to-service topics might tolerate BACKWARD. Topics consumed by external teams or data pipelines should enforce FULL.
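As a sketch, using Confluent Schema Registry's REST API (the host and subject names are placeholders; subjects here follow the default TopicNameStrategy of `<topic>-value`):

```bash
# Pin a shared topic's value subject to FULL compatibility.
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://schema-registry:8081/config/user-updated-value

# Dry-run a candidate schema against the latest registered version.
# candidate.json wraps the escaped schema string: {"schema": "{...}"}
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data @candidate.json \
  http://schema-registry:8081/compatibility/subjects/user-updated-value/versions/latest
```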
Schema Governance That Actually Works
LinkedIn's schema review process requires any change to a shared event schema to go through a design review with representatives from consuming teams. Netflix takes a different approach: they auto-generate compatibility reports and block merges that violate the configured compatibility mode.
A practical governance setup for most organizations:
- CI compatibility checks. Every PR that modifies a .proto or .avsc file runs against the schema registry's compatibility endpoint. Failures block the merge. buf breaking does this for Protobuf; Confluent's Maven plugin does it for Avro. A minimal CI sketch follows this list.
- Schema change review. Changes to schemas consumed by more than one team require sign-off from at least one consuming team. This is a CODEOWNERS rule, not a process document.
- Consumer-driven contract tests. Each consumer publishes a Pact contract describing the fields and formats it depends on. Producer CI runs these contracts before deploying. This catches semantic changes that syntactic compatibility checks miss (a simplified sketch follows this list).
- Schema changelog. Maintain a CHANGELOG alongside your schema definitions. Version bumps, field additions, deprecations, and migration guides go here. Treat schemas like public APIs, because that is exactly what they are.
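A minimal CI sketch for the first item, assuming GitHub Actions and a Protobuf module at the repo root (the workflow layout and branch name are assumptions; buf breaking is the real command):

```yaml
# .github/workflows/schema-check.yml (sketch)
name: schema-compatibility
on:
  pull_request:
    paths: ["**/*.proto"]
jobs:
  breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # buf diffs against the base branch
      - uses: bufbuild/buf-setup-action@v1
      - name: Block breaking changes against main
        run: buf breaking --against ".git#branch=origin/main"
```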
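And a deliberately simplified stand-in for the contract-test item. A real setup would use Pact message contracts; this plain test only illustrates the kind of semantic assertion (field presence, type, UTC timestamps) that a registry's syntactic check cannot make. The event shape and field names are hypothetical:

```python
from datetime import datetime, timedelta

def check_user_updated_contract(event: dict) -> None:
    """Assert only the fields and formats this consumer depends on."""
    assert isinstance(event["user_id"], str) and event["user_id"]
    assert isinstance(event["region"], str)
    # A semantic rule no compatibility mode enforces: timestamps stay UTC.
    ts = datetime.fromisoformat(event["updated_at"])
    assert ts.utcoffset() == timedelta(0), "updated_at must be UTC"

# Producer CI runs this against a sample of the events it is about to ship.
check_user_updated_contract({
    "user_id": "u-123",
    "region": "eu-west-1",
    "updated_at": "2024-05-01T12:00:00+00:00",
})
```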
Key Points
- A required field added to a shared Kafka event will break every downstream consumer that uses permissive deserialization. Schema evolution is a coordination problem, not a serialization problem.
- Protobuf and Avro solve different evolution challenges. Protobuf gives you better tooling, type safety, and performance. Avro gives you schema evolution by default with its reader-writer schema resolution. Pick based on your ecosystem, not blog-post benchmarks.
- Schema compatibility modes (backward, forward, full) are not academic categories. They map directly to deployment order. Backward compatibility means you can deploy consumers before producers. Forward means the opposite. Full means deploy in any order.
- Consumer-driven contract testing catches breaking changes that schema registries miss. A field that is technically compatible can still break a consumer if the semantics change (e.g., a timestamp field switching from UTC to local time).
Common Mistakes
- ✗ Treating schema compatibility checks as optional. Without CI enforcement, a developer will eventually push a breaking change to a shared topic at 4pm on a Friday, and three downstream teams will spend their evening debugging silent data loss.
- ✗ Using JSON without a schema at all. Teams love the flexibility of schemaless JSON events until they discover that Producer A sends 'user_id' as a string while Producer B sends 'userId' as an integer, and both have been writing to the same topic for six months.
- ✗ Assuming backward compatibility is always sufficient. If your producers deploy before consumers (common in platform teams that ship shared events), you need forward compatibility so that old consumers can read new messages.
- ✗ Skipping semantic versioning for schemas. A schema that adds an optional field feels safe, but if that field changes the interpretation of existing fields, consumers need to know. Version your schemas explicitly and document what changed.