Schema Evolution
- Categories
- Architecture
Designing data encodings so formats can change over time without breaking running systems. Two directions matter: backward compatibility (new code can read data written by old code) and forward compatibility (old code can read data written by new code). Both are needed whenever old and new versions run at the same time.
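A minimal sketch of the two directions, assuming JSON-encoded records and a hypothetical `email` field introduced in version 2. The function names are illustrative and not taken from any particular library.

```python
import json

def write_v1(name):
    """Old writer: knows only 'name'."""
    return json.dumps({"name": name})

def write_v2(name, email):
    """New writer: also emits the hypothetical 'email' field."""
    return json.dumps({"name": name, "email": email})

def read_v1(payload):
    """Old reader: picks out the keys it knows and ignores the rest."""
    record = json.loads(payload)
    return {"name": record["name"]}

def read_v2(payload):
    """New reader: falls back to None when 'email' is absent."""
    record = json.loads(payload)
    return {"name": record["name"], "email": record.get("email")}

# Backward compatibility: new code reads data written by old code.
backward = read_v2(write_v1("Ada"))

# Forward compatibility: old code reads data written by new code.
forward = read_v1(write_v2("Ada", "ada@example.com"))
```

Here `backward` carries a defaulted `email`, while `forward` simply drops the field the old reader does not know about.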
Why it Matters
Data outlives the code that wrote it, and during any rollout multiple versions run simultaneously, so a format that cannot evolve forces lockstep deploys and risky migrations. Compatible encodings let code and data change independently.
Signals
- A deploy that requires updating all clients and servers at once.
- Reading old records crashes new code, or new records crash old code.
- Fear of changing a stored or wire format.
Benefits
Independent, rolling deploys; old data stays readable; safe, gradual migrations instead of big-bang cutovers.
Risks
Removing or repurposing a field that old readers still expect; adding a required field with no default, which breaks backward compatibility because existing data lacks the field; assuming all data was written by the current version of the code.
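The required-field risk can be illustrated with a small sketch, assuming records are plain dictionaries and a hypothetical new reader that insists on an `email` key. A record written before that field existed cannot be read:

```python
def read_v2_strict(record):
    """Hypothetical new reader that treats 'email' as required."""
    return {"name": record["name"], "email": record["email"]}

# A record stored before the 'email' field was introduced.
old_record = {"name": "Ada"}

try:
    read_v2_strict(old_record)
    outcome = "ok"
except KeyError as missing:
    # The new reader has no default to fall back on.
    outcome = f"missing required field: {missing.args[0]}"
```

Giving the field a default, or treating it as optional, is one way to avoid this failure.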
Tensions
Flexibility to evolve competes with strictness and compactness; tolerant readers ease change but can mask genuine errors in the data.
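The sketch below shows how a tolerant reader might hide a genuine error, along with one possible middle ground: staying tolerant while reporting unknown keys. The schema, field names, and helper functions are assumptions made for illustration.

```python
KNOWN_FIELDS = {"name", "email"}  # hypothetical schema

def read_tolerant(record):
    """Ignore unknown keys and default missing ones to None."""
    return {"name": record.get("name"), "email": record.get("email")}

def read_reporting(record):
    """Equally tolerant, but also return unknown keys so they can be logged."""
    unknown = sorted(set(record) - KNOWN_FIELDS)
    return read_tolerant(record), unknown

# A writer bug: the key is misspelled, so the value is silently lost.
typo_record = {"name": "Ada", "emial": "ada@example.com"}

silent = read_tolerant(typo_record)
parsed, unknown = read_reporting(typo_record)
```

`read_tolerant` accepts the record without complaint and yields an empty `email`, whereas `read_reporting` produces the same result and also flags `emial` for inspection.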
Examples
Adding an optional field that old readers ignore and that new readers fill with a default when it is absent; never reusing a removed field's identifier, so old data stays interpretable.
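The identifier rule can be sketched with a toy schema that keys fields by numeric tag and retires a tag permanently when its field is removed. The `TaggedSchema` class is hypothetical; it is similar in spirit to the `reserved` declaration in Protocol Buffers, but it is not that library's API.

```python
class TaggedSchema:
    """Toy schema: maps numeric tags to field names and retires removed tags."""

    def __init__(self):
        self.fields = {}       # tag -> field name currently in use
        self.reserved = set()  # tags of removed fields, never to be reused

    def add(self, tag, name):
        if tag in self.fields or tag in self.reserved:
            raise ValueError(f"tag {tag} is in use or reserved")
        self.fields[tag] = name

    def remove(self, tag):
        del self.fields[tag]
        self.reserved.add(tag)

    def decode(self, encoded):
        """Translate tag-keyed data to names, skipping unknown or retired tags."""
        return {self.fields[t]: v for t, v in encoded.items() if t in self.fields}

schema = TaggedSchema()
schema.add(1, "name")
schema.add(2, "nickname")
old_data = {1: "Ada", 2: "ada99"}  # written while tag 2 meant 'nickname'

schema.remove(2)
try:
    schema.add(2, "email")  # reusing the retired tag is rejected
    reuse_allowed = True
except ValueError:
    reuse_allowed = False
schema.add(3, "email")      # a fresh tag is fine

decoded = schema.decode(old_data)
```

Because tag 2 stays retired, the old nickname value is skipped rather than misread as an email address.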