Feature Flag Architecture
Types of Feature Flags
Not all flags are created equal, and treating them the same way leads to problems. Understanding the taxonomy helps you set the right lifecycle and governance for each.
Release flags decouple deployment from release. You merge code behind a flag, deploy it, and enable it when ready. These should be short-lived (days to weeks). Once the feature is stable, remove the flag and the conditional logic.
Experiment flags support A/B testing. They route users into cohorts and measure outcomes. These need statistical rigor: consistent assignment (same user always sees the same variant), proper sample sizes, and integration with an analytics pipeline. Experiment flags live for the duration of the test, typically 2-8 weeks.
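As a sketch of what consistent assignment looks like in practice (the function and key names here are illustrative, not from any particular SDK): hash the user ID together with the experiment key, and the resulting bucket is stable per user but independent across experiments.

```python
import hashlib

def assign_variant(user_id: str, experiment_key: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    # Hash user + experiment so each experiment splits independently,
    # while the same user always maps to the same variant.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Stable: repeated calls return the same variant for the same user.
assert assign_variant("user-42", "new-checkout") == assign_variant("user-42", "new-checkout")
```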
Ops flags are circuit breakers and kill switches. They let you disable expensive features during incidents or degrade gracefully under load. These are long-lived by design. Netflix uses ops flags to disable recommendation algorithms during high-traffic events to preserve capacity for playback.
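The pattern itself is just a guarded call with a cheap fallback. A minimal sketch, with hypothetical service names and a plain dict standing in for a real flag client:

```python
FLAGS = {"recommendations-enabled": True}  # stand-in for a real flag client

def popular_titles() -> list[str]:
    return ["top-1", "top-2"]  # precomputed, cheap fallback

def personalized_titles(user_id: str) -> list[str]:
    return ["rec-1", "rec-2"]  # imagine an expensive ML-backed call here

def get_recommendations(user_id: str) -> list[str]:
    # Ops flag as kill switch: flip it off during an incident to shed
    # the expensive path and degrade gracefully.
    if not FLAGS["recommendations-enabled"]:
        return popular_titles()
    return personalized_titles(user_id)
```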
Permission flags gate access to premium or early-access features. These are essentially authorization rules stored as flags. They are permanent until the feature model changes.
Build vs Buy
The build-vs-buy decision for feature flags has a clear threshold. If you need fewer than 20 boolean flags with no user targeting, a database table with an in-memory cache works fine. A simple flags table with key, value, and description columns gets you started in an afternoon.
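A minimal sketch of that starting point, using SQLite and a TTL'd in-memory cache (schema and names illustrative):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS flags (
        key         TEXT PRIMARY KEY,
        enabled     INTEGER NOT NULL DEFAULT 0,
        description TEXT
    )
""")

_cache: dict[str, bool] = {}
_loaded_at = 0.0
TTL_SECONDS = 30  # accept slightly stale reads instead of a query per check

def is_enabled(key: str) -> bool:
    global _cache, _loaded_at
    if time.monotonic() - _loaded_at > TTL_SECONDS:
        rows = conn.execute("SELECT key, enabled FROM flags").fetchall()
        _cache = {k: bool(v) for k, v in rows}
        _loaded_at = time.monotonic()
    return _cache.get(key, False)  # unknown flags default to off
```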
Beyond that, buy a solution. LaunchDarkly ($10-20 per seat per month) offers the most mature SDK ecosystem with local evaluation, percentage rollouts, and user segmentation. Unleash is the best open-source option with self-hosting. Flagsmith sits between the two with both cloud and self-hosted options.
The hidden cost of building your own is the targeting engine. "Enable this flag for users in the US who signed up after January 2024 and are on the enterprise plan" sounds simple but requires a rule evaluation engine, a user context pipeline, and consistent hashing for percentage rollouts.
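To make that complexity concrete, here is roughly what such a rule looks like as data plus an evaluator. This is a sketch, not any vendor's format; the attribute names and operators are hypothetical.

```python
import hashlib
from datetime import date

# Per-request user context, assembled from your auth/session layer.
user = {"id": "u-17", "country": "US", "plan": "enterprise",
        "signup_date": date(2024, 3, 5)}

# The example rule from above, expressed as data rather than code.
rule = {
    "all": [
        ("country", "eq", "US"),
        ("signup_date", "after", date(2024, 1, 1)),
        ("plan", "eq", "enterprise"),
    ],
    "rollout_percent": 25,
}

OPS = {"eq": lambda a, b: a == b, "after": lambda a, b: a > b}

def bucket(user_id: str, flag_key: str) -> int:
    # Consistent hashing: a stable 0-99 bucket per (user, flag) pair.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def matches(user: dict, rule: dict, flag_key: str) -> bool:
    # Every predicate must hold, then the user's stable bucket must
    # fall inside the rollout percentage.
    if not all(OPS[op](user[attr], val) for attr, op, val in rule["all"]):
        return False
    return bucket(user["id"], flag_key) < rule["rollout_percent"]

print(matches(user, rule, "new-billing"))  # True for ~25% of matching users
```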
SDK Design and Performance
Flag evaluation happens on every request, often multiple times. Performance matters. The two evaluation models differ dramatically.
Remote evaluation calls the flag service API on every check, at 5-20ms of latency per call. That adds up fast: evaluating 10 flags per request means 50-200ms of overhead. Remote evaluation is only acceptable for server-side checks involving very few flags.
Local evaluation downloads the full flag configuration to the SDK on startup, then evaluates locally in microseconds. The SDK maintains a persistent connection (SSE or WebSocket) to receive updates. LaunchDarkly, Unleash, and Flagsmith all support this model. This is the right choice for production workloads.
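A rough sketch of the local model, where a poll loop stands in for the SSE/WebSocket stream a real SDK maintains (class and parameter names are illustrative):

```python
import threading
import time

class LocalFlagClient:
    """Fetch the full flag config once, evaluate in-process, and
    refresh in the background."""

    def __init__(self, fetch_config, refresh_seconds: float = 30.0):
        self._fetch = fetch_config       # callable returning {flag: bool}
        self._config = fetch_config()    # initial full download
        self._lock = threading.Lock()
        threading.Thread(target=self._refresh_loop,
                         args=(refresh_seconds,), daemon=True).start()

    def _refresh_loop(self, interval: float) -> None:
        while True:
            time.sleep(interval)
            fresh = self._fetch()
            with self._lock:
                self._config = fresh

    def is_enabled(self, key: str) -> bool:
        # No network call here: a microsecond-scale dict lookup.
        with self._lock:
            return self._config.get(key, False)

client = LocalFlagClient(lambda: {"new-checkout": True})
print(client.is_enabled("new-checkout"))
```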
Flag Lifecycle Management
The biggest operational risk with feature flags is accumulation. Flags that were supposed to be temporary become permanent. Six months later, nobody knows if it is safe to remove them.
Enforce lifecycle policies. Every flag gets an owner and an expiration date. Run a weekly report showing flags past their expiration. Some teams add lint rules that flag conditional logic referencing expired flags. Piranha, Uber's open-source refactoring tool, goes further and removes stale feature flags from code automatically.
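Assuming each flag record carries an owner and an expiration date, the weekly report itself is a few lines (field names illustrative):

```python
from datetime import date

# Flag registry records (fields illustrative).
flags = [
    {"key": "new-checkout", "owner": "payments", "expires": date(2024, 6, 1)},
    {"key": "dark-mode",    "owner": "web",      "expires": date(2025, 12, 1)},
]

def expired(flags: list[dict], today: date | None = None) -> list[dict]:
    today = today or date.today()
    return [f for f in flags if f["expires"] < today]

for f in expired(flags):
    print(f"STALE: {f['key']} owner={f['owner']} expired={f['expires']}")
```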
Key Points
- Feature flags come in four types: release flags (temporary, for deployment decoupling), experiment flags (A/B tests), ops flags (circuit breakers, kill switches), and permission flags (premium features)
- Local SDK evaluation is orders of magnitude faster than remote evaluation (microseconds versus 5-20ms per call). LaunchDarkly, Unleash, and Flagsmith all support local evaluation by streaming flag configurations to the SDK
- Flag lifecycle management is critical. Every flag needs an owner, a creation date, and an expiration date. Flags without expiration become permanent tech debt
- Progressive delivery uses feature flags to gradually expose changes: internal users, then beta users, then 1%, 5%, 25%, 100% (see the staged-rollout sketch after this list). This catches issues at each stage with minimal blast radius
- Build your own flag system only if you have fewer than 20 flags and no targeting requirements. Beyond that, buy a solution. The maintenance cost of a custom system grows faster than expected
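A sketch of those rollout stages as data, with stable bucketing so the rollout only ever widens (stage names and fields hypothetical):

```python
import hashlib

def bucket(user_id: str, flag_key: str) -> int:
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

# Stages as data: advancing the rollout is a config change, not a deploy.
STAGES = [
    {"audience": "internal", "percent": 100},
    {"audience": "beta",     "percent": 100},
    {"audience": "everyone", "percent": 1},
    {"audience": "everyone", "percent": 5},
    {"audience": "everyone", "percent": 25},
    {"audience": "everyone", "percent": 100},
]

def enabled(user: dict, stage: dict, flag_key: str) -> bool:
    in_audience = (stage["audience"] == "everyone"
                   or stage["audience"] in user["groups"])
    # Stable bucketing makes stages monotonic: a user enabled at 5%
    # stays enabled at 25%, so each stage only widens exposure.
    return in_audience and bucket(user["id"], flag_key) < stage["percent"]

user = {"id": "user-42", "groups": ["beta"]}
print(enabled(user, STAGES[3], "new-checkout"))
```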
Common Mistakes
- Leaving release flags in the code for months after the feature is fully launched. This creates dead code paths that confuse new engineers and increase testing surface area
- Nesting feature flags inside other feature flags. The combinatorial explosion of states makes testing impossible and reasoning about behavior unreliable
- Evaluating flags on every function call instead of once at the request boundary. This adds latency and makes behavior inconsistent if a flag changes mid-request (see the snapshot sketch after this list)
- Not logging flag evaluations. When debugging production issues, knowing which flag values a user received is essential for reproducing the problem
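A snapshot pattern addresses the last two mistakes at once. This sketch assumes a client exposing an is_enabled method and uses the standard logger as a stand-in for structured logging:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("flags")

class FlagSnapshot:
    """Freeze flag values once at the request boundary. Downstream code
    reads the snapshot, so a flag flipped mid-request cannot split
    behavior, and the log records exactly what this request saw."""

    def __init__(self, client, user_id: str, keys: list[str]):
        self.values = {k: client.is_enabled(k) for k in keys}
        logger.info("flags user=%s values=%s", user_id, self.values)

    def __getitem__(self, key: str) -> bool:
        return self.values[key]

class StubClient:  # stand-in for any flag client with is_enabled()
    def is_enabled(self, key: str) -> bool:
        return key == "new-checkout"

snapshot = FlagSnapshot(StubClient(), "user-42", ["new-checkout", "dark-mode"])
if snapshot["new-checkout"]:
    pass  # use the frozen value everywhere downstream in this request
```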