Feature flags are a distributed systems problem

I built Pulse because every feature flag service I evaluated treated flag evaluation as a simple key-value lookup. "Is this flag on? Yes or no." In practice, the interesting question is never whether a flag is on. It's for whom, under what conditions, and how stale the answer is allowed to be.

Once you accept that framing, feature flags stop being a frontend concern and become a distributed systems problem. Here's what I learned building one from scratch.

The rule engine nobody thinks about

Most flag services support simple targeting: "enable for 10% of users" or "enable for users in the beta group." Pulse needed something more expressive because our tenants had complex rollout conditions: "enable for enterprise accounts in the US that have used the API in the last 30 days and are not on a trial plan."

Building this as a chain of if-else statements would work for five rules. It collapses at fifty. Instead, I wrote a recursive rule engine: a small AST evaluator that composes conditions using AND/OR/NOT operators, with each leaf a typed comparator (string equality, numeric comparison, set membership, regex match).

rule-engine.ts

type Rule =
  | { op: "and" | "or"; rules: Rule[] }
  | { op: "not"; rule: Rule }
  | { op: "eq" | "gt" | "in" | "re";
    field: string; value: unknown };

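function evaluate(rule: Rule, ctx: Record<string, unknown>): boolean {
  switch (rule.op) {
    // Combinators recurse into their child rules.
    case "and": return rule.rules.every(r => evaluate(r, ctx));
    case "or":  return rule.rules.some(r => evaluate(r, ctx));
    case "not": return !evaluate(rule.rule, ctx);
    // Leaf comparators read a single field from the evaluation context.
    case "eq":  return ctx[rule.field] === rule.value;
    case "gt": {
      // A missing or non-numeric field never matches.
      const actual = ctx[rule.field];
      return typeof actual === "number" && typeof rule.value === "number" && actual > rule.value;
    }
    case "in":  return Array.isArray(rule.value) && rule.value.includes(ctx[rule.field]);
    case "re": {
      // Without this guard, a missing field is coerced to the string "undefined".
      const actual = ctx[rule.field];
      return typeof actual === "string" && typeof rule.value === "string" && new RegExp(rule.value).test(actual);
    }
  }
}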

The elegance of this approach is that new comparator types can be added without restructuring anything. The tree shape handles arbitrary depth, and because every node is pure — no side effects, no I/O — the entire evaluation is deterministic and trivially testable.
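As a rough illustration, the enterprise rollout condition from earlier can be written as a rule tree using only the operators above. The context field names (`plan`, `country`, `daysSinceLastApiCall`, `onTrial`) are hypothetical, picked for the example, and the type and evaluator are repeated so the snippet runs on its own.

```typescript
// Rule type and evaluator, repeated here so the snippet is self-contained.
type Rule =
  | { op: "and" | "or"; rules: Rule[] }
  | { op: "not"; rule: Rule }
  | { op: "eq" | "gt" | "in" | "re"; field: string; value: unknown };

type Context = Record<string, unknown>;

function evaluate(rule: Rule, ctx: Context): boolean {
  switch (rule.op) {
    case "and": return rule.rules.every(r => evaluate(r, ctx));
    case "or":  return rule.rules.some(r => evaluate(r, ctx));
    case "not": return !evaluate(rule.rule, ctx);
    case "eq":  return ctx[rule.field] === rule.value;
    case "gt": {
      const actual = ctx[rule.field];
      return typeof actual === "number" && typeof rule.value === "number" && actual > rule.value;
    }
    case "in":  return Array.isArray(rule.value) && rule.value.includes(ctx[rule.field]);
    case "re": {
      const actual = ctx[rule.field];
      return typeof actual === "string" && typeof rule.value === "string" && new RegExp(rule.value).test(actual);
    }
  }
}

// "Enterprise accounts in the US that have used the API in the last
// 30 days and are not on a trial plan." Field names are hypothetical.
const enterpriseRollout: Rule = {
  op: "and",
  rules: [
    { op: "eq", field: "plan", value: "enterprise" },
    { op: "eq", field: "country", value: "US" },
    // "Within the last 30 days" is expressed as NOT (days since last call > 30).
    { op: "not", rule: { op: "gt", field: "daysSinceLastApiCall", value: 30 } },
    { op: "eq", field: "onTrial", value: false },
  ],
};

const activeCustomer: Context = { plan: "enterprise", country: "US", daysSinceLastApiCall: 3, onTrial: false };
const trialCustomer: Context = { ...activeCustomer, onTrial: true };
const dormantCustomer: Context = { ...activeCustomer, daysSinceLastApiCall: 90 };

const results = {
  active: evaluate(enterpriseRollout, activeCustomer),   // true
  trial: evaluate(enterpriseRollout, trialCustomer),     // false
  dormant: evaluate(enterpriseRollout, dormantCustomer), // false
};
```

One caveat with this encoding: a context with no `daysSinceLastApiCall` at all passes the NOT branch, so a real rule would probably need an explicit presence check as well.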

Stale flags are silent bugs

The subtle failure mode in feature flags is staleness. A client SDK caches the flag state locally (it has to — you can't make a network call on every render). But when a flag changes on the server, how fast does the client learn about it?

Polling is the common approach. The client asks "what are my flags?" every N seconds. It works, but it's wasteful when nothing changes and too slow when something does. If you're rolling back a broken feature, a 30-second polling interval means up to 30 seconds of users hitting the broken code path after you've already flipped the kill switch.

Pulse uses Server-Sent Events (SSE) to push flag changes to connected clients in real time. When a flag rule is updated on the server, every connected client receives the delta within milliseconds, not seconds.
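On the client side, a pushed delta has to be merged into the local cache. Here is a minimal sketch of that step, assuming each delta carries a per-flag version number. The `FlagDelta` and `FlagState` shapes and the `new-checkout` flag are hypothetical; the post does not show Pulse's real delta format.

```typescript
// Hypothetical shapes, assumed for illustration only.
interface FlagState { enabled: boolean; version: number }
interface FlagDelta { key: string; enabled: boolean; version: number }

type FlagCache = Map<string, FlagState>;

// Apply one pushed delta to the local cache. Returns true if the cache changed.
// Deltas at or below the cached version are ignored, so duplicates and
// out-of-order deliveries cannot roll a flag back to an older state.
function applyDelta(cache: FlagCache, delta: FlagDelta): boolean {
  const current = cache.get(delta.key);
  if (current !== undefined && delta.version <= current.version) return false;
  cache.set(delta.key, { enabled: delta.enabled, version: delta.version });
  return true;
}

// Reads never touch the network; they only consult the local replica.
function isEnabled(cache: FlagCache, key: string, fallback = false): boolean {
  return cache.get(key)?.enabled ?? fallback;
}

const cache: FlagCache = new Map();
applyDelta(cache, { key: "new-checkout", enabled: true, version: 1 });
applyDelta(cache, { key: "new-checkout", enabled: false, version: 2 }); // kill switch
// A late duplicate of version 1 arrives after the kill switch and is dropped.
const staleApplied = applyDelta(cache, { key: "new-checkout", enabled: true, version: 1 });
```

In a browser, the deltas would typically arrive through the standard `EventSource` API, with the message handler parsing each event and calling something like `applyDelta`.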

"The gap between 'flag was changed' and 'all clients reflect the change' is where rollback failures live. SSE closes that gap from seconds to milliseconds without the complexity of WebSockets."

Why SSE over WebSockets

WebSockets are bidirectional. Feature flag evaluation is unidirectional — the server pushes, the client listens. SSE is a better fit because:

Automatic reconnection. SSE clients reconnect natively with Last-Event-ID tracking. WebSocket reconnection is manual and error-prone. In a flag system, dropped connections aren't acceptable — a disconnected client is a client evaluating stale flags.

HTTP infrastructure compatibility. SSE runs over standard HTTP. It works through CDNs, load balancers, and corporate proxies with little or no special configuration; the usual exception is an intermediary that buffers responses, where buffering has to be turned off for the stream. WebSocket upgrades fail silently in enough environments that you'll build an HTTP fallback anyway, at which point you've built SSE with extra steps.

Simpler server implementation. An SSE endpoint is a long-lived HTTP response that writes plain-text events. No frame headers or masking, no ping/pong keepalive dance, no binary message handling.
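To show how little there is to the wire format, here is a sketch of serializing one event and of replaying missed events when a client reconnects with a Last-Event-ID. The `SseEvent` shape, the in-memory log, and the integer IDs are assumptions for illustration; only the `id:` / `event:` / `data:` line format comes from the SSE specification.

```typescript
// Hypothetical event shape; the field names mirror the SSE wire fields.
interface SseEvent { id: string; event?: string; data: string }

// Serialize one event. Each field is a "name: value" line, multi-line data
// becomes one "data:" line per line, and a blank line terminates the event.
function formatSseEvent(e: SseEvent): string {
  const lines: string[] = [`id: ${e.id}`];
  if (e.event !== undefined) lines.push(`event: ${e.event}`);
  for (const line of e.data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// On reconnect the client sends its Last-Event-ID header. Replay whatever it
// missed from an in-memory log (IDs are assumed to be increasing integers).
function eventsAfter(log: SseEvent[], lastEventId: string | undefined): SseEvent[] {
  if (lastEventId === undefined) return log;
  const last = Number(lastEventId);
  return log.filter(e => Number(e.id) > last);
}

const enableEvent: SseEvent = {
  id: "1", event: "flag", data: JSON.stringify({ key: "new-checkout", enabled: true }),
};
const killSwitchEvent: SseEvent = {
  id: "2", event: "flag", data: JSON.stringify({ key: "new-checkout", enabled: false }),
};
const log: SseEvent[] = [enableEvent, killSwitchEvent];

const frame = formatSseEvent(killSwitchEvent);
// A client that last saw event 1 only needs event 2 replayed.
const missed = eventsAfter(log, "1");
```

The response carrying these frames also needs a `Content-Type: text/event-stream` header, and a real log would have to be bounded, with a full resync for clients whose last ID has already been evicted.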

Split-brain rollouts

In a multi-instance deployment, flag updates need to propagate across all server instances. If instance A receives the update but instance B doesn't, clients connected to B are evaluating stale rules while clients on A have the new state. You've split your user base into two inconsistent populations without knowing it.

Pulse solves this with Redis Pub/Sub as the inter-instance broadcast layer. When any instance writes a flag update to Postgres, it publishes the change to a Redis channel. Every other instance is subscribed to that channel and pushes the change to its connected SSE clients. The source of truth is always Postgres; Redis is the notification bus.
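The write-then-publish flow can be sketched without a running Redis or Postgres. Everything named here is a stand-in: the `Bus` interface, `InMemoryBus`, `ServerInstance`, and the `flag-updates` channel are hypothetical. In a real deployment the bus would be backed by a Redis client's publish and subscribe calls, and the store would be Postgres.

```typescript
interface FlagUpdate { key: string; enabled: boolean; version: number }

// Hypothetical minimal pub/sub interface, standing in for Redis.
interface Bus {
  publish(channel: string, message: string): void;
  subscribe(channel: string, handler: (message: string) => void): void;
}

class InMemoryBus implements Bus {
  private handlers = new Map<string, Array<(message: string) => void>>();

  publish(channel: string, message: string): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(message);
  }

  subscribe(channel: string, handler: (message: string) => void): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }
}

const CHANNEL = "flag-updates"; // hypothetical channel name

class ServerInstance {
  // Stand-in for "push to this instance's connected SSE clients".
  readonly pushedToClients: FlagUpdate[] = [];
  private bus: Bus;
  private store: Map<string, FlagUpdate>; // stand-in for Postgres

  constructor(bus: Bus, store: Map<string, FlagUpdate>) {
    this.bus = bus;
    this.store = store;
    // In this sketch every instance, including the writer, pushes only what
    // arrives on the bus, so all instances follow the same code path.
    bus.subscribe(CHANNEL, message => {
      this.pushedToClients.push(JSON.parse(message) as FlagUpdate);
    });
  }

  updateFlag(update: FlagUpdate): void {
    this.store.set(update.key, update);                // 1. write the source of truth
    this.bus.publish(CHANNEL, JSON.stringify(update)); // 2. then notify every instance
  }
}

const bus = new InMemoryBus();
const store = new Map<string, FlagUpdate>();
const instanceA = new ServerInstance(bus, store);
const instanceB = new ServerInstance(bus, store);

// The update lands on A, yet B's clients receive it too.
instanceA.updateFlag({ key: "new-checkout", enabled: false, version: 2 });
```

Redis Pub/Sub does not queue messages for subscribers that are disconnected, so an instance that drops off the bus and comes back would likely need to re-read current state from Postgres rather than trust that it saw every notification.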

It's not a novel architecture. It's the same pattern behind every real-time collaboration tool. But I'm consistently surprised by how many feature flag services skip this and leave split-brain resolution as an exercise for the reader.

Percentage rollouts are harder than they look

"Roll this out to 10% of users" sounds trivial. It's not. The percentage has to be sticky — user A should consistently see the same flag state, not flip randomly between enabled and disabled on every evaluation.

The standard approach is to hash the user ID with the flag key and check whether the result falls inside the target percentage range. This is deterministic, evenly distributed, and doesn't require storing per-user state. It also has a useful property, provided the enabled range always starts from the same end of the hash space: when you change the percentage from 10% to 20%, the original 10% of users stay in the enabled group and a new 10% are added. Users never flip from enabled to disabled when the rollout increases.

Getting this wrong means users lose access to features they were already using, which is exactly the kind of bug that generates support tickets but doesn't crash anything.
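A minimal sketch of sticky bucketing follows. It uses FNV-1a purely for brevity; the post does not say which hash Pulse uses, and any well-distributed hash would do. The function names, the bucket count, and the `new-checkout` flag are assumptions for illustration.

```typescript
// FNV-1a, a simple non-cryptographic 32-bit hash. It hashes UTF-16 code units
// here, which matches byte-wise FNV-1a for ASCII input.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

const BUCKETS = 10000; // 0.01% granularity

// Map (flag, user) to a stable bucket in [0, BUCKETS). Including the flag key
// means a user's bucket differs from flag to flag.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % BUCKETS;
}

// The enabled range is always [0, threshold). Raising the percentage only
// moves the upper bound, so nobody already enabled drops out.
function isInRollout(flagKey: string, userId: string, percentage: number): boolean {
  const threshold = Math.round(percentage * (BUCKETS / 100));
  return bucketFor(flagKey, userId) < threshold;
}

const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const at10 = users.filter(u => isInRollout("new-checkout", u, 10));
const at20 = users.filter(u => isInRollout("new-checkout", u, 20));

// Everyone enabled at 10% should still be enabled at 20%.
const droppedOut = at10.filter(u => !at20.includes(u));
```

The failure mode described above shows up when the enabled range does not grow from a fixed end, for example if the hash input is re-salted whenever the percentage changes. In that case the 20% cohort is a fresh sample, not a superset of the 10% cohort.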

The lesson

Feature flags look like configuration. They're actually distributed state. The flag service is the source of truth, the SDK cache is a replica, and the gap between them is a consistency window. Once you see it through that lens, the design decisions — SSE for low-latency propagation, Redis Pub/Sub for cross-instance broadcast, deterministic hashing for sticky percentages — follow naturally.

Don't build a feature flag service unless you're willing to think about it as infrastructure. If it's just a JSON file on S3, call it what it is and move on. But if you're doing progressive delivery in production, treat flags with the same rigor you'd give a database migration.