Design a Tagging System
Design a multi-tenant tagging system where tenants independently manage tags and associate them with content. The system must prevent duplicate tags globally within a tenant, serve product discovery in under 25ms, and track tag usage analytics across millions of interactions. Consider consistency trade-offs: tag operations need strong consistency, discovery can tolerate caching, and analytics can be eventual.
Key Topics
Interview Cheat Sheet
60s skim · 3min careful readThree consistency tiers matched to three workloads. Tag CRUD strong (Spanner with deterministic UUID plus UNIQUE constraint, no global lock). Discovery cached (Redis to Elasticsearch to Spanner cascade, 95 percent hit at 2ms). Analytics eventual (Redis INCR plus GETSET-with-backup gives zero click loss without taxing user latency). Spanner is the coordination layer, not Kafka.
- Tag CRUD · strong · deterministic UUID + UNIQUE
Both Tokyo and London can create the same tag at the same time. Instead of locking, the tag_id is computed as a hash of the tenant and normalized name, so both regions independently arrive at the same UUID. Spanner's UNIQUE constraint resolves the race; whoever loses fetches the existing row.
User submits 'marketing'normalize (lowercase, trim)tag_id = hash(tenant_id + normalized_name)Spanner INSERT with UNIQUE(tenant_id, normalized_name)winner commits via TrueTime (~90ms global)loser hits UNIQUE violationfetch existing row by tag_idreturn same tag_id (~85ms)no Zookeeper, no two-phase commit, no distributed lock - Discovery · cached · Redis → ES → Spanner cascade
When someone asks 'show all content tagged react', most reads hit a Redis cache. Misses go to Elasticsearch for fuzzy and filtered lookups. ES indexing has up to 30s lag, so if a result looks suspiciously empty, the system falls back to Spanner directly rather than serving a stale empty list.
Discovery query (tenant_id, tag_id)Redis cache (key = tenant_id:tag_id:content_list, 5-30s TTL, 95% hit, ~2ms)cache miss: Elasticsearch (~18ms, tenant_id hard filter on every query)if ES result count <5 but Spanner expects more: fall back to Spanner direct (~40ms)populate caches - Click analytics · eventual · GETSET-with-backup
Click events flow into Redis counters in 2ms so user latency is never affected. Every 30 seconds, batched counts get flushed to Spanner. A backup pattern keeps clicks safe even if the processor crashes mid-flush.
Click eventRedis INCR counter:t:tag:region (~2ms p99, ~100K ops/sec ceiling per key)every 30s flush: 1) GETSET counter0 atomically, 2) SET backup:t:tag:region count, 3) INSERT Spanner, 4) DEL backup only on commit OKcrash between 1-4: backup survives, next tick INCRBYs it back2-min global aggregation across regions
- •10K tenants, 200 users each = 2M total / 200K concurrent (10% online), 3 regions (US/EU/Asia)
- •35 tag creates/sec, 700 item-tag mappings/sec, 9,300 view events/sec
- •Tag creation ~100ms global (Spanner TrueTime), losing region pays ~85ms
- •Discovery: ~2ms Redis (95% hit), ~18ms ES, ~40ms Spanner direct fallback
- •Click ingest: 2ms p99 Redis INCR, 30s batch flush, 2-min global aggregation
- •Redis INCR ceiling ~100K ops/sec on a single key, well above hot-tag rate
- •5M tags total, 500M content items, 1.5B association rows (~150GB junction)
- •Deterministic UUID from hash(tenant_id + normalized_name) + UNIQUE constraint (not distributed lock)
- •Spanner for multi-region writes (CockroachDB is the open-source escape hatch)
- •Redis to Elasticsearch to Spanner discovery cascade (not single store)
- •GETSET-with-backup pattern for zero click loss (not naive GET-then-RESET)
- •tenant_id on every row, cache key, ES doc, Kafka message (hard isolation)
- •Spanner is the coordination layer, not Kafka (no cross-region invalidation)
- •Three-tier consistency framing is the load-bearing architectural decision, surface in minute two
- •ES indexing lag (up to 30s) is the load-bearing risk, mitigated with Spanner fallback on suspiciously empty results
- •Deterministic UUID converges both regions to the same tag_id with no coordination protocol
- Three consistency tiers (the load-bearing architectural decision)
Not everything needs the same guarantees, and pretending it does is what makes tagging systems expensive. Tag CRUD: strong consistency (no duplicate 'JavaScript' / 'javascript' / ' JavaScript ' tags ever, enforced by UNIQUE on normalized name in Spanner). Discovery: cached with 5-30s TTL, stale during the window is acceptable (the alternative is hitting Spanner on every 'show all React content' query). Analytics: eventual, 30s batch flush, 2-min global aggregation. Surface this framing in minute two of the interview; everything else follows from it.
- Deterministic UUID over distributed lock
Two users in Tokyo and London type 'marketing' into the same tenant at the same millisecond. With a distributed lock you need Zookeeper or Spanner-based mutex, ~100ms coordination overhead per tag write. With deterministic UUID = hash(tenant_id + normalized_name), both regions independently compute the same UUID. Tokyo's INSERT wins via Spanner TrueTime; London hits UNIQUE(tenant_id, normalized_name), catches the exception, fetches the existing row, returns the same tag_id. ~85ms loser, no coordination protocol, no operational complexity. Same convergence-by-construction pattern as the URL shortener's region-prefixed counter.
- Spanner over Postgres or DynamoDB
PostgreSQL is single-master; multi-region writes require manual sharding or active-passive failover with a longer RTO. DynamoDB Global Tables are eventually consistent, which fails the no-duplicate-tags requirement (two regions could both 'win' the same name). Spanner gives multi-region strong-consistent writes via TrueTime with no operational overhead. CockroachDB is the open-source alternative if you can't run on GCP. The whole 'Spanner is the coordination layer, not Kafka' insight follows: cross-region cache coherence on tag creation isn't broadcast; London's next cache miss just queries Spanner directly.
- GETSET-with-backup for zero click loss
Naive GET-then-RESET on a Redis counter loses clicks if the processor crashes between the GET and the Spanner INSERT. The four-step pattern fixes it: 1) GETSET counter→0 atomically captures the current count and resets in one op, 2) SET backup:t:tag:region with the same count (durable in Redis under a different key), 3) INSERT to Spanner, 4) DEL backup only on commit OK. Crash anywhere between 1 and 4 leaves the backup intact; next tick INCRBYs it back and retries. Same effectively-once pattern as the job scheduler's fencing tokens: the durability marker is the source of truth across retries.
40-Minute Interview Playbook
Each phase is what the interviewer expects you to do and say. Concrete steps, not topic hints. The diagrams are what you sketch on the board.
- 15 min
Clarify Requirements and Scale
GoalEstablish that this is a three-tier consistency problem, not a single-store problem. Get the interviewer to agree that tag CRUD, discovery, and analytics each have different SLAs.
Do & Say- ASK·1ASK: How many tenants and what's the isolation requirement? Land on 10K tenants, 200 users each, 2M total. Write 10K tenants, 200K concurrent, 3 regions (US/EU/Asia) on the board.
- SAY·2Now the load-bearing question: Do tag operations, discovery queries, and view analytics need the same consistency? Three tiers: Tag CRUD needs strong consistency (no duplicate tags), Discovery can tolerate ~30s staleness (caching is fine), View counts are eventual, 2-minute global aggregation lag is acceptable.
- SAY·3Lock the latency targets: Tag creation ~100ms globally including the cross-region round-trip, Product discovery under 25ms p99, View-count ingest under 5ms (the click should never feel slow).
- SAY·4Pin the throughput: 35 tag creates/sec, 700 item-tag mappings/sec, 9,300 view events/sec. The analytics tier dominates. Write all three on the board.
- SAY·5Park out of scope. In scope: tag CRUD with global uniqueness, tag-to-content association, content discovery by tag, popular tags dashboard. Out: full-text search inside content, tag suggestions/autocomplete-ranking, recommendation models. Wait for the nod.
Interviewer is grading: You volunteer the three-consistency-tier framing in the first minute. You don't try to bolt strong consistency onto the click-counting path.
- 25 min
Data Model and Deduplication
GoalShow how deterministic UUIDs plus a UNIQUE constraint eliminate the cross-region race without a global lock. Sketch the schema with tenant_id everywhere.
Do & Say- DRAW·1Sketch the tags table columns: tag_id (STRING, deterministic UUID from hash(tenant_id + normalized_name)), tenant_id, normalized_name (lowercase, trimmed), display_name, created_region, status, timestamps. Constraint: UNIQUE(tenant_id, normalized_name).
- SAY·2Justify the deterministic UUID. Two users in Tokyo and London create the tag marketing at the same millisecond. Both compute the same UUID from hash(tenant_id + marketing). The first INSERT wins, the second hits the UNIQUE constraint, fetches the existing row, returns the same tag_id. No coordination protocol needed.
- DRAW·3Sketch item_tags columns: id, tenant_id, tag_id (FK), item_id (FK), created_region, UNIQUE(tenant_id, item_id, tag_id). Composite indexes: (tenant_id, tag_id) for product-discovery, (tenant_id, item_id) for tag-listing.
- DRAW·4Sketch the analytics tables: tag_increment_events for regional buffer flushes, tag_lifetime_stats keyed by (tenant_id, tag_id) with total_lifetime_views plus a regional_breakdown JSON blob.
- SAY·5Justify tenant_id everywhere. Every row, every cache key, every Kafka message carries tenant_id. Sharding is by tenant_id so a single tenant's data is co-located. Cross-tenant queries are physically impossible from the API layer.
Interviewer is grading: You name the deterministic UUID trick by what it accomplishes (eliminates global coordination), not just buzzword it. You mention the UNIQUE constraint as the final safety net, not the primary mechanism.
- 310 min
High-Level Design (three consistency tiers, three data paths)
GoalOne diagram with the three tiers visible: Spanner strong for tag CRUD, Valkey plus Elasticsearch for discovery, Redis-buffered Kafka pipeline for analytics. Label each path with its latency.
Draw on the boardDo & Say- DRAW·1DRAW Tier 1, Tokyo tag creation. Flow: API computes deterministic UUID, checks regional Redis (2ms), INSERTs Spanner with UNIQUE (90ms), on conflict, fetches existing tag, updates Redis, publishes Kafka for ES. Total: ~100ms globally via Spanner TrueTime, no coordination protocol.
- SAY·2Label the alternative kill. Why Spanner not PostgreSQL? PostgreSQL is single-master, multi-region writes require manual sharding or active-passive failover. CockroachDB would work as an open-source alternative. DynamoDB Global Tables are eventually consistent, which fails the no-duplicate-tags requirement.
- DRAW·3Draw Tier 2, London discovery. API hits regional Redis tag_items cache (~2ms). 95% of queries hit, return in 15ms. Miss falls to regional Elasticsearch (~18ms). ES unavailable falls back to Spanner direct (~40ms). Three-layer read fallback, three-layer latency budget.
- SAY·4Justify Elasticsearch. Pre-aggregated for popular tag-item combinations. Indexed asynchronously from Kafka events, up to 30s lag. Tenant_id is a hard filter on every query, no cross-tenant exposure. Cross-region replication every 2-3 minutes, which is fine because discovery tolerates that staleness.
- DRAW·5Draw Tier 3, Sydney click ingest. Flow: API does Redis INCR on live:tenant123:marketing:asia (~2ms), returns 200 immediately, Async publish to regional Kafka, Every 30s, batch processor does GETSET to atomically read-and-reset the counter, Writes a backup key, Flushes to Spanner tag_increment_events, Deletes backup only on successful commit.
- SAY·6Explain the global aggregator. Every 2 minutes, the US region (designated global aggregator) sums unprocessed regional events grouped by (tenant_id, tag_id), UPDATEs tag_lifetime_stats, marks events processed. No cross-region cache invalidation because each region rebuilds popular-tag cache on its own 5-minute TTL from Spanner.
Interviewer is grading: You separate the three tiers visually so the interviewer sees that strong consistency only touches Spanner, never the click path. You volunteer that CockroachDB is a viable open-source alternative without being asked about cloud lock-in.
- 415 min
Deep Dive: Zero Data Loss on Clicks, and the Race on Tag Creation
GoalTwo sub-dives. First, why the GETSET-with-backup pattern survives processor crashes. Second, why deterministic UUIDs plus UNIQUE constraint actually resolve the cross-region race, with numbers.
Draw on the boardDo & Say- SAY·1Pivot to click counting. The naive design loses clicks on processor crash. Sequence: INCR returns to the user, processor reads counter, crashes before writing to DB, restart reads 0, counts are gone.
- SAY·2GETSET-with-backup. Step 1 GETSET counter 0, atomic read+reset. Step 2 SET backup:t:tag:region count, durable in Redis. Step 3 INSERT Spanner. Step 4 DEL backup only on commit OK. Crash anywhere between 1-4, backup survives. Next tick INCRBYs it back and retries.
- SAY·3Redis loss. Redis down, we lose live counter and in-flight backups. Circuit breaker activates, click API writes directly to Kafka (~10ms instead of 2ms). Counter rebuilds from Kafka replay when Redis recovers. Clicks never drop, worst case is degraded latency.
- SAY·4Pivot to the create race. Two users on opposite continents type marketing into the same tenant at the same millisecond. Without coordination, you get two tags with two different IDs and the second one breaks every existing item_tag row that uses the first.
- SAY·5Deterministic UUID resolution. Both regions compute hash(tenant_id + normalized_name). Same input, same UUID anywhere. Tokyo INSERT wins in ~90ms via Spanner TrueTime. London arrives ~10ms later, hits UNIQUE, exception handler fetches existing tag by the same UUID. London pays ~85ms. No lock, no 2PC, no coordination.
- WATCH·6Be ready for the cache-coherence question. Tokyo's cache now has the new tag. London's cache doesn't, but the next London cache miss queries Spanner directly, finds the tag (Spanner replicates synchronously), populates London's local cache. No cross-region invalidation broadcasts needed. Spanner is the coordination layer, not Kafka.
- SAY·7Stampede protection. If a regional Redis dies, every tag lookup falls through to Spanner simultaneously. Circuit breaker caps concurrent Spanner queries per pod at 50. Pod-local in-memory LRU of 10K recently-seen tags serves as a fourth fallback layer.
- SAY·8Hot tag contention. Viral tag gets thousands of clicks/sec per region. Redis INCR on one key is single-threaded but handles ~100K ops/sec, well above any realistic hot-tag rate. 30s batch flush means global totals lag the regional view, fine for a dashboard that refreshes every 5 min.
Interviewer is grading: You walk the GETSET-backup-DEL sequence step by step rather than waving at 'atomic'. You explicitly say Spanner, not Kafka, is the coordination layer for tag creation. You volunteer the 100K ops/sec Redis ceiling as why hot-tag contention isn't a problem here.
- 55 min
Trade-offs, Risks, and Wrap-up
GoalName the three consistency-tier trade-offs explicitly, the Spanner lock-in trade-off, and close with one sentence that ties the design to the requirement.
Do & Say- SAY·1Tier trade-offs. Strong CRUD costs 100ms, accepted because creation is rare (35/sec) and duplicates break everything downstream. Eventual analytics buys 2ms click latency, dashboard tolerates 2-min lag. Cached discovery 15ms with 30s staleness, fine since new tags don't need millisecond visibility.
- SAY·2Spanner lock-in. Spanner gives us multi-region strong-consistent writes with no operational overhead, but it's GCP-only. CockroachDB is the open-source escape hatch. If we run on AWS, we'd accept the operational cost of CockroachDB or fall back to single-region Postgres plus active-passive with a longer failover window.
- SAY·3ES indexing lag. New tag-item associations take up to 30s to appear in search. Users who tag and immediately search may miss results. Mitigation: fall back to Spanner direct when ES results look suspiciously empty (under 5 when Spanner shows more). Slower path, but never stale.
- SAY·4Tenant isolation. Every row, cache key, Kafka message, and ES document carries tenant_id. Per-tenant rate limits prevent noisy neighbors. mTLS plus JWT (15-min TTL) on all internal calls. Cross-tenant access can't happen because tenant_id is baked into the constraint and the index.
- SAY·5Close in one breath: Three consistency tiers matched to three workloads, Deterministic UUIDs + Spanner UNIQUE eliminate the cross-region race with no coordination, Redis-to-ES-to-Spanner cascade keeps discovery at 15ms while durable, GETSET-with-backup gives zero click loss without taxing click latency. Biggest risk: ES indexing lag, mitigated by Spanner fallback on empty results.
- SAY·6If there's time, offer follow-ups: the global aggregator implementation, Spanner interleaved tables for tag-content co-location, or the per-tenant rate limit design.
Interviewer is grading: You match each consistency tier to its specific user-facing requirement, not just buzzword them. You name CockroachDB as the lock-in escape hatch unprompted.
Interview Grading by Level
What an interviewer at each level expects to see in your answer. Use this to calibrate, not to perform.
Mid-Level Engineer (L4 / SDE-II)
Builds a single-database tagging system that mostly works at one-region scale, misses the consistency-tier framing.
- Lists tags, items, and item_tags as three tables with foreign keys.
- Adds a unique constraint on (tenant_id, tag_name) when prompted about duplicate prevention.
- Picks Postgres or MySQL for the source of truth, adds Redis cache when prompted about discovery latency.
- Knows clicks need a counter and proposes Redis INCR for hot writes.
- Mentions sharding by tenant_id when asked about scale, though without naming the access pattern.
- Tries to use one consistency model for everything, usually strong consistency, and can't explain why click counting collapses under it.
- Misses the deterministic-UUID trick, proposes either a distributed lock or accepts the race silently.
- Loses clicks on processor crash because there's no backup-before-DB-write step.
- Doesn't bring up multi-region until pushed, defaults to single-region with read replicas.
- Forgets tenant_id on cache keys or Kafka messages, so cross-tenant data leaks are physically possible.
Senior Engineer (L5 / SDE-III)
Drives the three-tier consistency design end-to-end, picks Spanner with reasons, defends the deterministic-UUID race resolution.
- Establishes the three consistency tiers (CRUD strong, discovery cached, analytics eventual) in the first 5 minutes.
- Proposes deterministic UUIDs from hash(tenant_id + normalized_name) plus a UNIQUE constraint as the race resolution, can explain why no distributed lock is needed.
- Picks Spanner (or CockroachDB as open-source alternative) for multi-region writes, can give a one-line reason against PostgreSQL and DynamoDB.
- Builds the click pipeline with regional Redis INCR plus Kafka plus 30-second batch flush plus 2-minute global aggregation.
- Names the GETSET-with-backup pattern for zero data loss, can walk through the failure recovery.
- Falls back through Redis to Elasticsearch to Spanner for discovery, with explicit latency budgets at each tier.
- Carries tenant_id through every row, cache key, ES document, and Kafka message as the tenant isolation guarantee.
- Volunteers the ES-lag-versus-Spanner-fallback story only after being asked about staleness.
- Quantifies the Spanner write latency (~90ms) but not the constraint-violation re-fetch path (~85ms) on the losing region.
- Doesn't bring up CockroachDB as the open-source escape hatch from Spanner lock-in unless prompted.
Staff+ Engineer (L6+)
Owns the room, sequences the three tiers with clear SLA framing, surfaces the operational risks (ES lag, Redis-loss recovery, hot-tag contention) before being asked.
- Volunteers the three-tier framing as the load-bearing architectural decision in minute two: strong for CRUD, cached for discovery, eventual for analytics, matched to specific user-facing SLAs.
- Walks the GETSET-backup-DEL-on-success sequence step by step, with explicit failure semantics for crash between each step.
- Names Spanner as the coordination layer (not Kafka), explains why cross-region cache invalidation isn't needed when the database itself synchronously replicates.
- Quantifies the deterministic-UUID race resolution with numbers (~90ms winner, ~85ms loser via constraint violation plus re-fetch), defends it as cheaper than any coordination protocol.
- Defines explicit SLA tiers and uses them to defend each trade-off: tag creation at ~100ms is tier-1, discovery at 15ms is tier-2, click ingest at 2ms is tier-1 even though analytics is tier-3.
- Brings up CockroachDB as the open-source alternative to Spanner unprompted, frames it as the lock-in escape hatch rather than a primary choice.
- Closes with the one-breath summary tying deterministic UUIDs plus UNIQUE constraint plus GETSET-with-backup plus multi-tier discovery cascade to the three consistency tiers. Lists what they'd cover next: Spanner interleaved tables, per-tenant noisy-neighbor isolation, ES index rebuild from Spanner.
Common Follow-up Questions
click to expandQuestions an interviewer is likely to ask after your walkthrough. Rehearse the short answer.