System Design: GitHub (200M Repos, Git Object Storage, Sparse Trigram Code Search, Per-Job VM CI)
Modeled, not leaked. This post models a GitHub-scale platform; it is not a leak of GitHub internals. Scale numbers are engineering estimates. Internal components (Spokes, Blackbird, Hydro, GLB, Azure runners) are paired with buildable OSS or managed-cloud substitutes. GitHub's publicly known architecture has historically centered on Rails; this post picks Go to model the shape.
Who this is for
- Interview candidates designing a Git platform
- Architects evaluating Gitaly + Praefect vs managed products
- Startup founders building a code-hosting product
- Infrastructure engineers benchmarking against GitHub-scale
Goal
A code hosting and collaboration platform at GitHub scale.
Design envelope (order of magnitude; precise ranges in §13):
- Hundreds of millions of repositories; tens of millions monthly-active developers.
- Around one billion Git operations per day; peak in the tens of thousands per second.
- Tens of millions of PRs monthly; tens of millions of code searches daily; millions of CI runs daily.
- Hundreds of petabytes of Git object storage across replicas.
- 99.95% availability; object durability from 3x replicas + object-store backup.
Features:
- Git over HTTPS and SSH with pack negotiation
- Fork-based collaboration with shared object storage
- Pull requests with merge-base diffs and inline review
- Branch protection, CODEOWNERS, org and team permissions
- Code search at scale via a trigram inverted index
- CI/CD on per-job VMs
- Webhook delivery with HMAC-signed payloads and retry
TL;DR
- Git objects are content-addressable via SHA-1 (hardened); 3 replicas in different failure domains per repo with majority-commit writes.
- Forks share the upstream's object pool through Git alternates; a fork of a 5GB repo costs a few MB.
- Metadata in MySQL 8 behind Vitess, functionally partitioned. Go monolith reads via sqlc-generated queries.
- Code search is a trigram inverted index. Zoekt for moderate scale; a custom engine only at very high QPS.
- CI jobs run on pre-warmed VMs with ephemeral OS reimage between jobs. K8s + Actions Runner Controller + Kata, or managed per-job sandboxes.
If building this today with 10K repos, stop here.
- PostgreSQL or MySQL (managed)
- Redis (managed)
- S3 / GCS / Azure Blob
- Zoekt or OpenSearch with ngram
- Managed CI runners (GitHub-hosted, CircleCI, Buildkite)
- Single region
- Monolith
Everything in this post beyond that is a response to pain that has not appeared yet. Upgrade only when it does.
5-minute summary
If reading only one section, read this.
Core idea. GitHub is five systems sharing identity and permissions:
- Git transport for clone and push.
- Core app for repos, PRs, issues, reviews, APIs.
- Code search with its own engine.
- CI compute fleet for running untrusted code.
- Event system for webhooks, notifications, and indexer fan-out.
What happens on common actions.
- Push → authenticate → receive-pack hook → majority-commit across 3 Git replicas → emit push.v1 → enqueue background tasks (webhooks, notifications, CI).
- Clone → replica coordinator picks a healthy replica → git upload-pack builds a pack → streams over side-band-64k.
- Open PR → walk to merge-base → tree/blob diff → cache hunks in Redis keyed by (merge_base, head) → render.
- Search code → extract trigrams → fan out to shards → intersect posting lists → verify on source → rank and merge.
- CI run → parse workflow YAML → expand matrix → scheduler queues jobs → allocator claims a warm VM → runner executes → logs stream to object store → VM reimages.
Why it scales. Different workloads scale differently, so the platform splits them. Git object data replicates separately from metadata. Reads fan out across replicas; writes serialize per repo. Async work leaves the push path fast. Search has its own engine. CI compute scales independently of everything else.
Where correctness lives.
- The receive-pack hook (branch protection, secret scan, pre-receive policy).
- The Git store's majority-commit quorum on refs.
- The Go authorization layer (permission check on every endpoint).
- Idempotent background jobs (retries cannot double-deliver).
Component roles (short).
- Edge. Global traffic routing + L7 proxying.
- Core. Go monolith for PRs, auth, issues, reviews, APIs.
- Git store. 3 replicas with majority-commit writes.
- Metadata. MySQL behind a sharding router, functionally partitioned.
- Redis. Hot cache, sessions, rate limits, job queue backing store.
- Event bus. Kafka with a schema registry.
- Code search. Trigram inverted index.
- CI runners. Per-job VM with ephemeral reimage.
Pick a path
| Time | Read | Covers |
|---|---|---|
| ~10 min | TL;DR + §3 + §4 | End-to-end flows + buildable substitutes |
| ~30 min | + §5, §14 | Stack tradeoffs, Git object model, fork-aware GC |
| ~60 min | Full post | Every deep dive plus scaling, failures, abuse, ops |
1. Why this is hard
GitHub looks simple on the surface: host repositories, show pull requests, run CI, search code. In reality each feature maps to a different systems problem with its own storage model, consistency rules, latency targets, and scaling bottlenecks.
Six hard problems shape the design.
1. Git object storage at internet scale. Hundreds of millions of repositories generate trillions of immutable Git objects. Efficient packing, replication, hot-repo balancing, and garbage collection are required.
2. Fork networks and shared storage. Many repositories are forks with minimal divergence. Full copies would waste massive storage. Forks need shared object pools while preserving permissions and deletion safety.
3. Pull request diff computation. A PR is based on merge-base comparison, not branch tips. Graph walks, tree diffs, rename detection, and patch generation must be fast.
4. Code review comment anchoring. Comments attached to specific lines must survive rebases, force-pushes, file renames, and deleted lines.
5. Search across global code volume. Near-instant substring, regex, and symbol search across hundreds of terabytes of source.
6. CI/CD execution of untrusted code. Bursty workloads, strong isolation, secret protection, compute cost control.
Operational constraints make it harder: pushes cannot be lost; permissions must always hold; clone, PR, search, and CI startup must feel fast; node or regional failures should not cause major outage; shared infrastructure must handle abuse and noisy neighbors.
Where systems usually break first
- CI burst queues. Dependabot waves, release cutovers, Monday-morning push storms drain the warm pool faster than it refills.
- Hot repos during release days. One repo absorbing hundreds of pushes/hour saturates a single Git replica set.
- Permission cache misses. A team-membership flip invalidates wide fan-outs; the thundering herd hits MySQL.
- Search indexing lag after mass pushes. A CI robot mass-rewriting files spikes the indexer for minutes.
- Webhook retry storms. A popular repo with thousands of hooks going to a single flaky target.
Each maps to a section below; the rest of the post is about keeping these five from turning into an incident.
2. Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-01 | Clone, push, fetch Git repos over HTTPS and SSH | P0 |
| FR-02 | Fork with object sharing via alternates | P0 |
| FR-03 | Create and merge PRs with merge-base diffs | P0 |
| FR-04 | Inline review comments with auto-repositioning | P0 |
| FR-05 | Branch protection enforced at receive-pack | P0 |
| FR-06 | CODEOWNERS-based review assignment | P1 |
| FR-07 | Code search across public and authorized private repos | P0 |
| FR-08 | Actions CI/CD with workflow YAML in the repo | P0 |
| FR-09 | Webhook delivery with HMAC-signed payloads | P0 |
| FR-10 | Org management with teams, roles, permissions | P0 |
| FR-11 | Repo file browser with syntax highlighting | P0 |
| FR-12 | Issue + PR search (non-code) | P1 |
| FR-13 | Commit history and blame | P1 |
| FR-14 | Releases with binary artifacts | P1 |
| FR-15 | Markdown rendering cached in Redis | P1 |
| FR-16 | Notifications (email, push, in-app) | P1 |
| FR-17 | Dependabot-style dependency update PRs | P2 |
| FR-18 | Packages (container, npm, Maven) | P2 |
3. Non-Functional Requirements
| ID | Requirement | Target (modeled) |
|---|---|---|
| NFR-01 | Clone p50 / p99 on a typical 50MB-pack repo | <3s / <8s |
| NFR-02 | Push ACK after majority commit | <2s typical |
| NFR-03 | Repo page load | <500ms |
| NFR-04 | PR diff render (≤50 files) | <1s |
| NFR-05 | Code search p50 / p99 | <200ms / <1s |
| NFR-06 | Actions job dispatch-to-running | <30s |
| NFR-07 | Webhook delivery p50 / p99 | <5s / <30s |
| NFR-08 | Availability | 99.95% |
| NFR-09 | Git object durability | 3x replica + object-store backup |
| NFR-10 | Horizontal scalability | Linear for transport, search, CI |
| NFR-11 | Consistency | Strong for refs; eventual for search and CI status |
| NFR-12 | Actions concurrency peak | ~500K jobs |
| NFR-13 | RTO for stateless services | <5 min |
| NFR-14 | Fork storage cost | <1% of upstream |
| NFR-15 | Search index freshness | <2 min typical |
These are design budgets used to size capacity and set alert thresholds, not measured production p99s. Later sections that discuss actual query behavior (e.g. §9.3) use ranges rather than point targets. Latency targets are for a typical repo; large monorepos are a separate tail.
[3.1] Workload assumptions
- Median pack ~5MB. Long tail of monorepos >1GB.
- ~15 commits/repo/day median; hot monorepos see hundreds of pushes/hour.
- ~60M forks in ~1M networks, average size ~50, long tail into tens of thousands.
- Peak/average ~3x on Git ops, driven by working-hours overlap across US timezones.
- Webhook-delivery-to-push ratio ~5:1.
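The envelope numbers above imply the peak rates the rest of the post sizes against. A quick back-of-envelope, using the Goal figure of ~1B Git operations/day and the ~3x peak/average ratio (modeled estimates, not measurements):

```go
package main

import "fmt"

// Back-of-envelope peak sizing from the modeled assumptions above.
func main() {
	const (
		gitOpsPerDay  = 1_000_000_000 // "around one billion Git operations per day"
		peakOverAvg   = 3.0           // peak/average ratio from the assumptions
		secondsPerDay = 86_400
	)
	avg := float64(gitOpsPerDay) / secondsPerDay
	peak := avg * peakOverAvg
	fmt.Printf("avg ≈ %.0f ops/s, peak ≈ %.0f ops/s\n", avg, peak)
	// avg lands near 11.6K ops/s, peak in the mid-30Ks — consistent
	// with "peak in the tens of thousands per second" in the Goal section.
}
```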
4. Technology Choices and What to Build When
Most of the choices below have an OSS path, a managed path, and (only at the top end) a custom path. Pick by ops budget and existing cloud footprint. Graduate only when scale forces it.
[4.1] What I would build in 2026
Startup. Managed CI runners + managed Postgres or MySQL + managed Redis + object store (S3/GCS/Blob) + Zoekt or managed OpenSearch. One region. One monolith. Under ~10K repos this is the whole story.
Growth. Add a sharding layer in front of MySQL (Vitess / read replicas), split job queues from cache Redis, run Zoekt as its own cluster, introduce an event bus when a second consumer of push events appears.
Large scale. Dedicated Git fleet (Gitaly + Praefect, or a custom engine), a CI scheduler with pre-warmer and fair queueing, a custom search engine (sparse trigrams) only when code volume crosses tens of TB and regex QPS sustains hundreds per second.
[4.2] Internal → buildable mapping
| Layer | Internal | Buildable (OSS) | Managed cloud | Consider custom when |
|---|---|---|---|---|
| Git storage | 3-replica, majority commit | Gitaly + Praefect | No 1:1 substrate; hosted products exist (CodeCommit, Azure Repos, Cloud Source Repositories) | Praefect cluster saturation; placement controls outgrow primitives |
| Event pipeline | Kafka + schema registry | Kafka + Confluent SR (community); Redpanda | MSK, Event Hubs, Confluent Cloud | Shared SDK is the bottleneck, not the broker |
| Code search | Sparse-trigram, ingest-sharded | Zoekt (full trigram); OpenSearch ngram | Sourcegraph Cloud, Elastic Cloud, managed OpenSearch, Azure AI Search | Tens of TB + sustained high-regex QPS |
| CI runner isolation | VM + ephemeral OS reimage | K8s + ARC (container); + Kata (VM-grade); Firecracker via flintlock | Fargate, ACI, Cloud Run Jobs; GitHub-hosted runners | Sustained >50K concurrent jobs |
| Edge L4 | Custom forwarding table | Katran (eBPF); IPVS + keepalived | NLB, Azure Standard LB, GCP TCP LB, Cloudflare Spectrum | pps/host >1M |
| Edge L7 | HAProxy | HAProxy, Envoy | ALB, Azure Front Door, GCP HTTPS LB, Cloudflare, Fastly | Rarely |
| Object storage | Blob | MinIO, Ceph RGW, SeaweedFS | S3, Azure Blob, GCS, R2, B2 | Egress or lock-in pressure |
| Secrets | Key Vault | Vault (community), Infisical | Secrets Manager, Key Vault, Secret Manager | Specific compliance |
| Relational | MySQL + sharding router | Same (OSS) | Aurora MySQL, Azure MySQL, Cloud SQL, PlanetScale | Multi-billion-row custom sharding |
| Cache | Redis | Same | ElastiCache, Azure Cache for Redis, Memorystore, Upstash | Memcached as second tier above ~TB |
| Background jobs | Redis-backed queue | Asynq, Sidekiq, Oban | Substrate above; or SQS / Cloud Tasks (simpler) | Library ergonomics block the team |
[4.3] Monolith language: Ruby, Go, or something else
| Constraint | Choose |
|---|---|
| Small team, fastest time-to-product | Rails |
| Throughput, language coherence with infra services | Go (this post) |
| Real-time-heavy workload (live logs, pub/sub UI) | Elixir / Phoenix |
| Performance-critical path inside a larger system | Rust (hot path only) |
- Rails. Historically GitHub's stack. Best for 2–20 engineers chasing product-market fit. Per-request throughput costs engineering at scale (Trilogy, gh-ost, YJIT).
- Go (this post). Single-binary deploys, native concurrency, sqlc gives compile-time SQL safety without ORM drag. More LOC per CRUD endpoint; easier infra hiring.
- Elixir / Phoenix. BEAM's actor model beats ActionCable for live logs and presence. Smaller talent pool, narrower third-party ecosystem.
This post uses Go. Every other decision is language-neutral.
Why a monolith at all? Collaboration hot paths (authz, PR page, review insert, webhook enqueue) read 3–10 tables from MySQL and 2–5 Redis keys. In-process chains keep those reads coherent and transactional; a microservice split introduces distributed-transaction problems for no product win. The pieces that should be separate services already are: Git Service (long-lived streaming protocol), search (stateful shards with its own coordinator), CI runner allocator (scheduler with its own lifecycle).
Why MySQL + a sharding router (not Postgres)? At single-node scale the choice is a wash. The gap opens at horizontal sharding: the MySQL sharding layer has a single mature, production-proven path (used by YouTube, Slack, Square, PlanetScale). Postgres's cross-shard story is less standardized (Citus, the now-deprecated pg_shard, assorted in-house routers). At this scale, routing-layer maturity matters more than SQL engine nuance.
[4.4] When OSS or managed is enough forever
Most teams never need a custom engine. Signals that OSS or managed is still the right answer:
- Repo count below a Praefect cluster's write-throughput ceiling.
- Code index in the multi-TB to tens-of-TB range.
- CI concurrent jobs under 10K.
- pps/host below 1M.
- Team size under 50 engineers.
Graduate to custom only when the pain is measurable and repeatable. That move costs a multi-year rewrite and a dedicated team.
[4.5] Rejected alternatives
- Postgres + Citus: strong, but diverges operationally from the MySQL sharding path; pick one primary stack.
- Cassandra / KV: no joins; PR + review workflows need them.
- GORM, ent: runtime reflection hides query plans; sqlc's compile-time codegen is safer.
- Memcached as a second cache tier: defensible only above ~TB cache footprint.
- Docker-only CI isolation: shared kernel is weak for untrusted workflow code.
- Env files / flat SSM for secrets: no audit, no rotation primitives.
- Cloudflare/Fastly as the real edge at GitHub scale: for OSS builders, HAProxy + Katran or a managed NLB covers it.
Takeaway. Every internal component has a buildable OSS or managed substitute. Graduate only when scale forces it.
The substrate decisions are settled. The next sections walk the stack end-to-end, then deep-dive each subsystem.
5. End-to-End Architecture
Warm tan = source of truth. Green = derived / replayable. Blue = stateless compute. The same canvas shows all six flows below; each §5.X zooms into one traversal through it.
Layer map. One pass top-to-bottom before the flows drill in.
| # | Layer | Role |
|---|---|---|
| 1 | Client | Git CLI, browser, webhook sender |
| 2 | Edge | TLS, L4/L7 routing, DDoS absorption |
| 3 | Core app (Go monolith) | Auth, repo pages, PR creation, permissions |
| 4 | Git Service | Speaks Git wire protocol |
| 5 | Git storage (3 replicas) | Blobs, trees, commits, tags, refs |
| 6 | MySQL + sharding router | Users, repos, PRs, comments, workflow runs |
| 7 | Redis | Diff cache, sessions, permission cache, rate limits, queue backing |
| 8 | Background jobs | Per-target tasks with retry + DLQ |
| 9 | Event bus | Durable events consumed by many independent subscribers |
| 10 | Code search | Sparse-trigram inverted index |
| 11 | PR + diff system | Merge-base walk, tree/blob diff, comment reanchoring |
| 12 | CI / Actions | Parse YAML, expand matrix, schedule |
| 13 | VM pool | Pre-warmed ephemeral VMs |
| 14 | Object storage | CI logs, artifacts, release binaries, pack snapshots |
Six flows follow; each flow is one traversal through this stack: push, clone/fetch, fan-out from a push, PR lifecycle, code search, Actions dispatch.
[5.1] Flow 1: Push (write path)
Steps:
- Authenticate (SSH key or HTTPS PAT/OAuth).
- Resolve (owner, repo) in MySQL via the sharding router; read the replica route.
- Core authz checks push permission and branch protection.
- Pre-receive hook runs against the pack: secret scan, size limits, ref-update shape.
- Stream pack to all 3 replicas; wait for 2-of-3 majority ACK.
- Refs commit on majority; the third replica catches up async.
- Post-receive publishes a push event to the bus and enqueues background tasks.
- ACK to client.
The receive-pack hook is the security boundary. A raw git push bypasses the UI; the hook has the final word on branch protection.
[5.2] Flow 2: Clone / fetch (read path)
- Client requests info/refs; server advertises refs + capabilities.
- Client sends wants/haves — the object IDs it wants from the server, plus the IDs it already has locally so the server knows what to skip.
- Server computes the minimum pack with reachability bitmaps — a precomputed compressed index that records which objects are reachable from each ref, letting the server answer "what objects does this client need?" in milliseconds instead of walking the commit graph.
- Pack streams over side-band-64k.
Reads fan out across any healthy replica, weighted by load. Pack files are memory-mapped; warm caches hit page cache at SSD speed.
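The bitmap math behind "minimum pack" is set algebra: OR the bitmaps for the wanted refs, OR the bitmaps for the haves, and subtract. A toy uncompressed model — real Git uses EWAH-compressed bitmaps attached to selected commits, but the algebra is the same:

```go
package main

import "fmt"

// Bitmap is a toy reachability bitmap: bit i set means object i
// (by position in the pack index) is reachable from some ref.
type Bitmap []uint64

func (b Bitmap) Set(i int)      { b[i/64] |= 1 << (i % 64) }
func (b Bitmap) Get(i int) bool { return b[i/64]&(1<<(i%64)) != 0 }

// newBitmap builds a one-word bitmap with the given bits set (toy sizes only).
func newBitmap(bits ...int) Bitmap {
	b := make(Bitmap, 1)
	for _, i := range bits {
		b.Set(i)
	}
	return b
}

// objectsToSend computes OR(wants) AND NOT OR(haves): everything
// reachable from a wanted ref that the client does not already have.
func objectsToSend(wants, haves []Bitmap, n int) []int {
	words := (n + 63) / 64
	want, have := make(Bitmap, words), make(Bitmap, words)
	for _, w := range wants {
		for i := range want {
			want[i] |= w[i]
		}
	}
	for _, h := range haves {
		for i := range have {
			have[i] |= h[i]
		}
	}
	var out []int
	for i := 0; i < n; i++ {
		if want.Get(i) && !have.Get(i) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// main reaches objects 0..5; the client's older clone already has 0..3
	fmt.Println(objectsToSend(
		[]Bitmap{newBitmap(0, 1, 2, 3, 4, 5)},
		[]Bitmap{newBitmap(0, 1, 2, 3)}, 8))
	// → [4 5]
}
```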
[5.3] Flow 3: Async responsibility split
Bus = one event to many systems. Queue = one task to one worker. A push creates both shapes at once: one push.v1 broadcast to search / feed / analytics / audit, plus 12 webhook deliveries + 3 CI triggers + 400 notification jobs.
| Event bus | Job queue | |
|---|---|---|
| Shape | One event → many consumers | One task → one worker |
| Owns | Push, PR opened, issue closed, review submitted | Webhook delivery, notifications, CI trigger, exports, email |
| Durability | RF=3, retention = replay window | At-least-once with Redis AOF |
| Ordering | Per-repo partition | Per-queue FIFO |
| Retries | Consumer offset replay | Per-task exponential backoff, DLQ |
When one is enough.
- Queue only. Most teams should start here. Low throughput, no replay, limited fan-out.
- Bus only. Team already runs bus workers with retry and DLQ patterns; not worth a second substrate.
- Both. Workload shapes diverge: search-scale ingest, webhook-scale retries, analytics-scale replay. The trigger is workload divergence, not headcount.
[5.4] Flow 4: Pull request lifecycle
[5.5] Flow 5: Code search
Parse query into pattern + filters. Extract trigrams, intersect against a sparse-trigram posting index, fan out to shards, verify candidates on source.
[5.6] Flow 6: Actions dispatch
Parser expands matrix and builds the DAG. Scheduler enforces concurrency + dependency edges. Allocator claims a pre-warmed VM. Runner executes and streams logs. On exit, the ephemeral OS disk is reimaged and the VM returns to the pool.
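The "expand matrix" step is a cartesian product over the matrix axes in the workflow file. A minimal sketch, with Axis as an assumed representation of one parsed matrix dimension:

```go
package main

import "fmt"

// Axis is one matrix dimension from a workflow file, e.g. os: [ubuntu, macos].
type Axis struct {
	Name   string
	Values []string
}

// expandMatrix computes the cartesian product of the axes: one job
// per combination, in deterministic input order.
func expandMatrix(axes []Axis) []map[string]string {
	jobs := []map[string]string{{}}
	for _, ax := range axes {
		next := make([]map[string]string, 0, len(jobs)*len(ax.Values))
		for _, job := range jobs {
			for _, v := range ax.Values {
				j := make(map[string]string, len(job)+1)
				for k, val := range job {
					j[k] = val
				}
				j[ax.Name] = v
				next = append(next, j)
			}
		}
		jobs = next
	}
	return jobs
}

func main() {
	jobs := expandMatrix([]Axis{
		{Name: "os", Values: []string{"ubuntu-22.04", "macos-14"}},
		{Name: "go", Values: []string{"1.21", "1.22"}},
	})
	for _, j := range jobs {
		fmt.Println(j["os"], j["go"]) // 4 jobs: each os × each go version
	}
}
```

Each resulting map becomes one node in the job DAG; the scheduler then layers concurrency limits and dependency edges on top.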
[5.7] Trace a push (warm path)
Modeled ranges, not precise timings:
| Stage | Budget |
|---|---|
| Auth (SSH or HTTPS) | <50ms |
| Route lookup + authz cache hit | <50ms |
| Pack stream + pre-receive scan | sub-second on typical packs |
| Majority-commit (2-of-3 replicas) | <500ms typical |
| Post-receive event publish + task enqueue | tens of ms |
| Async fan-out (webhooks, notifications) | <1s after ACK |
| First CI job reaches a warm VM | ~1–30s depending on pool fullness |
| End-to-end push → searchable | <2 min typical |
Push-to-first-job-running is the user-visible SLO: <30s. Search freshness is the indexer SLO: <2 min.
[5.8] One git push: what each subsystem does
§5.7 showed the push timeline. This table shows the same push by subsystem: what each component handled synchronously vs asynchronously. This is an illustrative buildable design, not a claim about GitHub's exact internals.
| Subsystem | Hot path role | Async follow-up |
|---|---|---|
| MySQL | Reads repo metadata, permissions, and branch rules. | Workers later write delivery logs, audit rows, and notification records. |
| Redis | Checks cached permissions, rate limits, enqueues jobs, invalidates stale PR diff caches. | Holds transient queue state while workers drain jobs. |
| Git store | Receives the push packfile on the write leader. Resolves thin-pack dependencies. Commits the ref update after quorum replication. | Lagging replicas catch up; periodic snapshots go to object storage. |
| Event bus | Publishes one push event. | Consumers update search, feeds, analytics, and audit logs. |
| Workers | Not on the synchronous push ACK path. | Deliver webhooks, notifications, indexing triggers, and cleanup. Failed deliveries retry with exponential backoff and eventually move to a dead-letter queue. |
| CI orchestrator + VM pool | Not on the synchronous push ACK path. | Reads workflow files from the pushed commit, expands matrix jobs, enqueues jobs, allocates warm runners, executes workflows. |
| Object storage | Not required for push acknowledgment. | Stores CI logs, artifacts, reports, release archives, snapshots. |
Three sources of truth (MySQL, Git store, object storage); everything else is derived. Lose the search index and rebuild from bus replay plus the Git store; lose Redis and rebuild from MySQL; lose MySQL or the Git store and accept real data loss.
Only Git-store quorum and required policy checks gate push latency. Everything else is decoupled.
[5.9] Data ownership
Any derived store rebuilds from source-of-truth + bus replay. Cache loss degrades performance; source-of-truth loss degrades durability.
[5.10] What the design intentionally avoids
- Sub-100ms Git ops across the globe. Pack serving is throughput-oriented.
- Synchronous cross-region ref replication. DR is async; RPO <5 min.
- BM25 on code.
- Container-only runner isolation for untrusted code.
- In-repo secret storage.
Takeaway. Six flows on one stack: push, clone, fan-out, PR, search, CI. Each traverses the same stores but stresses different layers.
6. Git Object Storage
Already familiar with Git object types, pack files, and reachability? Skip to §6.5 Replication.
[6.1] Four object types
| Object | Hash input | Size |
|---|---|---|
| Blob | blob <size>\0<content> | 1KB–10MB |
| Tree | tree <size>\0<entries> | 200B–50KB |
| Commit | commit <size>\0<data> | 200B–2KB |
| Tag | tag <size>\0<data> | 200B–2KB |
SHA-1 (hardened via SHA-1-DC). Upstream Git supports SHA-256 as an opt-in format. Public GitHub currently operates on SHA-1 repos; a SHA-256 migration remains non-trivial because ecosystem tooling (libgit2, search indexes, commit-graph files, reachability bitmaps) widely assumes SHA-1 byte widths.
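The hash-input column in the table is checkable directly: a blob's object ID is the SHA-1 of the header "blob &lt;size&gt;\0" followed by the raw content. A minimal sketch:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// gitBlobSHA computes a blob's object ID: SHA-1 over the header
// "blob <size>\x00" followed by the raw content, per the table above.
func gitBlobSHA(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// same bytes as `printf 'hello world\n' | git hash-object --stdin`
	fmt.Println(gitBlobSHA([]byte("hello world\n")))
	// → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
}
```

Content-addressability falls out of this: identical bytes always hash to the same ID, which is what lets forks and replicas deduplicate objects safely.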
[6.2] Object graph
A commit changing only server.go creates one new blob, one new tree per modified directory, and one new commit. Unchanged blobs are referenced, not copied.
[6.3] Pack files + delta compression
Pack layout:
+-------------------+
| Header | magic, version, object count
+-------------------+
| Object (full) | type, size, zlib-compressed
+-------------------+
| Object (delta) | base + copy/insert ops
+-------------------+
| ... |
+-------------------+
| SHA-1 checksum |
+-------------------+
Delta compression picks pairs of similar objects and stores one as base, the other as copy/insert ops. A 100KB file with a 500-byte change deltas to ~600 bytes.
[6.4] Pack stats (modeled)
| Metric | Value |
|---|---|
| Avg pack size post-delta | ~50MB |
| Compression ratio | 60–80% |
| Median pack | ~5MB |
| p99 pack | ~2GB |
| Largest packs | 50GB+ |
| Packs per repo post-repack | 1–3 |
| Delta chain depth cap | 50 |
| Repack cadence | hourly incremental, weekly full |
Repack policy:
- Push-time: thin pack accepted, indexed, stored alongside existing packs.
- Incremental hourly: merge small packs if >10.
- Full weekly: single pack with aggressive delta search.
- GC: remove unreachable, fork-aware.
func scheduleRepack(repo *Repository) RepackType {
s := repo.PackStats()
switch {
case s.LooseObjectCount > 1000: return RepackIncremental
case s.PackFileCount > 10: return RepackIncremental
case s.LargestPackAge > 7*24*time.Hour: return RepackFull
case s.TotalSize > 1*GB && s.PackFileCount > 3: return RepackFull
}
return RepackNone
}
[6.5] Replication
3 replicas per repo across different failure domains. Writes: stream to all 3; 2-of-3 majority required to ACK. Reads: any healthy replica, weighted by load. Rebuild: git bundle from a healthy peer + ref catch-up.
Write protocol (what "majority commit" means here). Objects and refs are different. Objects (blobs, trees, commits) can arrive at each replica out of order. The consensus step is the ref update: a commit is "committed" only when the ref for its branch (refs/heads/main → C11) points at it on a majority of replicas. Loose objects without a committed ref get GC'd later.
Pack bytes land on the primary; the replica coordinator tracks primary designation per-repo in its own metadata DB. For ref updates, the coordinator opens a reference transaction: primary and each secondary compute a hash of the proposed ref update and vote. If a majority agrees on the same hash in the same round, the transaction commits on agreeing replicas; divergent votes abort cleanly rather than half-writing. Primary designation survives restarts. On primary failure, the coordinator promotes a secondary after verifying replication parity.
Why three replicas, not five. Three balances latency, cost, and durability. Five tolerates two simultaneous losses but doubles storage and slows writes. One or two fails under correlated loss (rack power, network partition, firmware disk corruption). Three survives any single failure domain with majority-commit reachable; pack-to-object-store backup covers the 2-of-3 tail.
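The reference-transaction vote described above reduces to: each replica hashes the proposed ref transition, the coordinator tallies identical votes, and the update commits only at majority. A sketch with hypothetical names (RefUpdate, voteHash, commitIfQuorum are illustrative, not Praefect's API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// RefUpdate is the proposed transition for one ref.
type RefUpdate struct {
	Ref, OldSHA, NewSHA string
}

// voteHash is what each replica computes over the proposed update;
// matching hashes mean the replicas agree on the exact transition.
func voteHash(u RefUpdate) string {
	sum := sha256.Sum256([]byte(u.Ref + "\x00" + u.OldSHA + "\x00" + u.NewSHA))
	return hex.EncodeToString(sum[:])
}

// commitIfQuorum tallies votes and commits only when a majority of
// replicas voted for the same hash in this round; divergent or missing
// votes below quorum abort cleanly, so no ref is half-written.
func commitIfQuorum(votes []string, replicaCount int) (winner string, committed bool) {
	need := replicaCount/2 + 1
	tally := map[string]int{}
	for _, v := range votes {
		tally[v]++
		if tally[v] >= need {
			return v, true
		}
	}
	return "", false
}

func main() {
	u := RefUpdate{Ref: "refs/heads/main", OldSHA: "aaa", NewSHA: "bbb"}
	h := voteHash(u)
	// replica C crashed mid-write and never voted; A and B agree
	_, ok := commitIfQuorum([]string{h, h}, 3)
	fmt.Println("committed with 2-of-3:", ok) // → true
}
```

A replica that computed a different transition (say, against stale refs) produces a different hash and can never be counted toward the majority.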
[6.6] Failure example: one replica dies during push
Scenario: push arrives while replica B loses its NIC.
- Push streamed to A, B, C.
- B fails mid-write.
- A and C finish, vote matching ref hashes, form majority.
- Transaction commits on A and C.
- Client gets ACK.
- B is marked divergent; replica coordinator queues catch-up once B returns, or provisions a replacement from a healthy peer via git bundle.
No split-brain: exactly one primary per repo, designated in coordinator state; a partitioned coordinator fails closed rather than promoting. No data loss: the reference transaction refuses to commit without majority agreement.
[6.7] Git storage path: Gitaly, Praefect, and packfiles
Three moving parts:
- Gitaly: Git RPC service running on each repo node. Executes git-upload-pack, git-receive-pack, ref lookups, and pack generation against the local on-disk repo.
- Praefect: cluster router and coordinator. Selects replicas for reads, routes writes to the repository write leader, and drives the replicated ref update.
- Packfiles: compressed indexed bundles of Git objects. Streamed during clone, fetch, and push so the platform does not read or write millions of loose objects.
Clone / fetch:
Client ──▶ Praefect ──▶ replica (Gitaly) ──▶ packfile ──▶ Client
Push:
Client ──▶ Praefect ──▶ write leader (Gitaly) ──┬──▶ secondary replicas
└──▶ ref update vote
Read path (git clone / git fetch)
- Client connects to Git Service over SSH or smart HTTP.
- Praefect selects a healthy replica with current refs, weighted by load.
- Gitaly on that replica runs git-upload-pack.
- Gitaly builds a packfile: minimal object set, reachability-bitmap assisted.
- Packfile streams to the client over side-band-64k.
- Client stores it under .git/objects/pack/.
Write path (git push)
- Client sends a thin pack (new objects plus deltas against what it claims the server has).
- Praefect routes the push to the repository write leader.
- Gitaly on the leader runs git-receive-pack: pre-receive hooks run, deltas resolve, objects write.
- Leader and secondaries compute the proposed ref update and vote (§6.5).
- Praefect commits the ref update once a write quorum agrees.
- ACK returns; lagging replicas catch up asynchronously.
Why packfiles matter. Without packfiles, a large repo becomes millions of tiny files. Packfiles collapse that into a small number of indexed files, giving:
- Bundled objects in a few indexed files after maintenance, not per-object files.
- Sequential I/O during clone and fetch.
- Delta compression between similar objects.
- Fewer syscalls and far less inode pressure.
- Cheaper replication and network transfer.
Repack runs off the hot path (§6.4).
Concrete example. A 50GB monorepo with ~8M objects. Without packfiles, git clone would open, read, and checksum 8M loose files. With packfiles and a reachability bitmap, Gitaly streams a handful of pack bytes sequentially; the client reconstructs the object graph from the pack index. This is the difference between a clone that completes and a clone that times out.
Hot repos (hundreds of pushes/hour on one monorepo) stress this path; see §15.2.
Takeaway. Objects can replicate in bulk. Correctness hinges on safely advancing refs like main from old commit to new commit. That is the one step that must be coordinated.
7. Pull Requests and Diffs
Common mistake. Diffing the PR head vs the target branch tip. That shows unrelated commits the moment the target advances. Always diff against the merge-base.
[7.1] Merge base
Diff B to D, not F to D.
Bidirectional BFS:
func FindMergeBase(ctx context.Context, repo *Repo, head, base string) (string, error) {
ha, ba := map[string]struct{}{}, map[string]struct{}{}
hq, bq := list.New(), list.New()
hq.PushBack(head)
bq.PushBack(base)
for hq.Len() > 0 || bq.Len() > 0 {
if hq.Len() > 0 {
cur := hq.Remove(hq.Front()).(string)
if _, ok := ba[cur]; ok { return cur, nil }
ha[cur] = struct{}{}
c, err := repo.GetCommit(ctx, cur)
if err != nil { return "", err }
for _, p := range c.Parents {
if _, seen := ha[p]; !seen { hq.PushBack(p) }
}
}
if bq.Len() > 0 {
cur := bq.Remove(bq.Front()).(string)
if _, ok := ha[cur]; ok { return cur, nil }
ba[cur] = struct{}{}
c, err := repo.GetCommit(ctx, cur)
if err != nil { return "", err }
for _, p := range c.Parents {
if _, seen := ba[p]; !seen { bq.PushBack(p) }
}
}
}
return "", nil
}
Three optimizations keep it fast on million-commit repos: commit-graph file, generation numbers, and caching pull_requests.merge_base_sha.
[7.2] Tree diff
Git tree objects list entries (blobs and subtrees) sorted by path, so the diff is a linear merge walk:
- Advance two cursors through the sorted entries of
tree(merge_base)andtree(head). - Paths match, SHAs match → unchanged, skip.
- Paths match, SHAs differ → modified; recurse into subtrees or queue a blob diff.
- Path on one side only → added or deleted.
- Rename detection runs as a post-pass: pair unmatched deletes and adds by blob similarity above a threshold.
Per-file line diff on the matched blobs uses Myers. The tree walk is O(entries); Myers dominates only on very large modified files.
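The cursor walk above, minus subtree recursion and rename pairing, fits in a few lines. TreeEntry is a simplified view of one tree row (path plus object SHA):

```go
package main

import "fmt"

// TreeEntry is one row of a Git tree object: path plus the SHA of the
// blob or subtree it points at. Entries are sorted by path.
type TreeEntry struct{ Path, SHA string }

// diffTrees does the linear merge walk from §7.2 over two sorted entry
// slices, emitting added (A), deleted (D), and modified (M) paths.
// Subtree recursion and the rename post-pass are omitted for brevity.
func diffTrees(base, head []TreeEntry) []string {
	var out []string
	i, j := 0, 0
	for i < len(base) && j < len(head) {
		switch {
		case base[i].Path == head[j].Path:
			if base[i].SHA != head[j].SHA {
				out = append(out, "M "+head[j].Path) // same path, new content
			}
			i++
			j++
		case base[i].Path < head[j].Path:
			out = append(out, "D "+base[i].Path) // only in merge-base
			i++
		default:
			out = append(out, "A "+head[j].Path) // only in head
			j++
		}
	}
	for ; i < len(base); i++ {
		out = append(out, "D "+base[i].Path)
	}
	for ; j < len(head); j++ {
		out = append(out, "A "+head[j].Path)
	}
	return out
}

func main() {
	base := []TreeEntry{{"a.go", "s1"}, {"b.go", "s2"}, {"c.go", "s3"}}
	head := []TreeEntry{{"a.go", "s1"}, {"b.go", "s9"}, {"d.go", "s4"}}
	fmt.Println(diffTrees(base, head)) // → [M b.go D c.go A d.go]
}
```

Matched paths with equal SHAs cost one string compare and no I/O, which is why the walk stays O(entries) even on huge trees.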
[7.3] Diff caching
| Layer | Key | TTL | Invalidation |
|---|---|---|---|
| Redis L1 | diff:{repo}:{merge_base}:{head} | 24h | on head push |
| Redis L2 | file_diff:{old_blob}:{new_blob} | 7d | never (content-addressable) |
| MySQL | pull_requests.diff_cache_key | permanent | per sync |
L2 does most of the work. The same blob pair appearing in many PRs and forks is diffed once and served from Redis thereafter.
Takeaway. Diff against the merge-base (not the target tip), cache by immutable blob SHAs, recompute on head move.
8. Code Review: Comment Anchoring
Comments follow code across force-pushes.
[8.1] Position model
Immutable: original_commit_sha, path, original_line, side, diff_hunk
Current: commit_sha, line_no (updated on each push)
[8.2] Repositioning
On head move, for each affected file:
- Compute diff old-head → new-head for that file.
- Build a line map from old lines to new lines by walking hunks.
- For each comment on that file: if the old line maps to a new line, update line_no and commit_sha. If the line is gone, mark it outdated.
func RepositionComments(ctx context.Context, q *queries.Queries,
diffSvc DiffService, prID int64, oldHead, newHead string) error {
comments, _ := q.ListReviewComments(ctx, prID)
byFile := groupByPath(comments)
fds, _ := diffSvc.DiffCommits(ctx, oldHead, newHead)
for _, fd := range fds {
cs, ok := byFile[fd.Path]
if !ok { continue }
lm := buildLineMapping(fd.Hunks)
applyMapping(ctx, q, cs, lm, newHead)
}
return nil
}
buildLineMapping walks the hunks once, tracking a running line offset: context lines pass through, inserted lines bump the offset, deleted lines drop the old line from the map. Pseudo-steps:
- Start offset = 0.
- For each old line 1..N: find the hunk containing it; if the line is deleted, drop it; if context, map old → old + offset; update the offset from insertions/deletions encountered.
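A minimal sketch of that walk, assuming simplified hunks that carry only the @@ header counts. It conservatively drops every old line inside a hunk; real unified diffs also carry per-line +/-/context markers, which give a finer mapping.

```go
package main

import "fmt"

// Hunk is a simplified unified-diff hunk header:
// @@ -OldStart,OldLines +NewStart,NewLines @@.
type Hunk struct {
	OldStart, OldLines int
	NewStart, NewLines int
}

// BuildLineMapping maps old line numbers to new ones. Lines before, between,
// and after hunks shift by the running offset; lines inside a hunk are treated
// as changed and dropped, so comments anchored there go outdated.
func BuildLineMapping(hunks []Hunk, oldTotal int) map[int]int {
	m := make(map[int]int)
	offset := 0
	h := 0
	for line := 1; line <= oldTotal; line++ {
		// Advance past hunks that end before this line, accumulating offset.
		for h < len(hunks) && line >= hunks[h].OldStart+hunks[h].OldLines {
			offset += hunks[h].NewLines - hunks[h].OldLines
			h++
		}
		if h < len(hunks) && line >= hunks[h].OldStart {
			continue // inside a hunk: anchor line changed, comment goes outdated
		}
		m[line] = line + offset
	}
	return m
}

func main() {
	// One hunk replacing old lines 3-4 with three new lines: net offset +1.
	m := BuildLineMapping([]Hunk{{OldStart: 3, OldLines: 2, NewStart: 3, NewLines: 3}}, 6)
	fmt.Println(m[1], m[2], m[5], m[6]) // 1 2 6 7: lines after the hunk shift by +1
}
```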
[8.3] Force-push and outdated comments
Force-push rewrites history. Run repositioning; comments whose anchor line is gone become outdated. Outdated comments stay visible, collapsed, expandable.
Takeaway. A comment is anchored to an immutable (commit, path, line); head moves trigger a hunk-walk reposition, and deleted lines become outdated.
9. Code Search
Decision. If <1M files to search, OpenSearch with an ngram analyzer covers it. Do not build a trigram engine.
Already familiar with trigram indexes and posting-list intersection? Skip to §9.3 Query performance.
Common mistake. Indexing code with Elasticsearch BM25. BM25 tokenizes text and strips punctuation, so substring and regex at scale stop working. Code search needs a byte-level index, not a word-level one.
Terminology (short).
- ngram: any N-character substring.
- trigram: N=3. Standard for code search; 3 balances selectivity against index size.
- Full vs sparse trigrams: full indexes every trigram; sparse prunes common trigrams so posting lists stay selective without inflating index size.
How trigrams find a substring. Index every 3-char substring of every file into an inverted index. Each trigram maps to a posting list of (repo_id, file_id, offset). No tokenization, no case folding, no punctuation stripping. (, {, ->, :: preserved. At query time: extract trigrams, intersect posting lists, verify each candidate against source to drop false positives. Regex works by extracting trigrams from the literal prefix, intersecting to narrow, then running the regex engine only on that small set.
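The extract-intersect-verify loop in miniature. The index here is an in-memory map built per run; a real engine stores posting lists as compressed on-disk shards with offsets, and the builder must record each file at most once per trigram, as this sketch does.

```go
package main

import (
	"fmt"
	"strings"
)

// Trigrams extracts every 3-byte substring, byte-for-byte: no tokenization,
// no case folding, punctuation preserved.
func Trigrams(s string) []string {
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

// Search intersects the posting lists of the query's distinct trigrams, then
// verifies each surviving candidate against source to drop false positives.
func Search(index map[string][]int, files []string, query string) []int {
	uniq := map[string]bool{}
	for _, t := range Trigrams(query) {
		uniq[t] = true
	}
	if len(uniq) == 0 {
		return nil // query shorter than 3 bytes: no trigram to narrow with
	}
	hits := map[int]int{} // file → number of query trigrams it contains
	for t := range uniq {
		for _, f := range index[t] {
			hits[f]++
		}
	}
	var out []int
	for f, n := range hits {
		// Candidate must contain every trigram; verification kills the rest.
		if n == len(uniq) && strings.Contains(files[f], query) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []string{"func handleWebhook(w http.ResponseWriter)", "handle the hook"}
	index := map[string][]int{}
	for id, content := range files {
		seen := map[string]bool{}
		for _, t := range Trigrams(content) {
			if !seen[t] { // one posting per file per trigram
				seen[t] = true
				index[t] = append(index[t], id)
			}
		}
	}
	fmt.Println(Search(index, files, "Webhook(")) // [0]
}
```

The punctuation survives end to end: the trailing "(" in the query becomes the trigram "ok(" and narrows the candidate set before any verification runs.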
[9.1] Why not Elasticsearch only?
Elasticsearch is excellent for issues, PRs, users, and text metadata. Run it there.
Code search needs what ES does badly:
- Substring matches across punctuation: handleWebhook(.
- Punctuation-sensitive matches: ->, ::, **kwargs.
- Regex at scale without scanning every file.
BM25 asks "which documents best match these words by frequency?" (right for prose). Trigram indexes ask "which files literally contain this byte pattern?" (right for code). ES's ngram analyzer narrows the gap but inflates index 2–4× and still ranks BM25-style.
That is why trigram engines exist. Zoekt is the OSS default; a custom sparse-trigram engine only earns its weight at very high scale.
[9.2] Indexing
Re-index only changed files on push:
func HandlePush(ctx context.Context, ev PushEvent, git GitService, idx SearchIndex) error {
if ev.Before == NullSHA {
files, _ := git.ListTree(ctx, ev.RepoID, ev.After, true)
for _, f := range files {
content, _ := git.ReadBlob(ctx, ev.RepoID, f.SHA)
if !isIndexableFile(f.Path, content) { continue }
_ = idx.IndexFile(ctx, ev.RepoID, f.Path, content, ev.After)
}
return nil
}
changed, _ := git.DiffTree(ctx, ev.RepoID, ev.Before, ev.After)
for _, f := range changed {
if f.Status == "deleted" { _ = idx.RemoveFile(ctx, ev.RepoID, f.Path); continue }
content, _ := git.ReadBlob(ctx, ev.RepoID, f.NewSHA)
if isIndexableFile(f.Path, content) { _ = idx.IndexFile(ctx, ev.RepoID, f.Path, content, ev.After) }
}
return nil
}
[9.3] Query performance (modeled)
Ranges, not precise p99:
- Exact string: tens to low hundreds of ms.
- Very short string: hundreds of ms to seconds (heavy post-filter).
- Regex: low hundreds of ms on a narrowed candidate set.
- Language / org / path filter: tens to low hundreds of ms.
[9.4] Ranking, private-repo auth, and freshness
Ranking signals. Trigram intersection produces candidates; the ranker reorders them. Common signals:
- Path location (src/ beats vendor/, node_modules/, test/fixtures/).
- Repo popularity (stars, forks, recent activity).
- Match density.
- Symbol kind (definition beats reference).
- User scope (own org + followed repos).
- Freshness tie-breaker.
Private-repo auth filtering. "Search everything, then filter top-K" leaks private-repo existence. Two correct strategies:
- Prefilter by repo-read set. For users with access to a bounded number of repos, intersect the query against only those shards.
- Per-shard authz bitmap. For users in large orgs, carry a compact repo-read bitmap into the shard query; the shard masks posting-list hits before ranking.
Both require the coordinator to push an authz context into every shard request. The repo-read set comes from the Redis permission cache, never a per-query MySQL round-trip.
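Strategy 2 in miniature: masking shard hits with a repo-read bitmap before ranking. The uncompressed bitset and Hit shape here are toys; production would ship a compressed (e.g. roaring) bitmap per query.

```go
package main

import "fmt"

// Hit is one posting-list candidate before ranking.
type Hit struct {
	RepoID int64
	FileID int64
}

// ReadBitmap is a toy repo-read bitmap: bit i set means the user may read repo i.
type ReadBitmap []uint64

func (b ReadBitmap) Set(repoID int64) {
	b[repoID/64] |= 1 << uint(repoID%64)
}

func (b ReadBitmap) Has(repoID int64) bool {
	w, bit := repoID/64, uint(repoID%64)
	return w < int64(len(b)) && b[w]&(1<<bit) != 0
}

// MaskHits drops candidates from unreadable repos BEFORE ranking, so private
// repo existence never leaks into top-K results.
func MaskHits(hits []Hit, authz ReadBitmap) []Hit {
	out := hits[:0]
	for _, h := range hits {
		if authz.Has(h.RepoID) {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	authz := make(ReadBitmap, 2) // covers repo IDs 0-127
	authz.Set(3)
	hits := []Hit{{3, 10}, {4, 11}, {3, 12}}
	fmt.Println(len(MaskHits(hits, authz))) // 2: repo 4 masked out
}
```

Because the mask runs inside the shard, the coordinator never sees an unreadable hit, which is what makes "filter top-K later" unnecessary.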
Index freshness pipeline. End-to-end push ACK → searchable: <2 min target. Per-hop: bus publish <20ms, indexer consume <200ms, changed blobs from Git store <500ms typical, shard writer applies posting-list delta <1s warm.
Why <2 min, not <5s. Index updates batch many pushes to amortize shard write amplification. Sub-5s would require write-through per push plus per-shard fsync, roughly 10× the cost, for freshness most users do not notice. An "indexed N seconds ago" receipt on results makes the staleness window visible.
[9.5] Buildable version
OSS: Zoekt. Production trigram-index engine in Go (Google, maintained by Sourcegraph). Architecturally the same family as a custom sparse-trigram engine. Shard-per-repo index, native regex, shard server + web frontend. Deploys as: one indexer + N shard hosts behind a coordinator. Practical into the multi-TB to tens-of-TB range.
For a packaged UI, deploy Sourcegraph OSS, which wraps Zoekt with symbol indexing and LSIF/SCIP navigation.
Managed. Sourcegraph Cloud for zero cluster ops. For teams on cloud search: managed OpenSearch / Elastic Cloud / Azure AI Search with an ngram tokenizer works for literal + moderate regex. Good for <5TB with mostly-literal queries.
Graduate to a custom engine only when code volume pushes into tens of TB and regex QPS sustains hundreds per second. That is a multi-year project.
Takeaway. Trigram inverted index with shard fan-out gives literal and regex search at scale. BM25 works for prose; code search usually needs byte-pattern indexing.
10. CI/CD
Decision. If <1K CI jobs/day, managed runners beat any self-hosted pool; they stay cheaper and simpler until scale says otherwise.
[10.1] Event flow
[10.2] Workflow parsing
name: CI
on:
push:
branches: [main, 'release/**']
pull_request:
branches: [main]
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
jobs:
lint: { runs-on: ubuntu-latest, steps: [...] }
test:
runs-on: ubuntu-latest
strategy: { matrix: { go-version: ['1.22', '1.23'] } }
steps: [...]
build: { needs: [lint, test], runs-on: ubuntu-latest, steps: [...] }
Parser steps: read YAML at HEAD, match on: against the event, expand the matrix, build the DAG, apply concurrency groups.
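The matrix-expansion step is a cross product over axes. A minimal in-memory sketch (axis names and values are illustrative; the real expander also handles include/exclude entries):

```go
package main

import "fmt"

// ExpandMatrix computes the cross product of matrix axes into concrete job
// variants, e.g. {go-version: [1.22, 1.23]} × {os: [ubuntu-latest]} → 2 jobs.
func ExpandMatrix(axes map[string][]string) []map[string]string {
	variants := []map[string]string{{}}
	for axis, values := range axes {
		var next []map[string]string
		for _, v := range variants {
			for _, val := range values {
				nv := map[string]string{axis: val}
				for k, x := range v { // copy the axes fixed so far
					nv[k] = x
				}
				next = append(next, nv)
			}
		}
		variants = next
	}
	return variants
}

func main() {
	jobs := ExpandMatrix(map[string][]string{
		"go-version": {"1.22", "1.23"},
		"os":         {"ubuntu-latest"},
	})
	fmt.Println(len(jobs)) // 2 concrete job variants
}
```

Each variant then becomes one node in the DAG, subject to the same needs: edges and concurrency groups as hand-written jobs.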
[10.3] Scheduling constraints
| Constraint | How |
|---|---|
| Per-repo concurrency cap | Plan tier |
| Concurrency groups | concurrency: in YAML |
| Dependencies | needs: on the DAG |
| Matrix | expand into N jobs |
| Self-hosted | route by runner label |
| Priority | paid > free |
| Stale timeout | queued >24h cancelled |
[10.4] VM lifecycle
- Allocator pulls next job.
- Pool selects a pre-warmed VM matching labels. Trust boundary determines sharing: any untrusted workload gets a dedicated VM per job. Public / untrusted-by-default jobs never share a VM.
- Ephemeral OS disk mounts a fresh immutable base image. Reimage between jobs is on the order of a few seconds.
- Runner clones shallow, runs steps, streams logs.
- Job completes; disk reimaged; VM returns to pool.
- Cold boot (tens of seconds) happens only on pool exhaustion.
VM-level isolation gives a stronger boundary than shared-kernel containers: a kernel-level escape in the runner cannot touch the host.
Buildable options. Same core idea (per-job isolated sandbox with fast reuse):
| Option | Isolation | Ops cost | When to pick |
|---|---|---|---|
| K8s + ARC (containers) | shared-kernel | low | Trusted code only. |
| K8s + ARC + Kata | VM-grade (~1–2s boot) | medium | Untrusted code on an existing K8s footprint. |
| Managed per-job sandboxes | VM-grade | low (managed) | Steady CI, no appetite for K8s ops. |
| Managed runners (GitHub-hosted) | VM-grade | zero | Below ~1K jobs/day. |
| Custom microVM | VM-grade | high | Sustained >50K concurrent; multi-year project. |
[10.5] Biggest CI cost drivers
For most platforms, CI is the largest infra line item. The five drivers:
- Idle warm capacity. Pre-warmed pools trade idle cost for dispatch latency; undersized = cold-boot spikes, oversized = steady bleed.
- Peak burst overprovisioning. Monday mornings and release days pull 3–5× the steady rate; the fleet has to cover it.
- Large artifact storage. dist/ tarballs, container images, and test-output blobs at 90-day retention multiply fast.
- Windows / macOS runners. 3–10× the per-minute cost of Linux VMs, rarely justifiable outside platform-specific CI.
- Network egress. Clone bandwidth, artifact downloads, and especially cross-region data transfer for distributed runner fleets.
Optimization order: right-size the warm pool to real p95 demand, archive artifacts to cold storage, restrict OS runners to workflows that need them, collocate runners with caches.
[10.6] Log streaming
Runner VM -> gRPC -> log pipeline (sharded by job_id) -> WebSocket for live tail -> object store for archival
Lines carry timestamp + step number for grouped UI rendering.
Takeaway. Per-job VM with fast reimage isolates untrusted code; scheduler, allocator, log pipeline is the shape regardless of which substrate runs the jobs.
11. Data Model
Main entities, short descriptions. Full MySQL 8 DDL in Appendix E.
| Table | Cluster | Purpose |
|---|---|---|
users, organization_members, teams, team_members | users | Accounts, org membership, teams |
repositories, repository_permissions, branch_protection_rules | repos | Repo metadata, access grants, protection |
fork_networks, fork_network_members | repos | Fork-network membership |
webhooks, webhook_deliveries | repos | Hook config and delivery audit |
pull_requests, pull_request_reviews, review_comments | issues | PR state, reviews, inline comments |
workflow_runs, workflow_jobs, workflow_steps | actions | CI state |
Sharding router. The app connects to a query proxy as if it were a single MySQL server; the proxy parses SQL and routes to the right backing instance. The functional partitioning uses five keyspaces (Vitess term for a logical dataset that maps to one or more physical shards): users, repos, issues, actions, gists. Cross-keyspace joins run in the Go core, not in SQL.
Queries live in queries/*.sql and sqlc generates typed Go at build time. No ORM; no runtime reflection.
[11.1] Redis keys
Single Redis fleet carries cache, queues, rate limits, sessions, permission cache:
# Rendered content cache
fragment:{template}:{repo_id}:{pr_id} rendered partial (TTL 24h)
file_diff:{old_blob}:{new_blob} serialized hunks (TTL 7d)
markdown:{sha1(body)} rendered HTML (TTL 7d)
# Permission + authz
perm:{user}:{repo} permission STRING (TTL 5m)
team_members:{team} SET user_ids (TTL 10m)
branch_protection:{repo}:{branch} HASH rule fields (TTL 5m)
# Rate limits + sessions
rate:api:{user} sliding-window counter
session:{id} HASH
# Job queues
jobs:{queue}:pending LIST
jobs:{queue}:scheduled ZSET by unix-ms
Above a terabyte cache footprint, the fragment/diff/markdown tier can move to Memcached; queues, rate limits, and sessions stay on Redis.
Takeaway. MySQL holds source of truth. Redis carries everything derived or ephemeral. Partitioning follows access patterns, not table-count targets.
12. Back-of-the-Envelope
All numbers are design envelopes for a platform of this shape, not internal metrics.
[12.1] Storage
Repos: ~150-250M
Total Git objects: ~2-3T
Unique pack storage: ~8-20PB
3x replication: ~24-60PB NVMe
Object-store backup: ~8-20PB
Forks: ~30% of repos in ~1M networks, avg size ~50
Naive cost/network: ~250GB
With alternates: ~5GB (savings ~98%)
Code-search index: ~40-75TB
Shards: ~500-1,500 at ~50GB each
Actions logs (90d): ~4.5-6PB
MySQL: ~15-25TB
Redis (cache + queues): ~2-4TB
[12.2] Traffic
Git ops/day: ~800M-1B
Avg ops/sec / peak: ~9-12K / ~25-40K
Daily egress: ~6-10PB
Daily push ingress: ~20-40TB
PRs opened: ~10-15M/month
Active PRs: ~30-50M
PR views: ~15-25M/day
Diff computes: ~500-800K/day (Redis cache hit ~95%)
Review comments: ~1.5-2.5M/day
Force-push reposition: ~80-120K/day
Search queries: ~40-60M/day
Latency: p50 <200ms, p99 <1s
Freshness: <2 min
Actions runs: ~8-12M/day
Jobs/sec avg / peak: ~300-400 / ~800-1,200
Concurrent peak: ~300-500K
Log volume: ~50-70TB/day
Webhook deliveries: ~400-600M/day
Retry rate: ~5% (max 8 attempts over 24h)
Event bus: ~20-30K msg/sec
Job queue peak: ~15-25K tasks/sec
[12.3] Growth
| Horizon | Multiplier | First bottleneck | Mitigation |
|---|---|---|---|
| 18 mo | 2x | Git fleet, search shards, workers | double hosts, split hot shards |
| 3 yr | 5x | MySQL write-heavy tables, bus partitions | add keyspaces, 2x partitions |
| 5 yr | 10x | Largest monorepos outgrow a single Git slot | multi-host repo striping, federated search |
13. Access Control
[13.1] Hierarchy
[13.2] Permission levels
| Action | Read | Triage | Write | Maintain | Admin |
|---|---|---|---|---|---|
| View code + issues | Y | Y | Y | Y | Y |
| Manage issues + PRs | N | Y | Y | Y | Y |
| Push unprotected | N | N | Y | Y | Y |
| Manage protection | N | N | N | Y | Y |
| Delete repo | N | N | N | N | Y |
| Manage collaborators | N | N | N | N | Y |
[13.3] Push / merge checks
func CheckPushPermission(ctx context.Context, userID, repoID int64,
ref, oldSHA, newSHA string) (bool, string, error) {
perm, err := authz.EffectivePermission(ctx, userID, repoID)
if err != nil { return false, "", err }
if perm.Level() < authz.Write { return false, "no write access", nil }
branch := strings.TrimPrefix(ref, "refs/heads/")
p, err := authz.FindMatchingProtection(ctx, repoID, branch)
if err != nil { return false, "", err }
if p == nil { return true, "no protection", nil }
if !git.IsAncestor(ctx, oldSHA, newSHA) && !p.AllowForcePushes {
return false, "force pushes not allowed", nil
}
if p.RestrictPushes && !slices.Contains(p.AllowedPushers, userID) {
return false, "direct pushes restricted", nil
}
if p.RequirePR { return false, "changes must come through a pull request", nil }
if p.RequireLinearHistory && git.IsMergeCommit(ctx, newSHA) {
return false, "linear history required", nil
}
return true, "ok", nil
}
[13.4] CODEOWNERS
.github/CODEOWNERS maps paths to required reviewers. Last matching pattern wins (.gitignore-style). On PR open: parse at base branch, collect required reviewers per changed path, auto-request, block merge until each required team has at least one approval.
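Last-match-wins in miniature. path.Match is a simplification: real CODEOWNERS globbing also handles **, directory-prefix patterns, and @-team resolution, and note that path.Match's * does not cross / separators.

```go
package main

import (
	"fmt"
	"path"
)

// Rule is one CODEOWNERS line: a pattern and its owners.
type Rule struct {
	Pattern string
	Owners  []string
}

// OwnersFor returns the owners of the LAST matching pattern (.gitignore-style),
// so later, more specific lines override earlier, broader ones.
func OwnersFor(rules []Rule, file string) []string {
	var owners []string
	for _, r := range rules {
		if ok, _ := path.Match(r.Pattern, file); ok {
			owners = r.Owners // keep overwriting: last match wins
		}
	}
	return owners
}

func main() {
	rules := []Rule{
		{"docs/*", []string{"@org/docs-team"}},
		{"docs/api.md", []string{"@org/api-team"}},
	}
	fmt.Println(OwnersFor(rules, "docs/api.md")) // [@org/api-team]
}
```

The merge gate then requires at least one approval from each owner set collected over the PR's changed paths.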
[13.5] Auth cache
| Layer | Key | TTL | Invalidation |
|---|---|---|---|
| Redis L1 | perm:{user}:{repo} | 5m | event-bus notification |
| Redis L2 | team_members:{team} | 10m | event-bus notification |
| Redis L3 | branch_protection:{repo}:{branch} | 5m | event-bus notification |
| In-process | org_members:{org} | 60s | TTL |
Permission changes publish to the event bus; consumers invalidate. Worst case: 5-min window of stale grant after revoke.
Invalidation triggers. An InvalidatePermCache task fires on: team membership change, repo collaborator add/remove, org role change, CODEOWNERS file push, branch-protection rule edit, visibility flip. TTL is the backstop: in the pathological case where a task is lost, the 5-minute TTL still converges. Safety-critical operations (a push right after a branch-protection tightening) bypass the cache and read MySQL directly.
Takeaway. Org → team → repo hierarchy cached in Redis with event-driven invalidation; branch protection enforced at receive-pack, not only the UI.
14. Fork Model
Common mistake. Storing each fork as a full repository copy. With alternates pointing to a shared object pool, a fork costs MB, not GB.
[14.1] Alternates
A fork is a bare repo with objects/info/alternates pointing at the upstream's object store. Reads fall through to upstream; writes land in the fork's own store.
The Linux kernel repo is ~5GB. With 10,000 forks and no unique commits, naïve storage = 50TB. With alternates, the shared pool stays ~5GB; each fork stores only its own ref table plus unique commits. At ~60M forks across ~1M networks, alternates is the difference between feasible and unaffordable.
[14.2] Fork-aware garbage collection
Standard git gc removes objects unreachable from any ref. In a fork network, an object unreachable in upstream may still be referenced by a fork. Reachability is computed across the union of refs:
func ForkAwareGC(ctx context.Context, networkID int64,
q *queries.Queries, git GitService) error {
network, _ := q.GetForkNetwork(ctx, networkID)
all := append([]Repo{network.Upstream}, network.Forks...)
reachable := map[string]struct{}{}
for _, repo := range all {
refs, _ := git.ListRefs(ctx, repo.ID)
for _, ref := range refs {
for sha := range git.WalkReachableObjects(ctx, repo.ID, ref.SHA) {
reachable[sha] = struct{}{}
}
}
}
allObjs, _ := git.ListAllObjects(ctx, network.Upstream.ID)
var unreachable []string
for _, sha := range allObjs {
if _, ok := reachable[sha]; !ok { unreachable = append(unreachable, sha) }
}
if len(unreachable) > 0 {
_ = git.DeleteObjects(ctx, network.Upstream.ID, unreachable)
}
return git.Repack(ctx, network.Upstream.ID, RepackFull)
}
Four mitigations keep it tractable: reachability bitmaps, incremental GC (reclaim unreachable objects in small phases between full repacks, instead of one long stop-the-world pass), a scheduled full/incremental split, and a 14-day quarantine before deletion.
[14.3] Upstream deletion
Elect a new root, transfer the shared object store, re-point alternates. O(1) metadata; no data copies.
Takeaway. Forks share the upstream object pool via alternates; GC reachability is computed across the fork network, not per-repo.
15. Bottlenecks and Backpressure
[15.1] Pack negotiation on large repos
Many-branch repos exchange lots of have/want lines. Mitigations: multi-pack-index (a single index file spanning many packfiles, so lookups don't have to scan each pack's index separately), uploadpack.allowReachableSHA1InWant, fetch.negotiationAlgorithm=skipping, reachability bitmaps.
[15.2] Monorepo hot spots
Large monorepos (10GB+ packs, hundreds of pushes/hour) saturate one Git replica. Mitigations: dedicated replica sets for hot repos, partial clone, sparse checkout, memory-mapped pack segments, round-robin reads.
Client-side mitigations. Partial clone (git clone --filter=blob:none) skips blob download until checkout, cutting clone size for deep-history repos by 80%+. Sparse-checkout restricts the working tree. --depth=1 shallow clones are fine for CI, bad for history-aware workflows.
Server-side at scale. Once a repo's ref count passes ~1M, split refs by namespace so info/refs advertisement stays bounded. Reachability bitmap regeneration becomes expensive over ~50M objects; run it on a dedicated offline replica and ship the bitmap. Above ~100GB pack, federated storage is the next step and is a multi-quarter project.
[15.3] Search freshness vs query latency
Separate indexing and serving; prioritize popular repos; surface "indexed N seconds ago" in the UI.
[15.4] Actions runner scaling
Pre-warmed pool, spot/low-priority VMs, fair per-account queueing, self-hosted runner support.
[15.5] Webhook thundering herd
A popular repo with thousands of hooks generates simultaneous egress. Per-domain rate limit, jitter (0–5s), per-target circuit breaker, batch where supported.
[15.6] Three-layer admission control
- Queue depth: webhook backlog past 1M → throttle new webhook-emitting writes; paid orgs prioritized.
- Git replica load: p99 pack serving >1s for 2 min → remove replica from reads, scale out.
- Actions queue depth >50K for 5 min → throttle free-tier jobs, scale VM pool.
All reversible on recovery.
Takeaway. Admission control is three-layered (task queue, Git replicas, CI pool); throttling hits free-tier first and is reversible the moment load subsides.
16. Multi-Region and Lifecycle
[16.1] Write locality
Single write region per repo; read replicas are regional. MySQL primary and Git primaries both live in one write region. Reads from regional replicas with ~50–150ms cross-region lag. Event replication across regions runs via MirrorMaker 2 so that indexers and analytics consumers in each region see the same push.v1 stream. On write-region outage, failover RTO is 30–60 minutes. RPO is bounded by semi-sync replication lag (~500ms typical).
[16.2] Cross-region
- MySQL semi-sync replicas: ~50ms intra, 150–300ms cross-region lag.
- Git async replicas for hot monorepos; fall back to nearer replica with up to ~60s freshness gap.
- Search index built per-region from bus replay.
- Redis is per-region; no cross-region sync (caches rebuild on miss).
[16.3] Region failover
Primary region dark: reads continue from other regions; writes for that region reject 503. Manual failover RTO 5–10 min, RPO <500ms.
[16.4] Repository lifecycle
| State | Trigger | Storage |
|---|---|---|
| Active | created or pushed in 90d | Full 3-replica, NVMe, indexed |
| Dormant | no push in 90d, <10 stars | 3-replica, cheaper SKU; search index cold |
| Archived | archived flag or no push in 3y | 2-replica; pack compacted; search index dropped |
| Cold | opt-in or unpaid-tier limit | Single replica + object store; clone re-hydrates |
| Deleted | user action | 90d recoverable; then GC fork-aware |
Artifact expiry. Release and Actions artifacts expire on per-plan policy: 30–90 days default, configurable up to 400.
Webhook log retention. webhook_deliveries kept 90 days for replay UI; older rows archived as Parquet.
Takeaway. Single write region with async read replicas; RTO 30–60min on write-region failover. Lifecycle tiers cold repos down to cheaper storage.
17. Failure Scenarios
[17.1] Git replica failure
Replica crash: reads continue from the other two; writes continue on majority. Router detects via health check and shifts traffic. Replacement provisioned, then bootstrapped via git bundle (a single binary file packaging all objects + refs of the repo, like a portable snapshot you can ship over the wire) from a healthy peer, followed by ref catch-up — replaying the recent branch/tag updates that landed after the bundle was created so the new replica matches current state. Detailed failure trace in §6.6.
[17.2] Pack corruption
A bitflip is a single bit on disk silently flipping from 0 to 1 (or vice versa) without an OS-level error — caused by bad RAM, bit rot, or controller bugs. At petabyte scale across tens of thousands of disks, this is statistically inevitable. A flipped bit inside a Git pack file means the pack's checksum no longer matches, or worse, decompression returns wrong bytes.
git fsck ("file system check") is Git's full integrity scan: it recomputes the SHA of every object, verifies pack checksums, and walks the reachability graph to confirm every referenced object exists and is intact. Slow (hours on a large repo) but authoritative.
Detection layers: pack checksum on every read, weekly fsck across all packs, and read errors surfaced from client clones. Recovery: copy the clean pack from a healthy replica (3-replica model, §5.6), verify checksum, run fsck. Last-resort if all three replicas are corrupt: restore from object store (cold backup).
[17.3] Search shard loss
Coordinator marks shard unavailable; queries return partial results with a banner. Warm standby takes over, or shard rebuilds from bus replay (~2h per 50GB shard, modeled).
[17.4] Actions VM pool exhaustion
Dependabot wave across 10K repos. Pool pre-warmer scales out; fair per-account queueing prevents starvation; cancel-in-progress: true groups cancel stale runs.
[17.5] Webhook target outage
Delivery tasks fail, queue grows. Retry schedule 1m, 5m, 30m, 2h, 6h, 12h, 24h. After 8 fails, hook marked failing; admin notified. Per-target circuit breaker pauses deliveries.
[17.6] Fork alternates break
Fork's alternates path invalid after rebalance. Hooks validate alternates; weekly integrity check. Recovery: look up current upstream route in MySQL, update alternates, fsck --connectivity-only.
[17.7] MySQL primary failover
Primary dies. Orchestrator (GitHub's open-source MySQL HA tool that monitors replication topology) detects the failure and promotes a healthy replica to primary in 20–60s. Reads continue from the surviving replicas throughout. Writes buffer briefly while the Go application's database/sql connection pool retries against the new primary. PR creation and other write paths block for the 20–60s promotion window, then resume.
[17.8] Event bus broker outage
One Kafka broker dies. Each topic is split into partitions; each partition has one leader broker (handles all reads/writes) and follower replicas. With replication factor (RF) = 3, every partition lives on 3 brokers, so losing one leaves 2 survivors. Partitions whose leader was on the dead broker see a sub-second produce spike (writes blocked) until Kafka auto-elects a new leader from the surviving replicas. No message loss.
[17.9] Job queue saturation
Tasks pile up. Scale worker fleet. Apply per-queue admission control at Core. Shard Redis into multiple queue server groups if one primary saturates.
[17.10] Region outage
Primary region dark; pushes for the region reject with 503. Manual failover to secondary region. RTO (Recovery Time Objective: how long until service is back) 5–10 min. RPO (Recovery Point Objective: how much recent data may be lost) <500ms — i.e., at most half a second of in-flight writes.
Takeaway. Every derived store is rebuildable from source of truth plus event replay. Only MySQL and Git store losses are real data loss.
18. Operational Playbook
[18.1] Deployment
HPA = Kubernetes Horizontal Pod Autoscaler (scales container replicas up/down based on a metric). Canary = staged rollout that sends new code to a small slice of traffic first (e.g. 1%), then 10%, 50%, 100%, holding at each step long enough to catch regressions before widening blast radius. Rolling = replace instances one batch at a time, draining in-flight requests before shutdown.
| Service | Model | Scaling | Rollout |
|---|---|---|---|
| Go monolith | Container fleet, static binary | HPA on latency | Canary 1/10/50/100 |
| Git Service | Container fleet | HPA on connections | Canary 5/25/100 |
| Actions orchestrator | Container fleet | HPA on queue depth | Canary 5/100 |
| VM pool | Pool manager | Auto-scale on queue | Rolling |
| Search shards | Bare metal | Manual | Rolling per-shard |
| Workers | Container fleet | HPA on queue | Canary 10/100 |
| Git replicas | Bare metal | Manual | Rolling, drain first |
[18.2] Schema migrations
gh-ost is GitHub's open-source online schema migration tool. It creates a shadow copy of the table with the new schema, tails the MySQL binlog to replay live writes onto the copy, backfills existing rows throttled by replica lag, and atomically swaps the tables under a short app-layer lock. Avoids the multi-hour write block that a naive ALTER TABLE would cause on a billion-row table.
[18.3] Feature flags
if features.Enabled(ctx, "new_diff_renderer",
features.User(userID), features.Repo(repoID)) {
renderNewDiff(w, r)
} else {
renderLegacyDiff(w, r)
}
Config lives in Redis, coordinated via etcd for atomic flips.
[18.4] Metrics and alerts
| Metric | Alert | Why |
|---|---|---|
| Git push p99 | >5s for 2m | user-visible slowness |
| Git healthy replicas | <2 for 1m | durability risk |
| Event bus lag | >30s for 60s | fan-out backlog |
| Redis diff cache hit rate | <85% for 10m | cache eviction storm |
| Search freshness | >10m for 5m | indexer stuck |
| Actions queue depth | >50K for 5m | pool under-provisioned |
| Actions dispatch p99 | >5m for 5m | allocator issue |
| Webhook failure rate | >10% for 5m | target outage or breaker |
| MySQL query p99 | >1s for 2m | primary overloaded |
| Queue lag | >1m for 2m | workers under-provisioned |
[18.5] Top 5 pages
- Git push p99 >10s → store or libgit2 regression.
- Git replicas <2 → durability-critical.
- Actions queue depth >100K → pool storm.
- Webhook failure rate >30% → target outage or breaker.
- MySQL failover triggered → verify promotion, drain writes.
[18.6] Observability stack
- Metrics: Prometheus; long-term in Datadog or self-hosted TSDB.
- Tracing: OpenTelemetry; trace headers through queues and bus.
- Logs: structured JSON via slog to Loki or Splunk.
- Profiling: continuous Go pprof + Pyroscope.
- Synthetic probes: clone, push, PR-open, search, workflow-run from each region every 30s.
[18.7] Backups
- Git store: hourly pack snapshot + daily full to object store; 7-day PITR via ref-log replay.
- MySQL: binlog archive to object store, 5-min PITR.
- Redis: AOF everysec + hourly snapshot.
- Redis cache tier: no backup (rebuilds on miss).
- Event bus: RF=3; retention-is-backup.
- Search: rebuild from bus replay.
[18.8] Billing and quotas
| Unit | Metered | Enforced |
|---|---|---|
| API requests | per user/token per hour | Redis sliding window |
| Actions minutes | per billable job per plan | per-org counter updated at completion |
| Actions storage | artifacts + logs | nightly rollup from object-store manifests |
| Packages bandwidth | egress per org | edge-tier counters |
| LFS storage + bandwidth | per repo | object-store metering |
| Seats (private repos) | per org | subscription record in billing cluster |
Overage: free tier hard-stops at 100%; paid plans 10% soft overage with warning, then hard-stop.
[18.9] Where the money goes
No dollar figures; only ordering. For a CI-heavy platform of this shape:
- CI compute (by a wide margin).
- Git NVMe fleet (3-replica storage).
- Egress bandwidth.
- Code-search SSDs.
- MySQL + Redis fleets.
- Object storage (cold logs, old packs).
- Observability (long-term metrics, tracing, logs).
The ordering shifts with business mix: read-heavy + low-CI moves egress up and CI down; enterprise-heavy inflates observability and compliance storage.
Takeaway. Deploy with feature flags, schema-change with gh-ost, watch the top five dashboards. Most of the cost lives in CI compute and Git NVMe.
19. SLOs
| SLO | Target | Budget |
|---|---|---|
| Repo availability (reads) | 99.95% | 4.38h/yr |
| Push ACK p99 | <5s | 7.2h/month |
| Clone p99 (50MB typical repo) | <8s | 7.2h/month |
| PR diff render p99 | <1s | 7.2h/month |
| Search query p99 | <1s | 7.2h/month |
| Actions dispatch p99 | <5m | 7.2h/month |
| Webhook delivery success | >95% | 36h/month below target |
| Git object durability | 3x + backup | 0 object loss/yr |
| Search freshness | <2m typical | 10% of pushes/day over bound |
Error budget policy: four consecutive weeks below target on any availability SLO pauses feature work for reliability.
Takeaway. p99 latency budgets drive capacity planning; availability budgets drive incident response.
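The budget column follows directly from the target; a one-liner makes the arithmetic explicit (function name is mine):

```go
package main

import "fmt"

// budgetHours converts an availability target into allowed
// downtime per period, e.g. 8760 hours in a non-leap year.
func budgetHours(target, periodHours float64) float64 {
	return (1 - target) * periodHours
}

func main() {
	// 99.95% over a year -> ~4.38 h/yr, matching the SLO table.
	fmt.Printf("%.2f h/yr\n", budgetHours(0.9995, 8760))
}
```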
20. Security, Abuse, and Trust
Common mistakes to avoid
- Storing forks as full repository copies.
- Using BM25 as code search.
- Container-only runners for untrusted CI.
- Synchronous webhook fan-out from the push path.
- Branch protection enforced only in the UI.
[20.1] Authentication
HTTPS Git: PAT or OAuth via HTTP Basic. SSH Git: ed25519 / RSA keys. Web + API: OAuth 2.0 with short-lived access + refresh tokens. 2FA: TOTP or WebAuthn/passkeys.
[20.2] Authorization layers
Edge: TLS, WAF. Core: repo visibility, collaborator/team. Git transport: receive-pack hook for branch protection, CODEOWNERS, push ruleset.
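The receive-pack check needs a match between the pushed ref and the protection rule patterns. A minimal sketch, assuming fnmatch-style patterns like those in the branch_protection_rules table (helper name and rule set are illustrative):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// protectedBy reports whether a pushed ref matches any branch
// protection pattern, returning the first matching pattern.
func protectedBy(ref string, patterns []string) (string, bool) {
	branch := strings.TrimPrefix(ref, "refs/heads/")
	for _, p := range patterns {
		// path.Match gives fnmatch-style globbing; '*' does
		// not cross '/' boundaries, which suits branch names.
		if ok, _ := path.Match(p, branch); ok {
			return p, true
		}
	}
	return "", false
}

func main() {
	rules := []string{"main", "release/*"}
	fmt.Println(protectedBy("refs/heads/release/1.2", rules))
}
```

The hook evaluates this before any ref update is applied, so a force-push to a protected branch fails server-side regardless of client.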
[20.3] Transport
SSH ed25519 host keys. TLS 1.3. Pre-receive hook runs before objects promote from quarantine; scans for secret patterns and entropy; push protection blocks known-secret patterns at the hook.
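The entropy scan can be sketched as a Shannon-entropy score per candidate token; the flagging threshold below is an assumption for illustration, not a known product value:

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns bits of entropy per character. Random
// API tokens score high; natural language and repeated
// characters score low. Scanners pair this with pattern rules.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	freq := map[rune]float64{}
	for _, r := range s {
		freq[r]++
	}
	n := float64(len([]rune(s)))
	var h float64
	for _, count := range freq {
		p := count / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Printf("%.2f\n", shannonEntropy("aaaaaaaa"))                 // low: repeated char
	fmt.Printf("%.2f\n", shannonEntropy("ghp_x9Kq2mWz7Rt4vBn1LpYs")) // high: token-like
}
```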
[20.4] CI isolation
Per-job VM: dedicated kernel/FS, egress filtering, no cross-repo secrets, env-var injection with log masking, auto-redact secret-like log lines, reimage on exit.
[20.5] Webhooks
HMAC-SHA256 payload signing, secret in the secrets store. Targets verify X-Hub-Signature-256. URL validation blocks internal IPs, localhost, link-local.
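The URL validation step can be sketched as follows. A production validator must also resolve hostnames, validate every returned IP, and re-check after redirects (DNS rebinding); this sketch handles only literal IPs and localhost:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// blockedTarget rejects webhook URLs pointing at internal,
// loopback, or link-local addresses (SSRF defense).
func blockedTarget(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return true
	}
	host := u.Hostname()
	if host == "localhost" {
		return true
	}
	if ip := net.ParseIP(host); ip != nil {
		return ip.IsLoopback() || ip.IsPrivate() ||
			ip.IsLinkLocalUnicast() || ip.IsUnspecified()
	}
	// Hostname case: resolve-and-check elided in this sketch.
	return false
}

func main() {
	fmt.Println(blockedTarget("http://10.0.0.5/hook"))        // true
	fmt.Println(blockedTarget("https://example.com/webhook")) // false
}
```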
[20.6] Privacy and compliance
AES-256 at rest. TLS 1.3 client-server. mTLS between internal services. GDPR export + right-to-delete cascades through Git objects, MySQL, search index, CI logs. SOC 2 Type II; FedRAMP for government.
[20.7] Rate limits
| Action | Auth | Unauth |
|---|---|---|
| API requests | 5K/h | 60/h |
| Search | 30/min | 10/min |
| Clone/fetch | bandwidth-throttled | 60/h per IP |
| Actions minutes | plan-dependent | N/A |
[20.8] Abuse patterns
Malware hosting. Inline: perceptual-hash + YARA on new releases/packages; entropy check on opaque blob pushes. Offline: fleet pHash match. Action: quarantine + partner notify + ban on confirmation.
PAT / token leaks. Inline: push-protection matches 200+ provider patterns at the receive-pack hook and rejects. Offline: continuous scan + partner notify.
Account takeover. Inline: new-SSH-key event requires step-up when 90-day geo profile changes. Offline: device-fingerprint clustering. Action: invalidate session, force re-auth.
Bot mass-creation. Inline: CAPTCHA + risk-scored device + email-domain heuristics. Offline: graph clustering on payment method, IP range, content hashes.
Crypto miners in Actions. Inline: YAML scan, egress filtering to mining pools, CPU-profile match. Offline: billing anomaly.
Shill stars + DMCA shields. Offline: star graph clustering; DMCA 7-day response.
Hate and harassment. Inline: content classifier flags high-risk before notification dispatch.
[20.9] Audit log
All security-relevant actions publish to a dedicated audit topic, archived to object store, retained 7 years: repo access changes, org membership, protection rule changes, secret scanning alerts, SSH/PAT create/revoke, workflow modifications, admin overrides.
21. What to build at 10, 50, or 200 engineers
The full architecture in this post is the 200-engineer answer. Most teams should not start there.
| Layer | 10 engineers | 50 engineers | 200 engineers |
|---|---|---|---|
| Monolith | Go or Rails, single container behind Fargate / Cloud Run | Go or Rails on K8s | Go or Rails, dedicated teams per subsystem |
| DB | Managed MySQL, 1 primary + 1 replica | MySQL + sharding router | MySQL + sharding router, functional keyspaces, gh-ost |
| Cache | Managed Redis | Managed Redis | Self-hosted sharded Redis |
| Async jobs | Redis-backed queue on managed Redis | Dedicated Redis cluster | Sharded Redis |
| Events | None (call handlers directly) | Managed bus | Dedicated bus team |
| Git storage | Gitaly + Praefect on 3 small VMs, or a hosted Git product | Gitaly + Praefect, 3 replicas | Custom or heavily-operated replica coordinator fleet |
| Code search | Skip, or managed OpenSearch with ngram | Zoekt cluster | Custom sparse-trigram engine |
| CI runners | Managed runners | K8s + ARC + managed burst | Custom VM pool, or ARC + Kata at scale |
| Edge | Managed ALB | Managed LB + WAF | Custom L4 + HAProxy |
| Secrets | Cloud-native | Vault or cloud-native | Cloud-native with strict audit |
| Object storage | Cloud | Cloud + local cache | Cloud with cross-region and tiered lifecycle |
| Observability | Managed (Grafana Cloud, Datadog) | Prometheus + Grafana Cloud, OTel | Full OTel, continuous profiling, SRE per subsystem |
| Cost envelope (modeled) | ~$3–10K/month | ~$50–200K/month | ~$10M+/month |
| Breaks at | single DB primary cap; tens of thousands of users | Praefect saturation; tens of TB search; >10K concurrent CI | nothing at GitHub-scale |
Rule. The 10-engineer stack ships the product. The 50-engineer stack serves growth. The 200-engineer stack serves the business that growth produced. Skipping levels costs infrastructure work that should have been feature work.
Best fit for most teams. The 10-engineer column. Nearly every "time to build a platform" conversation is actually at this scale.
22. Key Takeaways
- Content-addressable storage (SHA-1 hardened) is the foundation.
- 3 replicas in different failure domains, majority-commit on the ref update.
- The hardest problem is fork-aware GC; bitmaps + incremental repacks + 14-day quarantine keep it tractable.
- PR diffs against merge base, cached by immutable `(old_blob, new_blob)` pairs.
- Comments anchored to immutable original fields; repositioning is a hunk-walk.
- Code search is trigram, not BM25.
- CI on per-job VMs with fast reimage.
- Branch protection lives at the receive-pack hook, not the UI.
- Async splits by usage pattern: per-target retries go to the queue; broadcast events go to the bus.
- Secrets and object storage are solved problems; pick by ops budget.
- Language is not load-bearing.
23. API Design (appendix)
[23.1] Git transport (smart HTTP)
Git Service vs Core. Git Service is the daemon that speaks Git's wire protocol (SSH and smart HTTP): accepts git push / git clone, authenticates, runs pre-receive hooks, writes pack bytes to the Git store. Core is the Go monolith for everything else. Split because long-lived Git streaming and short-lived HTTP want separate deploy units.
GET /{owner}/{repo}.git/info/refs?service=git-upload-pack
POST /{owner}/{repo}.git/git-upload-pack
POST /{owner}/{repo}.git/git-receive-pack
libgit2 handles object/pack; Go calls it via git2go. libgit2 is the widely-used embedded Git library with permissive licensing; used by GitLab's Gitaly, VS Code, JetBrains, GitHub Desktop, SourceTree. Some hot paths still shell out to the git binary.
[23.2] HTTP routing (chi)
r := chi.NewRouter()
r.Use(middleware.RequestID, middleware.Recoverer)
r.Use(obs.Tracing, obs.Metrics)
r.Use(authmw.OAuth, authmw.RateLimit)
r.Route("/repos/{owner}/{repo}", func(r chi.Router) {
    r.Use(repomw.Resolve, repomw.RequirePerm(authz.Pull))
    r.Get("/", handlers.GetRepo)
    r.Post("/forks", handlers.CreateFork)
    r.Route("/pulls/{number}", func(r chi.Router) {
        r.Get("/", handlers.GetPR)
        r.Get("/diff", handlers.GetPRDiff)
        r.Post("/comments", handlers.CreateReviewComment)
        r.Put("/merge", handlers.MergePR)
    })
})
[23.3] REST surface (selected)
POST /user/repos
POST /repos/{owner}/{repo}/forks
GET /repos/{owner}/{repo}/contents/{path}?ref=main
POST /repos/{owner}/{repo}/pulls
GET /repos/{owner}/{repo}/pulls/{n} (Accept: application/vnd.github.diff)
POST /repos/{owner}/{repo}/pulls/{n}/comments
PUT /repos/{owner}/{repo}/pulls/{n}/merge
GET /search/code?q=...
GET /repos/{owner}/{repo}/actions/runs
GET /repos/{owner}/{repo}/actions/jobs/{id}/logs
POST /repos/{owner}/{repo}/hooks
[23.4] Webhook delivery
POST <target URL>
X-GitHub-Event: push
X-GitHub-Delivery: <uuid>
X-Hub-Signature-256: sha256=<HMAC of body using secret from vault>
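Signing and verifying that header is a few lines of Go; the secret literal below is a placeholder for the vault-fetched value:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign produces the X-Hub-Signature-256 header value for a payload.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify compares in constant time; targets should never use ==,
// which leaks timing information about the expected signature.
func verify(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(sign(secret, body)), []byte(header))
}

func main() {
	secret := []byte("webhook-secret-from-vault") // placeholder
	body := []byte(`{"ref":"refs/heads/main"}`)
	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))          // true
	fmt.Println(verify([]byte("wrong"), body, sig)) // false
}
```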
24. Appendix
A. Reachability bitmap primer
Each ref carries a bitmap indexing every object reachable from it. Reachability becomes a bitwise OR of bitmaps instead of a graph walk. Used for pack negotiation, fork GC, and clone-pack precomputation.
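A toy version of the OR-based reachability union, with objects indexed by pack position (type and method names are mine; real bitmaps are compressed, e.g. EWAH):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap indexes objects by pack position: bit i set means
// object i is reachable from the ref that owns the bitmap.
type bitmap []uint64

// or unions two same-length bitmaps word by word.
func (b bitmap) or(other bitmap) bitmap {
	out := make(bitmap, len(b))
	for i := range b {
		out[i] = b[i] | other[i]
	}
	return out
}

// count returns how many objects the bitmap covers.
func (b bitmap) count() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

func main() {
	mainRef := bitmap{0b1011} // objects 0,1,3 reachable from main
	devRef := bitmap{0b0110}  // objects 1,2 reachable from dev
	union := mainRef.or(devRef)
	fmt.Println(union.count()) // reachability = OR, no graph walk
}
```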
B. Pack negotiation
C. Fork network invariants
- Exactly one root per network.
- Every member's `alternates` points to the root.
- Reachability is the union across members' refs.
- Root deletion promotes a new root; alternates re-point.
- GC operates on the union with 14-day quarantine.
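The single-root and root-promotion invariants can be modeled in a few lines. This toy tracks only the pointers, ignoring on-disk alternates files and repo data (all names illustrative):

```go
package main

import "fmt"

// network models a fork network: one root repo owning the object
// pool, and every other member's alternates aimed at the root.
type network struct {
	root       string
	alternates map[string]string // member repo -> object-pool target
}

// deleteMember removes a repo. Deleting the root promotes a
// surviving member and re-points every remaining alternates entry.
func (n *network) deleteMember(repo string) {
	delete(n.alternates, repo)
	if repo != n.root {
		return
	}
	for m := range n.alternates { // promote any surviving member
		n.root = m
		break
	}
	delete(n.alternates, n.root) // the new root owns the pool directly
	for m := range n.alternates {
		n.alternates[m] = n.root
	}
}

func main() {
	n := &network{
		root: "upstream/core",
		alternates: map[string]string{
			"alice/core": "upstream/core",
			"bob/core":   "upstream/core",
		},
	}
	n.deleteMember("upstream/core")
	fmt.Println(n.root) // alice/core or bob/core: exactly one new root
}
```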
D. App-tier divergences: GitHub production vs this design
Substrate choices are in §4.2. This table focuses on the app tier where Go diverges from Ruby.
| App-tier subsystem | GitHub production | This design | Why |
|---|---|---|---|
| Application monolith | Ruby on Rails (github/github) | Go monolith on chi + sqlc + Asynq | Language choice. |
| DB access | Trilogy + ActiveRecord | database/sql + go-sql-driver/mysql + sqlc | Follows the language. |
| Background jobs | Resque (Ruby) | Asynq (Go) | Same Redis-backed shape. |
| Real-time | ActionCable | gorilla/websocket | Rails-free equivalent. |
| Feature flags | Flipper | In-process + Redis + etcd | Avoid pulling a Rails-idiomatic vendor. |
| Profiling | rbspy / stackprof | Go pprof + Pyroscope | Native to Go. |
| Cache + ephemeral state | Memcached + Redis | Redis (single tier) | Memcached earns its weight only above ~TB. |
E. MySQL 8 DDL
Full schemas for the tables summarized in §11.
Users cluster.
CREATE TABLE users (
id BIGINT UNSIGNED PRIMARY KEY,
login VARCHAR(40) NOT NULL UNIQUE,
email VARCHAR(254) NOT NULL,
display_name VARCHAR(255),
avatar_url VARCHAR(500),
type ENUM('User','Organization') NOT NULL DEFAULT 'User',
two_factor TINYINT(1) NOT NULL DEFAULT 0,
suspended_at TIMESTAMP(6) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
updated_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
INDEX idx_users_email (email)
) ENGINE=InnoDB;
CREATE TABLE organization_members (
org_id BIGINT UNSIGNED NOT NULL,
user_id BIGINT UNSIGNED NOT NULL,
role ENUM('owner','member','billing_manager') NOT NULL DEFAULT 'member',
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
PRIMARY KEY (org_id, user_id)
) ENGINE=InnoDB;
CREATE TABLE teams (
id BIGINT UNSIGNED PRIMARY KEY,
org_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL,
privacy ENUM('visible','secret') NOT NULL DEFAULT 'secret',
permission ENUM('pull','triage','push','maintain','admin') NOT NULL DEFAULT 'pull',
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
UNIQUE KEY uq_org_slug (org_id, slug)
) ENGINE=InnoDB;
CREATE TABLE team_members (
team_id BIGINT UNSIGNED NOT NULL,
user_id BIGINT UNSIGNED NOT NULL,
role ENUM('member','maintainer') NOT NULL DEFAULT 'member',
PRIMARY KEY (team_id, user_id)
) ENGINE=InnoDB;
Repos cluster.
CREATE TABLE repositories (
id BIGINT UNSIGNED PRIMARY KEY,
owner_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(100) NOT NULL,
description TEXT,
is_private TINYINT(1) NOT NULL DEFAULT 0,
is_fork TINYINT(1) NOT NULL DEFAULT 0,
fork_source_id BIGINT UNSIGNED NULL,
default_branch VARCHAR(255) NOT NULL DEFAULT 'main',
disk_usage_kb BIGINT UNSIGNED NOT NULL DEFAULT 0,
replica_route JSON NOT NULL,
archived TINYINT(1) NOT NULL DEFAULT 0,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
updated_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
pushed_at TIMESTAMP(6) NULL,
UNIQUE KEY uq_owner_name (owner_id, name),
INDEX idx_fork_source (fork_source_id)
) ENGINE=InnoDB;
CREATE TABLE repository_permissions (
repo_id BIGINT UNSIGNED NOT NULL,
grantee_id BIGINT UNSIGNED NOT NULL,
grantee_type ENUM('User','Team') NOT NULL,
permission ENUM('pull','triage','push','maintain','admin') NOT NULL,
PRIMARY KEY (repo_id, grantee_id, grantee_type)
) ENGINE=InnoDB;
CREATE TABLE branch_protection_rules (
id BIGINT UNSIGNED PRIMARY KEY,
repo_id BIGINT UNSIGNED NOT NULL,
pattern VARCHAR(255) NOT NULL,
require_pr TINYINT(1) NOT NULL DEFAULT 1,
required_approvals INT NOT NULL DEFAULT 1,
dismiss_stale_reviews TINYINT(1) NOT NULL DEFAULT 0,
require_status_checks TINYINT(1) NOT NULL DEFAULT 1,
required_checks JSON,
require_linear_history TINYINT(1) NOT NULL DEFAULT 0,
restrict_pushes TINYINT(1) NOT NULL DEFAULT 0,
allowed_pushers JSON,
require_codeowner_review TINYINT(1) NOT NULL DEFAULT 0,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_protection_repo (repo_id)
) ENGINE=InnoDB;
CREATE TABLE fork_networks (
id BIGINT UNSIGNED PRIMARY KEY,
root_repo_id BIGINT UNSIGNED NOT NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6)
) ENGINE=InnoDB;
CREATE TABLE fork_network_members (
fork_network_id BIGINT UNSIGNED NOT NULL,
repo_id BIGINT UNSIGNED NOT NULL,
forked_from_id BIGINT UNSIGNED NULL,
PRIMARY KEY (fork_network_id, repo_id),
INDEX idx_fork_members_repo (repo_id)
) ENGINE=InnoDB;
CREATE TABLE webhooks (
id BIGINT UNSIGNED PRIMARY KEY,
repo_id BIGINT UNSIGNED NULL,
org_id BIGINT UNSIGNED NULL,
url VARCHAR(1000) NOT NULL,
secret_ref VARCHAR(255) NOT NULL,
content_type ENUM('json','form') NOT NULL DEFAULT 'json',
events JSON NOT NULL,
active TINYINT(1) NOT NULL DEFAULT 1,
ssl_verify TINYINT(1) NOT NULL DEFAULT 1,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_hooks_repo (repo_id)
) ENGINE=InnoDB;
CREATE TABLE webhook_deliveries (
id BIGINT UNSIGNED PRIMARY KEY,
webhook_id BIGINT UNSIGNED NOT NULL,
event VARCHAR(64) NOT NULL,
action VARCHAR(64) NULL,
payload_size INT NOT NULL,
response_code INT NULL,
response_time_ms INT NULL,
status ENUM('pending','delivered','failed') NOT NULL,
attempts INT NOT NULL DEFAULT 0,
next_retry_at TIMESTAMP(6) NULL,
delivered_at TIMESTAMP(6) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_deliveries_hook (webhook_id, created_at),
INDEX idx_deliveries_retry (status, next_retry_at)
) ENGINE=InnoDB;
Issues cluster.
CREATE TABLE pull_requests (
id BIGINT UNSIGNED PRIMARY KEY,
repo_id BIGINT UNSIGNED NOT NULL,
number INT UNSIGNED NOT NULL,
author_id BIGINT UNSIGNED NOT NULL,
title VARCHAR(256) NOT NULL,
body MEDIUMTEXT,
state ENUM('open','closed','merged') NOT NULL DEFAULT 'open',
head_repo_id BIGINT UNSIGNED NOT NULL,
head_ref VARCHAR(255) NOT NULL,
head_sha CHAR(40) NOT NULL,
base_ref VARCHAR(255) NOT NULL,
base_sha CHAR(40) NOT NULL,
merge_base_sha CHAR(40) NULL,
mergeable TINYINT(1) NULL,
merged_at TIMESTAMP(6) NULL,
merged_by_id BIGINT UNSIGNED NULL,
merge_commit_sha CHAR(40) NULL,
diff_cache_key VARCHAR(128) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
updated_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
UNIQUE KEY uq_repo_number (repo_id, number),
INDEX idx_prs_state (repo_id, state),
INDEX idx_prs_author (author_id)
) ENGINE=InnoDB;
CREATE TABLE pull_request_reviews (
id BIGINT UNSIGNED PRIMARY KEY,
pr_id BIGINT UNSIGNED NOT NULL,
reviewer_id BIGINT UNSIGNED NOT NULL,
state ENUM('pending','approved','changes_requested','commented') NOT NULL,
body MEDIUMTEXT,
commit_sha CHAR(40) NOT NULL,
submitted_at TIMESTAMP(6) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_reviews_pr (pr_id)
) ENGINE=InnoDB;
CREATE TABLE review_comments (
id BIGINT UNSIGNED PRIMARY KEY,
pr_id BIGINT UNSIGNED NOT NULL,
review_id BIGINT UNSIGNED NULL,
author_id BIGINT UNSIGNED NOT NULL,
body MEDIUMTEXT NOT NULL,
path VARCHAR(1024) NOT NULL,
commit_sha CHAR(40) NOT NULL,
original_commit_sha CHAR(40) NOT NULL,
diff_hunk TEXT NOT NULL,
line_no INT NULL,
original_line INT NOT NULL,
side ENUM('LEFT','RIGHT') NOT NULL DEFAULT 'RIGHT',
start_line INT NULL,
in_reply_to_id BIGINT UNSIGNED NULL,
outdated TINYINT(1) NOT NULL DEFAULT 0,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
updated_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
INDEX idx_comments_pr (pr_id),
INDEX idx_comments_path (pr_id, path(191))
) ENGINE=InnoDB;
Actions cluster.
CREATE TABLE workflow_runs (
id BIGINT UNSIGNED PRIMARY KEY,
repo_id BIGINT UNSIGNED NOT NULL,
workflow_file VARCHAR(255) NOT NULL,
event VARCHAR(64) NOT NULL,
trigger_sha CHAR(40) NOT NULL,
trigger_ref VARCHAR(255) NOT NULL,
trigger_actor_id BIGINT UNSIGNED NULL,
status ENUM('queued','in_progress','completed') NOT NULL DEFAULT 'queued',
conclusion ENUM('success','failure','cancelled','skipped') NULL,
started_at TIMESTAMP(6) NULL,
completed_at TIMESTAMP(6) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_runs_repo_created (repo_id, created_at)
) ENGINE=InnoDB;
CREATE TABLE workflow_jobs (
id BIGINT UNSIGNED PRIMARY KEY,
run_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(255) NOT NULL,
status ENUM('queued','in_progress','completed') NOT NULL DEFAULT 'queued',
conclusion ENUM('success','failure','cancelled','skipped') NULL,
runner_id VARCHAR(128) NULL,
runner_os ENUM('linux','macos','windows') NOT NULL DEFAULT 'linux',
labels JSON,
depends_on JSON,
matrix_values JSON,
started_at TIMESTAMP(6) NULL,
completed_at TIMESTAMP(6) NULL,
log_url VARCHAR(500) NULL,
created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
INDEX idx_jobs_run (run_id),
INDEX idx_jobs_status (status)
) ENGINE=InnoDB;
CREATE TABLE workflow_steps (
id BIGINT UNSIGNED PRIMARY KEY,
job_id BIGINT UNSIGNED NOT NULL,
number INT NOT NULL,
name VARCHAR(255) NOT NULL,
status ENUM('queued','in_progress','completed') NOT NULL DEFAULT 'queued',
conclusion ENUM('success','failure','cancelled','skipped') NULL,
started_at TIMESTAMP(6) NULL,
completed_at TIMESTAMP(6) NULL,
INDEX idx_steps_job (job_id)
) ENGINE=InnoDB;
25. References
GitHub engineering blog (primary sources)
- Partitioning GitHub's relational databases to handle scale.
- Building GitHub with Ruby and Rails.
- Introducing Resque.
- Introducing DGit.
- Building resilience in Spokes.
- A brief history of code search at GitHub.
- The technology behind GitHub's new code search.
- Scaling Git's garbage collection.
- GLB: GitHub's open source load balancer.
- How GitHub Actions handles CI/CD scale on short-running jobs.
Buildable substitutes
- Gitaly + Praefect.
- Zoekt.
- Actions Runner Controller (ARC).
- Kata Containers.
- Katran.
- MinIO.
- HashiCorp Vault.
Named dependencies
Practice this design: Design GitHub interview question.