System Design: Maps Platform (Navigation, Routing, and Map Rendering)
Goal: Build a mapping and navigation platform that serves 1 billion daily active users, renders map tiles for 20 billion tile requests per day, computes routes for 5 billion routing requests per day, and processes 50 million real-time location updates per second from navigators, background app users, fleet vehicles, and connected cars. Support turn-by-turn navigation, multi-modal routing (driving, walking, cycling, transit), real-time traffic overlays, offline maps, and POI search across 500 million points of interest in 200+ countries.
Reading guide: This post covers the full architecture of a Google Maps-like platform. It is long and detailed. Reading it linearly is not required.
Sections 1-8: Core architecture, data model, and API design
Section 4: Routing algorithms deep dive (Dijkstra, A*, Contraction Hierarchies) + geospatial indexing (Quadtree, Geohash, S2, H3) + tile rendering (raster vs vector) + comparison to Google's internal stack. The heart of this design.
Section 9: Implementation deep dives following the user journey: opening the map, searching, routing, ETA, navigation, then behind-the-scenes pipelines (traffic, map data)
Sections 10-14: Operational concerns (bottlenecks, failures, deployment, observability, security)
Section 15: How to build this at startup scale with open-source tools and a single server
New to maps/geo systems? Start with Sections 1-4. Section 4 walks through routing algorithms with concrete graph examples, step by step.
Building something similar? Sections 4-9 have the algorithm details, sizing math, and implementation deep dives.
Preparing for a system design interview? Sections 1-8 cover what interviewers expect. Section 4 (routing + geospatial indexing) is the most commonly asked follow-up. Sections 10-11 (bottlenecks and failures) round out the discussion.
TL;DR: S2 Cells for geospatial indexing, Contraction Hierarchies for sub-millisecond routing, vector tiles (MVT) served through a 3-tier CDN cache. Kafka ingests 50M GPS pings/sec (from navigators, background apps, fleet, and OEM vehicles), Flink aggregates real-time traffic per road segment, Redis serves hot traffic speeds to the routing engine. Road graph stored as a custom binary adjacency array with CH shortcuts, memory-mapped on routing servers. RAPTOR algorithm for transit routing. ETA prediction via heuristic baseline + ML (graph neural networks at Google-scale). The hardest problems: route computation at global scale (solved by CH preprocessing), real-time traffic integration into ETA, and tile serving at 20B requests/day (solved by aggressive caching at 95%+ CDN hit rate).
1. Problem Statement
A mapping and navigation platform answers three questions billions of times per day: "What does this area look like?" (map rendering), "How do I get from A to B?" (routing), and "What is nearby?" (search). Each question has a different computational profile and a different scaling challenge.
Four core problems make this hard:
- Routing at scale. 5 billion route requests per day means ~58,000 routes per second. A road network is a weighted directed graph with hundreds of millions of nodes (intersections) and over a billion edges (road segments). Running Dijkstra's algorithm on the full graph takes seconds per query. At 58K queries/sec, that is computationally infeasible. The system must answer arbitrary point-to-point routes in under 200ms.
- Map rendering. 20 billion tile requests per day across 23 zoom levels. Zoom 0 shows the entire world in a single tile. Zoom 18 shows individual buildings. The total tile space at zoom 18 alone is ~69 billion unique tiles. Generating tiles on the fly at this request rate is impossible. The system must serve tiles with < 50ms latency at the CDN edge.
- Real-time traffic. 50 million location updates per second from all sources: active navigators (~15M/sec at 3-5 second intervals), background location sharing from app users not actively navigating (~10M/sec at 30-60 second intervals), fleet and logistics vehicles (~5M/sec), and OEM connected car telemetry (~20M/sec). Each GPS ping must be matched to the correct road segment, aggregated into road-level speed data, and reflected in ETA calculations within 30 seconds. Stale traffic data produces inaccurate ETAs that erode user trust.
- Map freshness. The road graph and tile data are not static. Roads are built, demolished, rerouted. Buildings appear and disappear. Speed limits change. The system must continuously ingest data from three sources (satellite imagery, vehicle GPS probes, and community edits), validate it, and propagate changes to tiles and the routing graph within hours for critical updates. A map that shows a road demolished 6 months ago erodes trust as fast as a wrong ETA. The full data lifecycle runs: satellite imagery + vehicle probes + community edits -> validation -> PostgreSQL -> tile rebuild + graph reprocessing -> routing/ETA/navigation.
Scale: 1 billion DAU. 20 billion tile requests/day. 5 billion route requests/day. 50 million GPS location updates/sec. 500 million POI records. 200+ countries with road network data.
The core question: How to preprocess a road graph of hundreds of millions of nodes so that arbitrary point-to-point routes can be answered in milliseconds, not seconds, while reflecting real-time traffic conditions.
Common mistakes:
- Run Dijkstra on the full global graph for every route request (too slow by 1,000x)
- Store every GPS ping in a relational database (50M writes/sec kills any RDBMS)
- Use Geohash as the primary spatial index (boundary discontinuities cause missed results)
2. Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-01 | Map tile rendering and serving (vector tiles, 23 zoom levels) | P0 |
| FR-02 | Point-to-point routing (driving, walking, cycling) | P0 |
| FR-03 | Turn-by-turn navigation with real-time voice guidance | P0 |
| FR-04 | Real-time traffic overlay on map tiles | P0 |
| FR-05 | ETA calculation with traffic-aware predictions | P0 |
| FR-06 | POI search (text search + geospatial filtering + ranking) | P0 |
| FR-07 | Geocoding (address to coordinates) and reverse geocoding | P0 |
| FR-08 | Automatic re-routing on deviation from planned route | P0 |
| FR-09 | Multi-modal routing (driving + walking + transit combined) | P1 |
| FR-10 | Offline map download (tiles + routing graph for a region) | P1 |
| FR-11 | Traffic incident detection and reporting | P1 |
| FR-12 | Speed camera and hazard alerts | P1 |
| FR-13 | Historical traffic patterns for departure time planning | P1 |
| FR-14 | Street View imagery | P2 |
| FR-15 | Indoor maps for large venues (airports, malls) | P2 |
| FR-16 | Elevation-aware routing (cycling, walking) | P2 |
3. Non-Functional Requirements
| Requirement | Target |
|---|---|
| Route computation latency (p99) | < 200ms |
| Tile serving latency (p99) | < 50ms (CDN hit), < 200ms (origin) |
| ETA accuracy | Within 10% of actual travel time (p90) |
| POI search latency (p99) | < 100ms |
| Location update ingestion | 50M updates/sec sustained (all sources: navigators, background, fleet, OEM) |
| Map data freshness (traffic) | < 30 seconds from GPS update to map overlay |
| Availability | 99.99% |
| Offline map package generation | < 5 minutes for a metro area |
| Concurrent active navigators | 50M |
| Total POI records | 500M |
| Road network nodes (global) | ~500M intersections, ~1.2B road segments |
These targets represent Google Maps-scale. See Section 15 for how the architecture simplifies at smaller scale.
4. High-Level Approach & Technology Selection
This section is the centerpiece of the design. Three major topics are covered with concrete examples: routing algorithms (Dijkstra to A* to Contraction Hierarchies), geospatial indexing (Quadtree vs Geohash vs S2 vs H3), and tile rendering (raster vs vector). The technology choices follow directly from the constraints.
4.1 The Core Routing Problem
A road network is a weighted directed graph. Nodes are intersections. Edges are road segments. Edge weights represent travel time (segment length divided by speed limit, adjusted for real-time traffic). A route from point A to point B is the shortest weighted path in this graph.
The global road network has approximately 500 million nodes and 1.2 billion edges. Finding the shortest path on a graph this size requires an algorithm that scales. Three approaches exist, each building on the limitations of the previous one.
Where does this graph come from? Three data sources feed the road graph:
- OpenStreetMap and government datasets provide the baseline road network (road geometry, classifications, turn restrictions). OSM is updated daily via its replication feed. Government datasets add authoritative speed limits and road classifications.
- Satellite imagery (Maxar at 30cm resolution, Planet Labs for daily global coverage at 3-5m) feeds ML pipelines that detect new roads, buildings, and land use changes. A new highway visible in satellite imagery becomes a candidate road segment. Building footprints extracted from imagery feed the address database used by geocoding (Section 9.2).
- Vehicle probe data (GPS traces from 50M active navigators) reveals roads in practice: where vehicles actually drive, at what speed, and how traffic flows. Probe traces that consistently follow paths not in the graph signal new roads. Historical speed distributions reveal actual speed limits vs. posted ones. Braking events reveal road surface conditions.
These three sources are fused in a weekly map data pipeline (Section 9.9) that produces the road graph and tile data consumed by routing and rendering. The pipeline resolves conflicts using a source-priority hierarchy (Section 9.9) and requires dual confirmation for new road candidates.
4.2 Approach 1: Dijkstra's Algorithm
Concrete example. Consider this 6-node graph with travel times in minutes:
A ---5--- B ---3--- C
| | |
2 7 4
| | |
D ---1--- E ---2--- F
Shortest path from A to F: A -> D -> E -> F, cost 5 minutes. Dijkstra finds this by exploring all 6 nodes, expanding in concentric circles from the source.
At city scale (50km route, 50,000-100,000 intersections), Dijkstra explores almost every node within the radius. At 58,000 route requests/sec globally, that is 5.8 billion node visits per second. Not feasible.
Time complexity: O((V + E) log V) per query.
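The walkthrough above can be reproduced in a few lines. A minimal Dijkstra over the 6-node example graph (illustrative only; as the next sections show, production systems never run this on the full graph):

```python
import heapq

# The 6-node example graph from above; weights are travel times in minutes.
GRAPH = {
    "A": [("B", 5), ("D", 2)],
    "B": [("A", 5), ("C", 3), ("E", 7)],
    "C": [("B", 3), ("F", 4)],
    "D": [("A", 2), ("E", 1)],
    "E": [("B", 7), ("D", 1), ("F", 2)],
    "F": [("C", 4), ("E", 2)],
}

def dijkstra(graph, source, target):
    """Return (cost, path) of the shortest path from source to target."""
    dist = {source: 0}
    prev = {}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale queue entry, a cheaper path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the path by walking predecessors backwards.
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return dist[target], path[::-1]

print(dijkstra(GRAPH, "A", "F"))  # (5, ['A', 'D', 'E', 'F'])
```

Note how the search settles D (cost 2) and E (cost 3) before ever finishing with B (cost 5): the frontier expands by cost, not by direction toward F. That directionless expansion is exactly what A* fixes.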
4.3 Approach 2: A* Search
A* steers the search toward the destination using a heuristic (straight-line distance divided by max speed). This creates a cone-shaped search region instead of Dijkstra's expanding circle, exploring 3-5x fewer nodes on road networks.
The limitation: the reduction is a constant factor. A route from New York to Los Angeles still explores millions of nodes. A* alone cannot meet the < 200ms latency target for long-distance routes.
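A sketch of A* on the same 6-node graph. The article's graph has no coordinates, so the planar positions below are hypothetical (chosen so the straight-line heuristic stays admissible, i.e. never exceeds the true remaining cost):

```python
import heapq
import math

# Hypothetical planar coordinates for the example graph (1 grid unit <= 1 minute).
COORDS = {"A": (0, 0), "B": (1, 0), "C": (2, 0),
          "D": (0, 1), "E": (1, 1), "F": (2, 1)}
GRAPH = {
    "A": [("B", 5), ("D", 2)],
    "B": [("A", 5), ("C", 3), ("E", 7)],
    "C": [("B", 3), ("F", 4)],
    "D": [("A", 2), ("E", 1)],
    "E": [("B", 7), ("D", 1), ("F", 2)],
    "F": [("C", 4), ("E", 2)],
}

def a_star(graph, source, target):
    """A*: Dijkstra ordered by g(n) + h(n), where h is straight-line distance."""
    def h(node):
        (x1, y1), (x2, y2) = COORDS[node], COORDS[target]
        return math.hypot(x1 - x2, y1 - y2)

    g = {source: 0}
    pq = [(h(source), source)]
    explored = 0
    while pq:
        f, u = heapq.heappop(pq)
        explored += 1
        if u == target:
            return g[u], explored
        for v, w in graph[u]:
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng
                heapq.heappush(pq, (ng + h(v), v))
    return None

cost, explored = a_star(GRAPH, "A", "F")
print(cost, explored)  # 5 4
```

On a graph this small the savings are invisible; on a real road network the heuristic prunes the half of the frontier pointing away from the destination, which is where the 3-5x reduction comes from.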
4.4 Approach 3: Contraction Hierarchies
Contraction Hierarchies (CH) is the production routing algorithm used by OSRM, Valhalla, and Google Maps. The core idea: preprocess the graph offline by removing low-importance nodes and adding "shortcut" edges that preserve shortest-path distances. Queries then run a bidirectional search that only climbs the hierarchy, exploring ~500 nodes total regardless of route length.
| Metric | Dijkstra | A* | Contraction Hierarchies |
|---|---|---|---|
| Preprocessing time | None | None | ~10 min (Germany), 4-6 hours (global 500M nodes) |
| Nodes explored per query | ~2,000,000 | ~500,000 | ~500 |
| Query time | ~2 seconds | ~400ms | < 1ms |
| Speedup over Dijkstra | 1x | 3-5x | ~2,000x |
For real-time traffic, Customizable Contraction Hierarchies (CCH) separate the graph structure (preprocessed weekly) from edge weights (updated every 30 seconds from live traffic). Shortcut weights are recalculated incrementally in ~100ms for local changes.
For the complete walkthrough with step-by-step contraction examples, bidirectional query phase, CCH vs CRP comparison, and production deployment patterns, see Contraction Hierarchies.
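The contraction step itself can be sketched in a few lines (illustrative, not OSRM's implementation): contracting a node considers every in-neighbor/out-neighbor pair and adds a shortcut only when a "witness" search finds no equally short path that avoids the contracted node.

```python
import heapq

def shortest(graph, src, dst, excluded=None):
    """Plain Dijkstra, doubling as the witness search; `excluded` skips the
    node currently being contracted."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if v == excluded:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return float("inf")

def contract(graph, node):
    """Remove `node`, adding shortcuts that preserve all shortest-path distances."""
    incoming = [(u, w) for u, edges in graph.items() for v, w in edges if v == node]
    outgoing = graph.get(node, [])
    shortcuts = []
    for u, w_in in incoming:
        for v, w_out in outgoing:
            if u == v:
                continue
            via = w_in + w_out
            # Witness search: any path u -> v avoiding `node` that is as short?
            if shortest(graph, u, v, excluded=node) > via:
                shortcuts.append((u, v, via))
    # Build the contracted graph: drop `node`, splice in the shortcuts.
    new_graph = {u: [(v, w) for v, w in edges if v != node]
                 for u, edges in graph.items() if u != node}
    for u, v, w in shortcuts:
        new_graph[u].append((v, w))
    return new_graph, shortcuts

# Line graph A -> B -> C: contracting B must add shortcut A -> C (weight 4).
g = {"A": [("B", 3)], "B": [("C", 1)], "C": []}
g2, added = contract(g, "B")
print(added)                    # [('A', 'C', 4)]
print(shortest(g2, "A", "C"))   # 4
```

The preprocessing phase repeats this for every node in importance order, recording each node's level; the query phase then runs the bidirectional upward search over the resulting hierarchy.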
4.5 Geospatial Indexing: Quadtree vs Geohash vs S2 Cells vs H3
Geospatial indexing supports four features: POI search ("coffee shops near me"), traffic aggregation (aggregate GPS pings by road segment), tile generation (which features belong in this tile), and geo-fencing (alert when entering a region). The choice of spatial index affects query performance, storage layout, and how well the system partitions data across shards.
Concrete example: "Find all coffee shops within 2km of latitude 40.7128, longitude -74.0060 (downtown Manhattan)." Walk through how each indexing scheme answers this query.
Quadtree
Recursively divide the 2D plane into four quadrants. Each division creates four child cells. Points are stored in leaf nodes. To query: find the leaf containing the query point, then check neighboring leaves that overlap the 2km radius circle.
Problem 1: The 2D plane does not match Earth's surface. Near the poles, quadtree cells represent vastly different ground areas. A cell that covers 1km at the equator covers only a few meters near the poles.
Problem 2: Quadtrees are an in-memory data structure. There is no standard encoding that maps to database range scans. Persisting a quadtree requires a custom storage layer.
Best for: In-memory spatial indexes (e.g., tile selection within a viewport).
Geohash
Encode latitude and longitude into a base-32 string by interleaving bits of lat and lng. Nearby points share a common prefix. To query: compute the geohash prefix length that covers the 2km radius, then range-scan the database for all entries with that prefix.
Example: lat 40.7128, lng -74.0060 encodes to geohash dr5reg. All points starting with dr5re are within the same geohash cell.
Problem: Boundary discontinuities. Two points 1 meter apart can have completely different geohash prefixes if they straddle a cell boundary. The query must check the target cell plus all 8 neighboring cells. Even worse, some neighbors share no common prefix at all, requiring separate range scans for each.
Best for: Simple proximity queries in Redis or DynamoDB where only prefix-based lookups are available.
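A minimal geohash encoder showing the bit interleaving (a sketch; production code would use a library such as python-geohash):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash's base-32 alphabet

def geohash_encode(lat, lng, precision=6):
    """Interleave longitude/latitude bisection bits, then map each 5-bit
    group to a base-32 character. Geohash starts with a longitude bit."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    use_lng = True
    while len(bits) < precision * 5:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1); lng_lo = mid
            else:
                bits.append(0); lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        use_lng = not use_lng
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(40.7128, -74.0060))  # dr5reg
```

The bisection structure also explains the boundary problem: a point just across a cell's dividing line flips a high-order bit, changing every character that follows it.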
S2 Cells
Project Earth onto the six faces of a cube, then subdivide each face using a Hilbert space-filling curve. Each cell gets a 64-bit integer ID. The Hilbert curve has a critical property: spatially nearby points get numerically nearby cell IDs. A circular geographic query ("within 2km") translates to a small number of contiguous integer ranges in the database.
To query: compute a "covering" for the 2km circle. The covering is a set of S2 cells (typically 4-8) that together cover the circle. Convert each cell to an integer range. Execute range scans on the database. Union the results.
Minimal boundary artifacts. The Hilbert curve is continuous across each cube face. Adjacent cells have nearby integer IDs. S2's covering algorithm handles the six cube-face boundaries transparently, so queries never miss results the way Geohash queries can. No need to check 8 neighbors separately.
30 levels of precision: Level 0 covers a cube face (~85M km2). Level 30 covers ~1 cm2. The application chooses the level appropriate for the query radius.
Best for: Production geospatial systems at scale. Used by Google Maps, Google Spanner, Uber, and Foursquare.
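The Hilbert-curve locality claim can be checked with a toy 2D curve. This is not the S2 library (which runs the curve on cube faces with 64-bit cell IDs); it only demonstrates the property that makes range scans work:

```python
def hilbert_d2xy(n, d):
    """Map a 1D Hilbert index d to (x, y) on an n x n grid (n a power of two).
    Standard iterative construction with per-level quadrant rotation."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so the curve stays continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Key property: consecutive indices are always grid-adjacent, so a range scan
# over cell IDs walks a spatially contiguous path -- no boundary jumps.
n = 16
for d in range(n * n - 1):
    x1, y1 = hilbert_d2xy(n, d)
    x2, y2 = hilbert_d2xy(n, d + 1)
    assert abs(x1 - x2) + abs(y1 - y2) == 1
print("all", n * n, "cells visited in a continuous path")
```

A Z-order curve (the basis of Geohash) fails this test: it jumps across the grid at power-of-two boundaries, which is precisely the discontinuity described in the Geohash section.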
H3
Project Earth onto an icosahedron tiled with hexagons. Each hexagon has exactly 6 equal-distance neighbors (unlike squares, which have 4 edge-neighbors and 4 corner-neighbors at different distances). To query: find the hex containing the query point, then ring-expand outward to cover the 2km radius.
Advantage: Uniform neighbor distance makes aggregation intuitive. Demand heat maps, coverage analysis, and spatial binning are natural with hexagons.
Disadvantage: Parent-child containment is not exact (aperture-7 subdivision means child hexes do not perfectly tile the parent). Hilbert-curve locality is weaker than S2 for database range scans.
Best for: Analytics, visualization, and aggregation (heat maps, demand zones). Used by Uber (analytics), DoorDash, and Snap.
Comparison Table
| Dimension | Quadtree | Geohash | S2 Cells | H3 |
|---|---|---|---|---|
| Cell shape | Square | Rectangle | Quadrilateral (curved) | Hexagon |
| Projection | Flat 2D | Flat 2D | Sphere (cube face) | Sphere (icosahedron) |
| Boundary artifacts | Polar distortion | Edge discontinuities | Minimal (6 cube faces, handled by covering algorithm) | Minimal (12 pentagons) |
| Database indexing | Custom | String prefix range scan | Integer range scan (fastest) | Integer range scan |
| Spatial locality | Moderate | Weak at boundaries | Strong (Hilbert curve) | Moderate |
| Neighbor traversal | Pointer chasing | Complex (8 neighbors) | Cell ID arithmetic | Simple (6 uniform) |
| Best for | In-memory spatial index | Simple proximity, Redis | Production geo at scale | Analytics, visualization |
| Used by | Legacy map renderers | Redis, Elasticsearch | Google Maps, Uber, Spanner | Uber analytics, DoorDash |
Our choice: S2 Cells for POI indexing, traffic aggregation, tile partitioning, and data sharding. S2 provides strong spatial locality for database range scans (critical at 500M POI records and 50M location updates/sec) and zero boundary artifacts. H3 is used as a secondary index for analytics dashboards (traffic heat maps, demand visualization).
For deeper coverage of spatial indexing technologies, see Google S2, H3, and PostGIS.
4.6 Map Rendering: Raster Tiles vs Vector Tiles
Map tiles are the visual representation of geographic data. The map is divided into a grid of tiles at each zoom level.
Raster tiles are pre-rendered PNG or JPEG images. Each tile is 256x256 pixels. At zoom 18 (building-level detail), the world is divided into ~69 billion tiles. Each tile must be pre-rendered for every visual style (light mode, dark mode, satellite, terrain). Storage: ~150 KB per tile. For 5 styles at zoom 0-18: tens of petabytes if every tile were rendered (most are not; see the storage note below).
Advantages: No client-side rendering compute. Works on any device. Simple to serve (static files on a CDN). Disadvantages: Static (no dynamic styling, rotation, or 3D tilt without re-rendering). Massive storage. Updating a single road requires re-rendering all affected tiles across all zoom levels and all styles.
Vector tiles encode geographic data as geometric primitives (points, lines, polygons) in a compact binary format (Mapbox Vector Tiles / Protocol Buffers). The client renders them into pixels using GPU-accelerated rendering (WebGL on the web, Metal on iOS, Vulkan on Android).
Advantages: 10-50x smaller than raster tiles (~15 KB vs ~150 KB per tile). Dynamic styling (switch between light/dark mode without re-downloading). Smooth rotation and 3D tilt. Updating data only requires regenerating the vector tile, not re-rendering for every style. One set of vector tiles serves all visual themes. Disadvantages: Requires client GPU capability. More complex rendering pipeline. Older/low-power devices may struggle.
What's inside a vector tile? MVT is binary (Protocol Buffers), but here is a decoded view of a single tile covering midtown Manhattan. One tile carries multiple layers, each with geometry in tile-local integer coordinates (0-4096 grid mapped to the tile's geographic bounding box) and properties the client uses for styling.
{
  "layers": [
    {
      "name": "road",
      "extent": 4096,
      "features": [
        {
          "geometry": { "type": "LineString", "coordinates": [[1024,2048],[2048,2048]] },
          "properties": { "class": "primary", "name": "W 42nd St", "oneway": true, "lanes": 3 }
        },
        {
          "geometry": { "type": "LineString", "coordinates": [[1536,0],[1536,4096]] },
          "properties": { "class": "secondary", "name": "7th Ave", "oneway": true, "lanes": 4 }
        }
      ]
    },
    {
      "name": "building",
      "extent": 4096,
      "features": [
        {
          "geometry": { "type": "Polygon", "coordinates": [[[800,1800],[1200,1800],[1200,2200],[800,2200],[800,1800]]] },
          "properties": { "name": "Times Square Tower", "height": 147, "levels": 47, "class": "commercial" }
        }
      ]
    },
    {
      "name": "water",
      "extent": 4096,
      "features": [
        {
          "geometry": { "type": "Polygon", "coordinates": [[[3800,0],[4096,0],[4096,600],[3800,600],[3800,0]]] },
          "properties": { "class": "river", "name": "Hudson River" }
        }
      ]
    }
  ]
}
How tile coordinates map to the real world. Three layers of addressing work together. The tile's (z, x, y) index identifies which rectangle of the Earth this tile covers, using the Web Mercator projection (EPSG:3857). The extent grid (4096x4096) positions features within that rectangle using cheap integers instead of lat/lng floats: (0,0) is the tile's top-left corner, (4096,4096) is the bottom-right, and (2048,2048) is dead center. The integer 1024 takes 2 bytes in protobuf vs 8 bytes for a double like 40.7512, so the entire tile is roughly 4x smaller. At zoom 15, each grid unit is ~0.3 m at the equator, more than enough precision for roads and buildings. The client reverses the chain at render time: the tile index gives the geographic bounding box, extent coordinates place features inside it, and the projection converts everything to screen pixels.
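That coordinate chain can be sketched directly (Web Mercator math only; the function name and the u/v parameter names are illustrative):

```python
import math

def tile_extent_to_lonlat(z, x, y, u, v, extent=4096):
    """Convert tile (z, x, y) plus in-tile extent coords (u, v) to (lon, lat).

    u runs left -> right and v runs top -> bottom, matching the MVT convention
    that (0, 0) is the tile's top-left corner.
    """
    n = 2 ** z
    fx = (x + u / extent) / n          # fraction of the world, west -> east
    fy = (y + v / extent) / n          # fraction of the world, north -> south
    lon = fx * 360.0 - 180.0
    # Inverse Web Mercator for latitude.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * fy))))
    return lon, lat

# The single zoom-0 tile: extent (2048, 2048) is dead center -> lon 0, lat 0.
print(tile_extent_to_lonlat(0, 0, 0, 2048, 2048))  # (0.0, 0.0)
```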
The client's style sheet decides how to render each feature: class: "primary" road gets a thick white line, class: "secondary" gets a thinner gray line, buildings with height get 3D extrusion. Same data, different visual output depending on the theme.
| Dimension | Raster Tiles | Vector Tiles |
|---|---|---|
| Tile size (avg) | ~150 KB | ~15 KB |
| Storage (all zooms, 1 style) | ~500 TB | ~50 TB |
| Dynamic styling | No (new style = new tiles) | Yes (client applies style) |
| Smooth rotation / 3D | No | Yes |
| Client compute | None | GPU rendering required |
| Update a road | Re-render all affected tiles x styles | Re-generate one vector tile |
| Bandwidth per session | ~50 MB | ~5 MB |
Storage note: most tiles at high zoom levels are empty (ocean, desert, uninhabited land). Only ~5% of zoom 18 tiles contain road or building data. Effective vector tile storage is ~50 TB, not the theoretical ~1 PB if every tile had content.
Where satellite imagery fits: Satellite tiles are always raster (the imagery itself is pixel data; vector features can be extracted from it by ML, but those ship separately as vector tile data). The client composites three layers:
- Satellite raster tiles (bottom layer): actual imagery of terrain, buildings, vegetation from providers like Maxar (30cm) and Planet Labs (3-5m daily).
- Vector base map tiles (middle layer): roads, boundaries, water bodies as styled geometry.
- Traffic overlay tiles (top layer): color-coded road segment speeds from the real-time traffic pipeline.
In "satellite view" mode, the satellite raster layer is fully opaque with semi-transparent vector roads and labels overlaid. In "map view" mode, only vector tiles render. Satellite tile storage adds ~200 TB for global coverage across 3 imagery versions (current + 2 historical for the "historical imagery" feature).
Satellite imagery also feeds the tile generation pipeline indirectly: ML models extract building footprints and land use polygons from imagery. These features become part of the vector tile data at zoom levels 13+ (building outlines visible). The satellite processing pipeline that produces both the imagery tiles and the ML-extracted features is described in Section 9.9.
Our choice: Vector tiles as the primary format. Raster tiles are generated on demand as a fallback for legacy devices and low-bandwidth scenarios. Vector tiles reduce storage by 10x, bandwidth by 10x, and enable dynamic styling without re-generation.
4.7 Technology Selection
| Component | Technology | Why |
|---|---|---|
| Routing algorithm | Contraction Hierarchies (CCH variant) | < 1ms queries, handles real-time traffic weight updates |
| Geospatial indexing | Google S2 Cells | Hilbert curve locality, 64-bit integer range scans, no boundary artifacts |
| Tile format | Mapbox Vector Tiles (MVT) | 10-50x smaller, dynamic styling, GPU rendering |
| Tile serving | CloudFront CDN + origin tile servers | 20B requests/day, CDN absorbs 95%+ |
| Road graph storage | Custom binary (memory-mapped adjacency array) | Cache-friendly, avoids deserialization |
| Traffic ingestion | Apache Kafka | 50M updates/sec, partitioned by S2 cell, replay for recovery |
| Traffic aggregation | Apache Flink | Windowed speed computation per road segment, exactly-once |
| POI search | Elasticsearch + S2 cell routing | Combined text + geo query DSL, autocomplete |
| POI cold storage | PostgreSQL + PostGIS | Complex spatial queries, batch processing |
| Location tracking (hot) | Redis | Sub-ms reads for active navigator positions |
| Geocoding | Custom trie + Elasticsearch | Address parsing + fuzzy matching |
| Offline map packages | Amazon S3 | Pre-built region packages, CDN delivery |
| Transit data | GTFS + GTFS-Realtime | Standard format for schedules and real-time delays |
4.8 How These Choices Map to Google's Internal Stack
The algorithms match Google's published research: CH (developed at KIT, adopted by Google), S2 (created at Google, open-sourced), HMM map matching (industry standard since Newson and Krumm 2009), GTFS (co-created by Google and TriMet). The infrastructure uses open-source equivalents of Google's proprietary systems: Kafka for Pub/Sub, Flink for Dataflow, PostgreSQL for Spanner, S3 for Colossus. The conceptual architecture is identical; only the implementation layer changes.
Where scale pushes the open-source limits. Three pressure points at Google-level scale:
- PostgreSQL at 500M POIs and 100K QPS: feasible with read replicas + PgBouncer. At multi-billion records or global distribution, migrate to CockroachDB or managed Spanner.
- Elasticsearch at 500M POIs: needs 10+ data nodes with 30-50 GB per shard sizing. For hot geo-only queries, consider an S2-based lookup cache in Redis to offload Elasticsearch.
- Redis at 200M road segments: Redis Cluster with 6+ nodes handles this (~20 GB memory). If persistence matters, layer DragonflyDB or enable Redis AOF.
5. High-Level Architecture
5.1 Bird's-Eye View
The architecture is shown in two views. The Request Flow shows how user interactions are served at runtime. The Data Pipeline shows how data is ingested, processed, and stored. Components with dashed borders appear in both. They are the handoff points where the pipeline's outputs become the request flow's inputs.
Request Flow
How tile requests, routing, search, and navigation are served.
Shared components connect the two diagrams: Redis holds live speeds (Flink writes, routing reads). Elasticsearch mirrors POI data from PostgreSQL via CDC. S3 stores tiles and serves them through CDN. CH Query Servers load graph data from the weekly preprocessing pipeline.
Data Pipeline
How GPS data, map updates, and traffic intelligence flow through the system.
5.2 Component Glossary
Request Flow (1-6):
(1) Tile Request. The map renderer requests vector tiles for the visible viewport. Tile coordinates follow the standard z/x/y scheme (zoom level, column, row).
(2) CDN and Origin. CloudFront serves 95%+ of tile requests from edge cache. Cache misses go to origin tile servers, which read pre-built tiles from S3. For dynamic overlays (traffic), the CDN TTL is 30 seconds.
(3) Route Request. The navigation engine sends origin/destination coordinates and transport mode to the regional CH query server responsible for that geographic area. The server computes the route, enriches it with an ETA prediction, and returns the full route (polyline, distance, ETA, turn-by-turn guidance) back to the client.
(4) Cross-Region Routes. Routes that cross region boundaries require a bidirectional exchange between the CH query server and the transit node overlay. The overlay maintains precomputed shortest-path distances between border nodes where roads cross shard boundaries, and stitches regional path segments together. This adds ~3-5ms latency (3 CH queries: origin shard, overlay, destination shard).
(5) Traffic-Aware Routing. The CH query server reads real-time segment speeds from Redis to apply current traffic weights during the route computation. The road graph is stored as a memory-mapped binary adjacency array (no deserialization on startup). Edge weights are updated every 30 seconds via CCH (Customizable Contraction Hierarchies), which recalculates only the affected shortcut weights in < 500ms.
(6) POI Search. Text + geospatial queries go to Elasticsearch. POI data is continuously synchronized from the PostgreSQL source of truth via a Kafka CDC pipeline (< 5 min propagation, Section 9.2). Geocoding (address to coordinates) also flows through Elasticsearch and a custom address trie (Section 9.2). Complex spatial operations fall back to PostgreSQL + PostGIS.
(ETA Prediction). The ML ETA model (GNN) combines real-time segment speeds from Redis, historical speed patterns from PostgreSQL (7x96 day/time matrix), and weather forecasts to produce traffic-aware ETA predictions. The ETA prediction feeds into the CH query server, which includes it in the route response to the client (Section 9.4).
(Offline). Pre-built offline map packages (tiles + CH subgraph + POI data per region, Section 9.9) are stored in S3 and downloaded to the client's Offline Storage for use without connectivity.
Data Pipeline (7-13):
(7) GPS Updates. Active navigators send location updates every 3-5 seconds (~15M/sec). Background app users, fleet vehicles, and OEM connected cars contribute an additional ~35M/sec at lower frequencies. All updates flow into Kafka, partitioned by S2 cell at level 12 (~4,000 partitions globally).
(8) Incident Reports. User-submitted incident reports (accidents, police, hazards) flow into Kafka alongside GPS location updates. Reports are validated by proximity and corroboration, then feed the incident detector and routing engine via CCH weight overrides (Section 9.8).
(9) Map Data Acquisition. Three data sources feed the map: satellite imagery (Maxar, Planet Labs) processed through ML change detection, vehicle probe data aggregated from navigator GPS traces (road discovery, speed limits, road conditions), and OpenStreetMap community edits. The Map Data Validator merges these sources, resolves conflicts, and commits validated changes to PostgreSQL.
(10) Tile and Graph Updates. Validated map changes trigger tile rebuilds for affected z/x/y coordinates and are queued for the next weekly CH graph preprocessing. Critical changes (road closures, new highways) are fast-tracked via CCH weight updates within minutes.
(11) CH Preprocessing + Graph Deployment. Topology changes from the map data pipeline trigger full CH preprocessing weekly. The new road graph binary is deployed to routing servers via S3 with zero-downtime swap (Section 12.4).
(12) Traffic Tile Generation. Flink outputs aggregated road segment speeds. A traffic tile builder encodes speed data into a separate traffic overlay tile layer, stored in S3 with 30-second TTL on the CDN.
(13) Weather Data. Weather forecasts from a weather data provider feed the ML ETA model. Current conditions affect near-term route segments; forecast conditions affect segments the driver will reach in 30+ minutes (Section 9.4).
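The S2-cell partitioning in step (7) can be sketched as a pure function. The grid quantization below is a toy stand-in for real S2 level-12 cells (which live on cube faces with Hilbert IDs); only the grouping behavior matters here, and all names are illustrative:

```python
def cell_key(lat, lng, level=12):
    """Toy stand-in for an S2 level-12 cell: quantize to a 2^level x 2^level
    grid over the lat/lng plane (cells roughly a few km across)."""
    n = 2 ** level
    x = min(int((lng + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y

def partition_for(lat, lng, num_partitions=4000):
    """Map a GPS ping to a Kafka partition. All pings in one cell land in one
    partition, so a road segment's updates stay on a single consumer/Flink
    task for windowed speed aggregation."""
    x, y = cell_key(lat, lng)
    return (x * 98317 + y) % num_partitions  # cheap deterministic mix

# Two pings metres apart on the same road -> same cell -> same partition.
a = partition_for(40.7128, -74.0060)
b = partition_for(40.7129, -74.0061)
print(a == b)  # True
```

The key design property: partition assignment is a function of location, not device ID, so a traffic aggregator reads each road segment's pings from exactly one partition.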
6. Back-of-the-Envelope Estimation
Map Tiles
Zoom levels: 23 (0 to 22)
Total tiles at zoom 18: ~69 billion
Vector tile avg size: 15 KB
Total tile storage (all zooms): ~50 TB (vector)
Tile requests/day: 20 billion
Tile requests/sec (avg): ~231K
CDN hit rate: 95%
Origin tile requests/sec: ~11.5K
Routing
Route requests/day: 5 billion
Route requests/sec (avg): ~58K
Route requests/sec (peak 3x): ~174K
CH query time: < 1ms
Routing servers (at 10K queries/sec): 6 (avg), 18 (peak) per region
Road graph in memory (with shortcuts): ~60 GB global, ~5-12 GB per region shard
Traffic
Active navigators: 50 million
Total location sources: ~200M (navigators + background + fleet + OEM)
Location updates/sec: 50 million (all sources combined)
Avg update size: 100 bytes (lat, lng, speed, heading, ts)
Ingestion bandwidth: 5 GB/sec
Road segments (global): ~1.2 billion
Segments with live traffic data: ~200 million
POI
Total POI records: 500 million
Avg POI record size: 2 KB
Total POI storage: 1 TB
POI search QPS: ~100K
Storage Summary
Vector tiles (all zooms): 50 TB
Road graph (CH preprocessed): ~60 GB global (all region shards combined)
Traffic history (30 days): ~500 TB
POI data: 1 TB
GPS location updates (7 days hot): ~3 PB
Offline map packages: ~200 TB
Street View imagery: ~500 TB
Satellite imagery (global, 3 versions): ~600 TB
Satellite processing artifacts: ~50 TB
Vehicle probe traces (90 days raw): ~200 TB
Road discovery candidates: ~100 GB
Map change detection logs: ~10 TB
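The per-second and fleet-size figures above follow from simple arithmetic. A quick sanity-check script (all inputs restate numbers from this section; nothing here is new data):

```python
import math

SECONDS_PER_DAY = 86_400

# Tiles: 20B requests/day, 95% CDN hit rate
tile_qps = 20e9 / SECONDS_PER_DAY          # ~231K/sec average
origin_qps = tile_qps * (1 - 0.95)         # 5% miss -> ~11.5K/sec at origin

# Routing: 5B requests/day, 3x peak, 10K queries/sec per server
route_qps = 5e9 / SECONDS_PER_DAY          # ~58K/sec average
route_qps_peak = route_qps * 3             # ~174K/sec
servers_avg = math.ceil(route_qps / 10_000)
servers_peak = math.ceil(route_qps_peak / 10_000)

# Traffic ingestion: 50M pings/sec at 100 bytes each
ingest_gb_per_sec = 50e6 * 100 / 1e9

print(round(tile_qps), round(origin_qps), servers_avg, servers_peak, ingest_gb_per_sec)
```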
7. Data Model
7.1 Road Graph (Custom Binary Format)
The road graph is stored as a cache-friendly adjacency array, memory-mapped on routing servers. No deserialization on startup; the OS pages data from disk to RAM on demand.
Node {
id: uint64 // globally unique node ID
lat: float32 // latitude
lng: float32 // longitude
s2_cell_id: uint64 // S2 cell at level 16
ch_level: uint16 // contraction hierarchy level
edge_offset: uint32 // index into edge array
edge_count: uint16 // number of outgoing edges
}
Edge {
target_node: uint64 // destination node ID
weight_ms: uint32 // travel time in milliseconds (static)
distance_m: uint32 // distance in meters
road_class: uint8 // motorway=1, trunk=2, primary=3, ...
flags: uint16 // one_way, toll, ferry, tunnel, bridge
}
Shortcut {
target_node: uint64 // destination node ID
weight_ms: uint32 // shortcut weight (sum of constituent edges)
middle_node: uint64 // for path unpacking
}
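The adjacency-array layout can be illustrated with a small sketch. This is Python for readability only (the production structure is the memory-mapped binary described above); the point is that node i's outgoing edges occupy one contiguous slice of the edge array, which is what makes sequential edge scans cache-friendly:

```python
from dataclasses import dataclass

@dataclass
class Node:
    edge_offset: int   # index into the flat edge array
    edge_count: int    # number of outgoing edges

@dataclass
class Edge:
    target_node: int
    weight_ms: int

# Tiny 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2
nodes = [Node(0, 2), Node(2, 1), Node(3, 0)]
edges = [Edge(1, 5000), Edge(2, 9000), Edge(2, 3000)]

def outgoing(node_id: int):
    """Node's edges come from one contiguous slice -- no pointer chasing."""
    n = nodes[node_id]
    return edges[n.edge_offset : n.edge_offset + n.edge_count]

print([(e.target_node, e.weight_ms) for e in outgoing(0)])  # -> [(1, 5000), (2, 9000)]
```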
7.2 Tile Metadata (PostgreSQL)
CREATE TABLE tile_metadata (
z SMALLINT NOT NULL, -- zoom level (0-22)
x INTEGER NOT NULL, -- tile column
y INTEGER NOT NULL, -- tile row
version BIGINT NOT NULL DEFAULT 1,
size_bytes INTEGER,
etag TEXT,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (z, x, y)
);
7.3 POI Schema
Elasticsearch document:
{
"poi_id": "poi-12345",
"name": "Blue Bottle Coffee",
"category": ["cafe", "coffee_shop"],
"address": "450 W 15th St, New York, NY 10011",
"location": { "lat": 40.7420, "lon": -74.0068 },
"s2_cell_l12": 9749618424903,
"rating": 4.5,
"review_count": 342,
"price_level": 2,
"hours": { "mon": "07:00-19:00", "tue": "07:00-19:00" },
"photos": ["s3://poi-media/poi-12345/1.jpg"],
"updated_at": "2026-03-09T14:30:00Z"
}
PostgreSQL source of truth:
CREATE TABLE pois (
id UUID PRIMARY KEY,
name TEXT NOT NULL,
category TEXT[] NOT NULL,
location GEOMETRY(Point, 4326) NOT NULL,
s2_cell_l12 BIGINT NOT NULL,
address JSONB,
rating NUMERIC(2,1),
review_count INTEGER DEFAULT 0,
price_level SMALLINT,
hours JSONB,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_pois_s2 ON pois (s2_cell_l12);
CREATE INDEX idx_pois_geo ON pois USING GIST (location);
7.4 Traffic Segment (Redis + PostgreSQL)
Redis (hot, real-time):
Key: segment:{segment_id}
Value: { "speed_kmh": 45, "free_flow_kmh": 65, "samples": 23,
"confidence": 0.92, "updated_at": 1710087600 }
TTL: 120 seconds (auto-expire stale data)
PostgreSQL (cold, historical):
CREATE TABLE traffic_history (
segment_id BIGINT NOT NULL,
timestamp TIMESTAMPTZ NOT NULL,
speed_kmh SMALLINT,
sample_count SMALLINT,
day_of_week SMALLINT, -- 0=Mon, 6=Sun
time_slot SMALLINT -- 0-95 (15-min intervals)
) PARTITION BY RANGE (timestamp);
CREATE INDEX idx_traffic_segment ON traffic_history (segment_id, day_of_week, time_slot);
7.5 Satellite Imagery Metadata
CREATE TABLE satellite_imagery (
z SMALLINT NOT NULL,
x INTEGER NOT NULL,
y INTEGER NOT NULL,
version SMALLINT NOT NULL DEFAULT 1,
source TEXT NOT NULL, -- maxar, planet, airbus, aerial
acquired_at DATE NOT NULL,
cloud_pct SMALLINT,
resolution_cm SMALLINT,
s3_key TEXT NOT NULL,
PRIMARY KEY (z, x, y, version)
);
7.6 Vehicle Probe Aggregates and Road Candidates
CREATE TABLE probe_aggregates (
segment_id BIGINT NOT NULL,
date DATE NOT NULL,
trace_count INTEGER NOT NULL,
p85_speed_kmh SMALLINT,
braking_events INTEGER DEFAULT 0,
source TEXT, -- navigator, fleet, oem
PRIMARY KEY (segment_id, date)
) PARTITION BY RANGE (date);
CREATE TABLE road_candidates (
id UUID PRIMARY KEY,
geometry GEOMETRY(LineString, 4326),
source TEXT NOT NULL, -- probe_cluster, satellite, manual
confidence NUMERIC(3,2),
probe_count INTEGER,
first_seen TIMESTAMPTZ,
satellite_confirmed BOOLEAN DEFAULT FALSE,
status TEXT DEFAULT 'pending', -- pending, accepted, rejected
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_road_candidates_geo ON road_candidates USING GIST (geometry);
The probe_aggregates table is partitioned by date and feeds the speed limit inference and road condition scoring pipelines (Section 9.8). The road_candidates table tracks potential new roads discovered from probe clustering or satellite change detection, requiring dual confirmation before entering the road graph (Section 9.9).
8. API Design
8.1 REST Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/tiles/{z}/{x}/{y}.mvt | Get vector tile (Mapbox Vector Tile format) |
| GET | /api/tiles/{z}/{x}/{y}.png | Get raster tile fallback |
| GET | /api/tiles/traffic/{z}/{x}/{y}.mvt | Get traffic overlay tile (30s TTL) |
| GET | /api/routes | Compute route (params: origin, dest, mode, depart_time, avoid) |
| GET | /api/routes/alternatives | Compute route with 2-3 alternatives |
| GET | /api/search/places | POI search (params: q, lat, lng, radius, category) |
| GET | /api/search/autocomplete | Autocomplete (params: q, lat, lng, session_token) |
| GET | /api/geocode | Forward geocode (params: address) |
| GET | /api/reverse-geocode | Reverse geocode (params: lat, lng) |
| POST | /api/incidents | Report traffic incident (body: type, lat, lng, description) |
| POST | /api/navigation/start | Start navigation (body: route_id, returns session_id + full guidance) |
| POST | /api/navigation/{id}/end | End navigation session |
| GET | /api/offline/packages | List available offline packages by region |
| GET | /api/offline/packages/{region} | Download offline package |
8.2 Navigation WebSocket Protocol
Connection: wss://nav.example.com/session/{session_id}?token=<JWT>
| Message Type | Direction | Payload | Purpose |
|---|---|---|---|
| location | Client -> Server | { lat, lng, speed, heading, ts } | GPS update every 3-5s |
| guidance | Server -> Client | { maneuver, distance_m, instruction, lane_info } | Next turn instruction |
| eta_update | Server -> Client | { eta_minutes, distance_km, traffic_delay_min } | Updated ETA |
| reroute | Server -> Client | { new_route, reason } | Full route replacement on deviation |
| incident | Server -> Client | { type, location, description } | Nearby traffic incident |
| ping/pong | Bidirectional | (empty) | Heartbeat every 15s |
The search-to-route flow. Most user journeys start with a search query, not raw coordinates. The client calls /api/search/autocomplete as the user types, which returns POI results or geocoded addresses with coordinates. When the user selects a result, the client already has lat/lng coordinates and can immediately call /api/routes without a separate geocoding step. Direct address input (typed into the "destination" field) triggers /api/geocode to resolve the address to coordinates before routing. The full input-to-coordinates flow and the complete chain from search query to rendered pin (including Elasticsearch queries, tile coordinate math, and pixel positioning) are detailed in Section 9.2.
9. Deep Dives
9.1 Opening the Map: Tiles and Traffic Overlay
Tile pyramid. The world is divided into a grid of tiles at each zoom level. Zoom 0 = 1 tile (the entire world). Each zoom level quadruples the number of tiles: zoom 1 = 4, zoom 2 = 16, ..., zoom 18 = 4^18 = ~69 billion tiles. Summing zoom levels 0 through 18 gives ~92 billion tiles; deeper zooms (19-22) are typically rendered by overzooming zoom-18 vector tiles client-side.
Vector tile generation pipeline:
- Raw geodata from satellite imagery, vehicle probes, and OpenStreetMap is merged in PostgreSQL + PostGIS via the map data pipeline (Section 9.9). A daily data fusion job commits validated changes to PostgreSQL and triggers tile invalidation for affected z/x/y coordinates.
- Tile builder (batch job, runs on data update): for each tile coordinate (z/x/y), query PostGIS for all features intersecting that tile's bounding box. Filter features by zoom level (zoom 5 shows highways but not residential streets). Simplify geometry (reduce polygon vertex count for lower zoom levels). Clip features to tile boundary.
- Encode into Mapbox Vector Tile format (Protocol Buffers). Each tile contains layers: roads, buildings, water, land use, labels.
- Store in S3. Key pattern: tiles/{z}/{x}/{y}.mvt.
3-tier caching:
| Tier | Technology | Hit Rate | Latency | Scope |
|---|---|---|---|---|
| CDN edge | CloudFront | 95% | < 20ms | Global, 400+ edge locations |
| Regional cache | Redis cluster | 4% | < 5ms | Per-region, hot tiles in memory |
| Origin | Tile server + S3 | 1% | < 200ms | Tile lookup or on-demand generation |
Tile invalidation. When road data updates, compute which tiles are affected using S2 coverings (which z/x/y tiles overlap the changed geometry). Invalidate those tiles in the CDN and regional cache. Propagate invalidation upward to lower zoom levels (a road change at zoom 18 also affects the zoom 14 tile that contains it).
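The upward propagation step is mechanical: each parent tile at zoom z-1 covers a 2x2 block of children, so the parent of (z, x, y) is (z-1, x//2, y//2). A minimal sketch (tile coordinates here are illustrative, not real data):

```python
def ancestor_tiles(z: int, x: int, y: int, min_zoom: int = 0):
    """All parent tiles of (z, x, y) down to min_zoom, nearest first."""
    out = []
    while z > min_zoom:
        z, x, y = z - 1, x // 2, y // 2
        out.append((z, x, y))
    return out

# A road edit inside a zoom-18 tile dirties its ancestors down to zoom 14:
print(ancestor_tiles(18, 77196, 98527, min_zoom=14))
# -> [(17, 38598, 49263), (16, 19299, 24631), (15, 9649, 12315), (14, 4824, 6157)]
```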
Dynamic traffic overlay. Traffic tiles are a separate layer from base map tiles. The traffic tile builder reads real-time segment speeds from Redis, encodes speed-colored road segments into vector tiles, and writes them to S3 with a 30-second TTL on the CDN.
9.2 Finding a Destination: Search and Geocoding
Index architecture. Elasticsearch cluster with index routing by S2 cell at level 8. Each POI document is indexed with its S2 cell ID, enabling geographic routing of queries to the correct Elasticsearch shard.
Concrete example: "Starbucks near Times Square." Here is the full chain from keystroke to pin on the map.
Step 1: Autocomplete request. After 200-300ms of debounce, the client sends:
GET /api/search/autocomplete?q=Starbucks+near+Times+Square&lat=40.75&lng=-73.98&session_token=abc123
The lat/lng is the user's current map center, used to bias results toward the visible area. Elasticsearch uses an edge-ngram tokenizer on the name field for prefix matching, so results start appearing after just a few characters. Results are ranked by popularity within the user's S2 cell region. Target latency: < 50ms.
Step 2: Query classification. The Search API parses "Starbucks near Times Square" as a compound query: a POI name ("Starbucks") with a location context ("Times Square"). It geocodes "Times Square" first by querying Elasticsearch for a well-known landmark match, which returns coordinates (40.7580, -73.9855).
Step 3: Elasticsearch POI query. Using the geocoded center, the server computes an S2 cell covering for the search radius (5km by default; a tighter 2km radius is used here given the dense urban context) and builds an Elasticsearch query:
{
"query": {
"bool": {
"must": [{ "match": { "name": "Starbucks" } }],
"filter": [{ "geo_distance": { "distance": "2km", "location": { "lat": 40.7580, "lon": -73.9855 } } }]
}
},
"sort": [
{ "_score": "desc" },
{ "_geo_distance": { "location": { "lat": 40.7580, "lon": -73.9855 }, "order": "asc" } }
],
"size": 10
}
Elasticsearch returns POI results re-ranked by composite score: 0.4 * text_relevance + 0.3 * distance_score + 0.2 * rating + 0.1 * popularity.
[
{ "name": "Starbucks Reserve", "location": { "lat": 40.7571, "lng": -73.9870 }, "rating": 4.3, "address": "1585 Broadway" },
{ "name": "Starbucks", "location": { "lat": 40.7560, "lng": -73.9888 }, "rating": 4.1, "address": "250 W 49th St" }
]
Step 4: Lat/lng to tile index and screen position. The client receives POI results with lat/lng coordinates. Tiles are stored in S3 as tiles/{z}/{x}/{y}.mvt with integer indices. The Web Mercator projection converts lat/lng directly to those integers, telling the client exactly which tile file to fetch:
// Input: Starbucks at lat 40.7571, lng -73.9870, current zoom 15
// ── tile_x: which column of the tile grid ──
// Longitude ranges from -180 to +180. Adding 180 shifts it to 0-360
// so the entire world maps to positive numbers.
// Dividing by 360 normalizes to 0.0 (left edge) to 1.0 (right edge).
// Multiplying by 2^zoom converts to the tile grid at that zoom level.
// At zoom 15, there are 32,768 columns (2^15).
tile_x = floor((-73.9870 + 180) / 360 * 2^15)
tile_x = floor(106.013 / 360 * 32768) // 0.2945 → Manhattan is ~29.5% from the left edge of the world map
tile_x = 9649
// ── tile_y: which row of the tile grid ──
// Latitude is NOT linear on a Mercator map. The Mercator projection
// stretches areas near the poles so that straight lines on the map
// correspond to constant compass bearings (useful for navigation).
// The ln(tan + sec) function is the Mercator math that converts
// latitude into a Y position with this property.
// This is why Greenland looks huge on Google Maps: at 60°N, one degree
// of latitude takes 2x the vertical pixels compared to the equator.
// NOTE: tan and sec take radians, so convert the latitude first.
tile_y = floor((1 - ln(tan(lat_rad) + sec(lat_rad)) / pi) / 2 * 2^15) // lat_rad = radians(40.7571)
tile_y = floor(0.37582 * 32768) // = floor(12314.8)
tile_y = 12314
// This Starbucks lives in tile: tiles/15/9649/12314.mvt
// If not cached, fetch: GET /tiles/15/9649/12314.mvt
// ── pixel offset: where inside the 256x256 pixel tile ──
// The tile index tells you WHICH tile. The pixel offset tells you
// WHERE INSIDE that tile to place the pin.
// The full tile_x calculation gives 9649.539 (not exactly 9649).
// Integer part (9649) = tile index. Fractional part (0.539) = how far
// across that tile the point falls. 0.539 * 256 ≈ 138px from the left edge.
pixel_offset_x = 0.539 * 256 ≈ 138px
pixel_offset_y = 0.816 * 256 ≈ 209px
// Result: load tile (15, 9649, 12314), place Starbucks pin at pixel (138, 209)
The lat/lng deterministically maps to a tile filename. The client does not search for "which tile has Starbucks." It computes the index, checks if that tile is already cached, and draws the pin at the pixel offset on top of the tile. For nearby tiles (to allow smooth panning around the pin), the client prefetches the 8 surrounding tiles (9648-9650, 12313-12315).
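The Step 4 projection as runnable code, using the standard Web Mercator / slippy-map formulas (a sketch; real clients do this inside the map SDK). Note the latitude must be converted to radians before tan/sec:

```python
import math

def latlng_to_tile(lat: float, lng: float, zoom: int):
    """Return (tile_x, tile_y, pixel_x, pixel_y) for a point at a zoom level."""
    n = 2 ** zoom
    x = (lng + 180.0) / 360.0 * n
    lat_rad = math.radians(lat)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n
    tile_x, tile_y = int(x), int(y)
    # Fractional parts give the pixel offset inside the 256x256 tile.
    px = round((x - tile_x) * 256)
    py = round((y - tile_y) * 256)
    return tile_x, tile_y, px, py

print(latlng_to_tile(40.7571, -73.9870, 15))
```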
Step 5: Viewport adjustment (if needed). If search results span a larger area (e.g., "gas stations near I-95"), some POIs may fall outside the current viewport. The client computes a bounding box enclosing all results, zooms out to fit them, and fetches any new tiles needed for the expanded viewport.
Tiles vs overlays: a key distinction. The map renders in three independent layers. Base map tiles (roads, buildings, water) are fetched once and cached. Traffic overlay tiles are refetched every 30 seconds. Everything else (POI pins, route polylines, the user's location dot, navigation UI) is drawn by the client directly on the canvas, not fetched as tiles. There is no "Starbucks tile." The search returns lat/lng coordinates, the client projects them to screen pixels, and draws the pin on top of cached tiles.
The same applies to routing. When the user taps "Directions," the routing engine returns a polyline (a sequence of lat/lng points like [[40.7505,-73.9934], [40.7510,-73.9930], ..., [40.7571,-73.9870]]). The client projects each point to screen coordinates and draws a blue line connecting them. The base tiles underneath are unchanged. If the route extends beyond cached tiles, the client fetches new tiles for the expanded viewport, but the route itself is always a client-side overlay.
POI freshness. Business data changes frequently (hours, ratings, closures). A Kafka CDC pipeline streams changes from the PostgreSQL source-of-truth POI database to Elasticsearch. Propagation delay: < 5 minutes for normal updates, < 60 seconds for critical updates (closure, safety alerts).
Geocoding: From User Input to Coordinates
When a user types "Starbucks near Times Square" or "123 Main St, NYC" into the search box, the system must resolve that text into geographic coordinates (latitude, longitude) before anything else can happen. No pin on the map, no route, no ETA without coordinates. Different input types require different resolution pipelines.
Input classification. The first step is determining what kind of input the user provided:
| Input Type | Example | Resolution Pipeline |
|---|---|---|
| Coordinate pair | "40.7580, -73.9855" | Direct parse (no geocoding needed) |
| Plus Code | "87G8Q2PQ+VR" | Decode (fixed algorithm, no DB lookup) |
| Street address | "123 Main St, New York, NY" | Address geocoding (trie + address DB) |
| Place name / POI | "Starbucks" | Place resolution (Elasticsearch, Section 9.2) |
| Landmark | "Times Square" | Place resolution (well-known places DB) |
| Compound query | "Starbucks near Times Square" | Two-phase: geocode "Times Square" first, then POI search within radius |
| Ambiguous | "Springfield" | Disambiguation: rank by proximity to user, population, search history |
Address geocoding (address to coordinates):
- Parse the address into components using country-specific rules. English: {number: "123", street: "Main St", city: "New York", state: "NY"}. Japanese: {prefecture, city, ward, block, building} (no street names). German: {street} {number}, {postal} {city}.
- Normalize tokens: expand abbreviations (St to Street, Ave to Avenue, Str to Strasse), strip diacritics, standardize casing.
- Hierarchical lookup in the address trie: country to state/province to city to street to number range. The trie is stored in memory (~20 GB for global address data). Each leaf node maps to a coordinate pair or an address range with interpolation.
- Address interpolation. Not every building number exists as an explicit point in the database. If the database has points for #100 (at coordinates A) and #200 (at coordinates B) on Main St, then #150 is estimated at the midpoint of the road segment between A and B. Interpolation accuracy: 20-30m in dense urban areas, 50-100m+ in rural areas.
- Rank candidates by: match completeness (how many address components matched), proximity to user's current location, population/importance of the area.
- Return the top result's coordinates plus a confidence score and structured address.
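The interpolation step can be sketched in a few lines. Real geocoders interpolate along the road polyline; this linear version (with hypothetical anchor coordinates) conveys the idea and why accuracy degrades when anchors are sparse:

```python
def interpolate_address(target: int, lo: tuple, hi: tuple) -> tuple:
    """lo/hi are (house_number, lat, lng) anchors bracketing the target."""
    (n0, lat0, lng0), (n1, lat1, lng1) = lo, hi
    t = (target - n0) / (n1 - n0)   # 0.0 at the lo anchor, 1.0 at the hi anchor
    return (lat0 + t * (lat1 - lat0), lng0 + t * (lng1 - lng0))

# #150 is halfway between the #100 and #200 anchors, so we get the midpoint:
print(interpolate_address(150, (100, 40.7400, -74.0000), (200, 40.7420, -74.0020)))
```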
Place resolution (place name / POI to coordinates):
- The query goes to Elasticsearch (same pipeline as POI search, Section 9.2).
- Match against POI name, category, and aliases using the edge-ngram tokenizer.
- If the query includes a location context ("near Times Square"), first geocode the context location, then use it as the center for a geo_distance filter.
- If no location context is provided, use the user's current GPS coordinates as the center.
- Return the top match's coordinates. For chains (Starbucks), return the nearest location.
How the address database is built. The address database that powers geocoding comes from three sources, connecting back to the satellite/probe data pipeline (Section 9.9):
- Government address registries: Official postal address databases (USPS, Royal Mail, etc.) provide authoritative address points for countries that maintain them.
- OpenStreetMap: Community-contributed address nodes and building polygons with address tags.
- Satellite-derived building footprints: Where government and OSM data is sparse (developing countries, new construction), ML-extracted building footprints from satellite imagery provide candidate address points. Each detected building is assigned an interpolated address based on its position along the nearest named road segment.
Geocoding accuracy in remote areas improves as satellite imagery coverage improves. A region with sparse OSM coverage but recent high-resolution satellite imagery can still resolve addresses to within 50-150m depending on building density and satellite resolution.
Reverse geocoding (coordinates to address):
- Compute the S2 cell at level 22 (~2m resolution) for the input coordinates.
- Query the address database for the nearest address point within that cell and its neighbors.
- If no exact address point exists (rural area, new development), interpolate along the nearest street segment. If address 100 is at point A and 200 is at point B, estimate address 150 as the midpoint.
- Return the structured address with building number, street, city, postal code, country.
Challenges:
- Japan uses block-based addressing (no street names). The trie structure uses prefecture to city to ward to block to building instead.
- Rural areas in developing countries may have no formal addresses. Fallback: return the nearest named settlement or road. Satellite-derived building footprints partially close this gap.
- Plus Codes (Open Location Codes) provide coordinate-based addressing for regions without street addresses. The system decodes them directly to lat/lng.
- Ambiguous inputs ("Springfield" matches 30+ cities in the US alone). Disambiguation uses: (1) proximity to user, (2) population, (3) search history, (4) country-level context from the user's locale.
9.3 Computing the Route: Routing Engine
Graph representation. The road graph is stored as an adjacency array: a flat array of nodes and a flat array of edges. Node i's outgoing edges are at edges[node[i].edge_offset .. node[i].edge_offset + node[i].edge_count]. This is cache-friendly because sequential edge scans hit contiguous memory. The entire structure is memory-mapped from a binary file. No deserialization step on server startup; the OS pages data from disk as accessed.
Regional sharding. The global graph (~500M nodes) is split into ~50 region shards, each covering a geographic area (e.g., Western Europe, Eastern US, Southeast Asia). Each routing server loads one or more region shards (~5-12 GB per shard).
Cross-region routing. Routes that cross shard boundaries are handled by a transit node overlay. At preprocessing time, the system identifies "border nodes" where roads cross region boundaries. An overlay graph connects all border nodes with precomputed shortest-path distances. Cross-region routing: (1) route from origin to the nearest border node within the origin shard, (2) route through the overlay graph, (3) route from the border node to the destination within the destination shard. Total query time: ~3-5ms (3 CH queries).
CCH weight updates. Every 30 seconds, the routing server reads updated segment speeds from Redis and recalculates affected shortcut weights (< 500ms for 10,000 changed segments). See Contraction Hierarchies for the full CCH mechanics. The weekly graph rebuild pipeline is described in Section 12.4.
Alternative routes. The penalty method: (1) compute the optimal route, (2) penalize all edges on that route (multiply weights by 2x), (3) recompute. The second route avoids the first route's path. (4) Optionally repeat for a third alternative. Each alternative query takes < 1ms with CH.
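The penalty method on a toy graph (production uses CH queries, but the mechanics are identical): compute the best route, double the weights of its edges, recompute, and the second search prefers a different path. A plain-Dijkstra sketch:

```python
import heapq

def dijkstra(graph, src, dst):
    """Return (cost, path) for the cheapest src->dst path."""
    pq, seen = [(0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

graph = {  # travel times in minutes
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"D": 8},
}

best_cost, best_path = dijkstra(graph, "A", "D")       # A -> B -> D, cost 9

penalized = {u: dict(vs) for u, vs in graph.items()}
for u, v in zip(best_path, best_path[1:]):             # 2x penalty on the best path
    penalized[u][v] *= 2

alt_cost, alt_path = dijkstra(penalized, "A", "D")     # now A -> C -> D, cost 10
print(best_path, alt_path)  # -> ['A', 'B', 'D'] ['A', 'C', 'D']
```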
9.4 Predicting Arrival: ETA Calculation
Baseline ETA. The CH route returns a sequence of road segments with travel times based on speed limits. Sum the segment travel times. This is the free-flow ETA, accurate at 3 AM but off by 2-5x during rush hour.
Traffic-aware ETA. Replace speed-limit weights with real-time speeds from the traffic pipeline (Redis). For segments without real-time data (low-traffic rural roads, or segments with < 3 samples), fall back to the speed limit. This handles current conditions but not future conditions along the route.
Historical patterns. For each road segment, maintain a 7x96 matrix: 7 days of the week, 96 time slots per day (15-minute intervals). Each cell stores the historical median speed. Built from the traffic history table in PostgreSQL (aggregated nightly). This captures recurring patterns: Monday morning rush on I-95, Saturday afternoon congestion near shopping centers.
Predictive ETA. For a 2-hour drive, conditions will change during the trip. The predictive ETA algorithm:
- Compute the route using current traffic weights.
- For each segment S along the route, estimate the arrival time at S based on cumulative travel time.
- Look up the historical speed for S at that predicted arrival time's day-of-week and time-slot.
- Blend real-time and historical: if the segment has real-time data with high confidence, weight it 80%. Otherwise, weight historical 80%.
- Recompute cumulative travel times with the blended speeds.
- Iterate once more (the changed travel times shift arrival times, which may change the historical lookup).
Two iterations are sufficient for convergence in practice.
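One pass of the blend can be sketched as follows (toy segment lengths and speed tables; the 80/20 weights restate the step above, and the second convergence iteration is omitted for brevity). The point is the time-slot lookup: a slow first segment pushes the arrival at the second segment into the next 15-minute slot, where history says conditions differ:

```python
SLOT_MIN = 15  # historical table uses 15-minute slots (96 per day)

def eta_seconds(segments, depart_slot, live, hist):
    """segments: [(seg_id, length_m)]; live: seg_id -> km/h (may be absent);
    hist: (seg_id, slot) -> km/h. Returns total travel time in seconds."""
    t = 0.0
    for seg_id, length_m in segments:
        # Which time slot will we be in when we *reach* this segment?
        slot = (depart_slot + int(t // (SLOT_MIN * 60))) % 96
        h = hist[(seg_id, slot)]
        if seg_id in live:
            speed = 0.8 * live[seg_id] + 0.2 * h   # trust fresh live data
        else:
            speed = h                               # historical fallback
        t += length_m / (speed / 3.6)               # km/h -> m/s
    return t

segments = [("s1", 9000), ("s2", 9000)]
live = {"s1": 30.0}                                 # s1 is congested right now
hist = {("s1", 32): 60.0, ("s2", 32): 60.0, ("s2", 33): 40.0}

# s1 takes 900s, so we reach s2 one slot later, where history is worse:
print(round(eta_seconds(segments, depart_slot=32, live=live, hist=hist)))  # -> 1710
```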
Data-source-aware adjustments. Beyond real-time and historical traffic, ETA incorporates signals from the broader data pipeline (Section 9.9):
- Construction zones (detected from probe speed anomalies or satellite change detection): multiply segment weight by 1.5-3x depending on severity.
- Road condition (from probe braking events, Section 9.8): add 5-10% to travel time for segments with poor condition scores in comfort-optimized routing.
- Unverified road segments (low probe count, old satellite imagery): widen the ETA confidence interval. Display "25-35 min" instead of "28 min" when routing through low-confidence segments. This is especially common in developing regions where road data comes primarily from satellite extraction rather than probe confirmation.
Confidence interval. The historical data also stores speed variance. High variance (school zone: 50 km/h at 2 PM, 15 km/h at 3 PM school pickup) produces a wide ETA range. The client displays "35-45 min" when variance is high and "38 min" when variance is low.
ML-based ETA (production approach). Every major mapping platform now uses ML models for ETA. Google's GNN-based model (developed with DeepMind, published in 2021) reduced ETA errors by 40-50% over heuristic blending. The key insight: a GNN learns non-obvious congestion interactions across the road graph topology, not just the route's segments. Congestion on a highway exit ramp affects surface streets two blocks away, something the heuristic blend misses.
Uber's DeepETA uses linear transformers with geospatial embeddings. The architecture that works at scale: train a GNN or transformer offline on historical trip data, serve predictions at query time with < 5ms latency using a feature store combining real-time traffic, historical patterns, and contextual signals. For teams without ML infrastructure, the heuristic blend above is a solid starting point.
Weather integration. Rain reduces average speeds by 10-15%, snow by 20-40%, fog by 15-25%. The ETA model incorporates weather forecasts from a weather data provider (OpenWeather, Tomorrow.io) as a feature. Current conditions affect near-term segments; forecast conditions affect segments the driver will reach in 30+ minutes. Weather is especially impactful for predictive ETA on long routes.
9.5 Navigating: Turn-by-Turn Guidance
Guidance generation. The route is a sequence of road segments from the CH query. At each transition between segments, the system generates a maneuver instruction.
- Compute the bearing angle of the current segment's last 50m and the next segment's first 50m.
- Calculate the turn angle: angle = next_bearing - current_bearing, normalized to (-180, 180] so a 90-degree left turn is never reported as a 270-degree right turn.
- Classify: straight (within 15 degrees), slight left/right (15-45), turn left/right (45-135), sharp left/right (135-170), U-turn (170-180).
- Attach street name, distance to next maneuver, and lane info.
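The classification step as a small function (a sketch; bucket boundaries restate the list above). The `% 360` trick does the normalization:

```python
def classify_turn(current_bearing: float, next_bearing: float) -> str:
    # Normalize the difference to (-180, 180]: positive = right, negative = left.
    angle = (next_bearing - current_bearing + 180) % 360 - 180
    side = "right" if angle > 0 else "left"
    a = abs(angle)
    if a <= 15:
        return "straight"
    if a <= 45:
        return f"slight {side}"
    if a <= 135:
        return f"turn {side}"
    if a <= 170:
        return f"sharp {side}"
    return "u-turn"

print(classify_turn(90, 85))    # straight
print(classify_turn(0, 90))     # turn right
print(classify_turn(10, 280))   # 270 ahead == 90 behind -> turn left
```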
Voice guidance trigger distances. Instructions are announced at speed-dependent distances:
| Speed | First announcement | Reminder | At turn |
|---|---|---|---|
| < 40 km/h (city) | 200m | 100m | 0m |
| 40-80 km/h (suburban) | 500m | 200m | 0m |
| > 80 km/h (highway) | 1,000m | 500m | 0m |
Off-route detection. Every 3-5 seconds, the navigator sends a GPS update. The server map-matches the position to the route:
- Compute the distance from the GPS point to the nearest point on the expected route.
- If the distance exceeds 50m for 3 consecutive updates (~15 seconds of deviation), trigger re-routing.
- Re-route uses the current position as origin and the original destination as target. The CH query takes < 1ms. Guidance is regenerated and sent via WebSocket within 500ms of detection.
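The deviation trigger needs only a strike counter per session, reset whenever a fix lands back on the route. A sketch (distance-to-route comes from map matching and is just an input here):

```python
DEVIATION_M = 50    # threshold distance from the route
CONSECUTIVE = 3     # updates in a row before rerouting (~15s at 5s cadence)

class OffRouteDetector:
    def __init__(self):
        self.strikes = 0

    def update(self, distance_to_route_m: float) -> bool:
        """Feed one GPS fix; returns True when a reroute should fire."""
        if distance_to_route_m > DEVIATION_M:
            self.strikes += 1
        else:
            self.strikes = 0   # back on route: forgive past strikes
        return self.strikes >= CONSECUTIVE

d = OffRouteDetector()
fixes = [12, 60, 70, 20, 65, 80, 90]   # metres from route, one per update
print([d.update(x) for x in fixes])
# -> [False, False, False, False, False, False, True]
```

A brief on-route fix (the 20m sample) resets the counter, which is what prevents rerouting on momentary GPS noise near overpasses.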
Lane guidance. At complex interchanges, the system overlays lane arrows. Lane data is precomputed from road network attributes (lane count, turn arrows per lane). The guidance message includes lane_info: { total_lanes: 4, recommended: [2, 3], arrows: ["left", "straight", "straight", "right"] }.
Lane-level navigation (emerging). Lane-level map data is becoming production-ready as of 2026 (TomTom Orbis, Mapbox 3D lanes, Google Maps live lane guidance). Lane data comes from HD LiDAR surveys, satellite imagery, and probe lateral position clustering. The road graph gains a lane-level sub-graph where each edge is a lane rather than a road segment. Mapbox reports a 23% reduction in missed exits with accurate lane guidance.
9.6 Multi-Modal Routing
The problem. A trip might combine: walk 5 min to subway -> ride 20 min -> transfer -> ride 10 min -> walk 3 min to destination. The routing engine must combine walking graph, transit schedule graph, and potentially driving/cycling graphs into a single route computation.
GTFS data model. Public transit schedules are published in GTFS (General Transit Feed Specification) format: stops, routes, trips, stop_times, and calendar. Each stop_time entry says "trip T123 arrives at stop S456 at 08:15:00 and departs at 08:16:00." This defines a time-expanded graph where each stop-time is a node.
RAPTOR algorithm. Round-Based Public Transit Routing Algorithm:
- Round 0: Mark the source stop as reached at departure time. Mark all stops reachable by walking from the origin (using the walking graph, up to 15 min walk).
- Round 1: For each reached stop, scan all trips departing from that stop after the arrival time. Follow each trip to subsequent stops, recording the earliest arrival time. After scanning all trips, add transfer connections (walking between nearby stops).
- Round N: Repeat. Each round adds one transit trip to the journey.
- Termination: Stop when no new stops are reached or a maximum number of rounds (typically 5-6 transfers).
RAPTOR is much faster than Dijkstra on transit networks because it exploits the structure of schedules (trips visit stops in sequence, so scanning a trip's stop_times is a simple array traversal).
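The round structure can be sketched on a toy timetable (a simplification: no walking transfers, no per-round label arrays, no route-level pruning, and departure time equals arrival time at each stop). Each round boards trips only at stops reached in the previous round, so round k finds journeys with exactly k rides:

```python
from collections import defaultdict

# trip_id -> ordered (stop, arrival_minutes); hypothetical timetable
trips = {
    "T1": [("A", 0), ("B", 10), ("C", 20)],
    "T2": [("B", 15), ("D", 25)],
    "T3": [("A", 5), ("D", 50)],   # direct but slow
}

def raptor(source, depart, max_rounds=4):
    INF = float("inf")
    best = defaultdict(lambda: INF)   # best known arrival time per stop
    best[source] = depart
    reached = {source}                # stops improved in the previous round
    for _ in range(max_rounds):
        prev = dict(best)
        newly = set()
        for stop_times in trips.values():
            boarded = False
            for stop, t in stop_times:
                if boarded and t < best[stop]:
                    best[stop] = t    # this ride carries us here earlier
                    newly.add(stop)
                # board only at stops reached by the *previous* round
                if not boarded and stop in reached and prev.get(stop, INF) <= t:
                    boarded = True
        if not newly:
            break
        reached = newly
    return best

print(raptor("A", depart=0)["D"])   # -> 25 (round 1: T1 to B; round 2: T2 to D)
```

Round 1 reaches D at 50 via the direct trip T3; round 2 improves it to 25 via a transfer at B, exactly the "each round adds one transit trip" behavior described above.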
Real-time delays. GTFS-Realtime feeds provide live delay data. When a train is 5 minutes late, the system updates the departure times in the trip's stop_times. RAPTOR queries use the adjusted times, so suggested connections account for current delays.
Combined output. The response includes a sequence of legs: [{ mode: "walk", ... }, { mode: "subway", line: "A", ... }, { mode: "walk", ... }] with total duration and transfer details.
EV routing. Electric vehicle routing adds constraints that do not exist in combustion vehicle routing: battery state-of-charge modeling, elevation-aware energy consumption (uphill drains battery, downhill regenerates), charging station availability and connector type compatibility, and range-constrained route optimization. The routing engine must find a path that reaches the destination without the battery dropping to zero, inserting charging stops as needed. Edge weights change from travel-time-only to a (travel_time, energy_cost) pair. The CH preprocessing handles this as a second metric via CCH. Google Maps, Apple Maps, and Tesla all support EV-aware routing as of 2026. The key data inputs: vehicle-specific energy consumption models (provided by OEM APIs or learned from probe data), real-time charging station availability (via OCPP feeds from charging networks), and elevation data from digital elevation models (SRTM, ASTER).
9.7 Offline Maps
Package structure. An offline map package for a region (e.g., "New York metro area") contains:
- Vector tiles for zoom levels 0-16 within the region bounding box (~100-300 MB)
- CH-preprocessed road graph subgraph for the region (~50-200 MB)
- POI data: name, category, coordinates for all POIs in the region (~20-50 MB)
- Address database for geocoding within the region (~30-80 MB)
- Total: ~200-600 MB per metro area, ~1-2 GB per country
Package generation. Nightly batch job per region:
- Extract the road subgraph from the global graph (all nodes and edges within the region bounding box plus a 10km buffer for routes that leave and re-enter the region).
- Run CH preprocessing on the subgraph. This takes 30 seconds to 5 minutes depending on region size.
- Extract vector tiles from S3 for the region's z/x/y tile coordinates.
- Extract POI data from PostgreSQL for the region.
- Package into a compressed archive (zstd compression, ~60% compression ratio).
- Upload to S3. CDN distributes downloads globally.
Delta updates. Instead of re-downloading the full package on every update, the system computes a binary diff (bsdiff algorithm) between the old and new package. Delta updates are typically 5-15% of the full package size. The client applies the delta locally.
On-device routing. The same CH bidirectional query algorithm runs locally on the device using the downloaded subgraph. Route quality is identical to online routing within the region. Routes that leave the offline region's coverage area are not supported; the client shows a prompt to go online or download an adjacent region.
Storage management. The mobile client manages offline packages with LRU eviction. Frequently visited regions (home, work, vacation spot) are pinned. Rarely used regions are evicted when device storage is low. The client tracks last-access time per package and evicts the oldest-accessed first.
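The eviction policy above is a small amount of logic. A sketch, where the class shape and field names are assumptions:

```python
import itertools

class OfflinePackageStore:
    """Sketch of offline-package storage with LRU eviction and pinning
    (illustrative API, not the actual client implementation)."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self._clock = itertools.count()  # monotonic access counter
        self.packages = {}  # region -> {"size_mb", "last_access", "pinned"}

    def access(self, region):
        if region in self.packages:
            self.packages[region]["last_access"] = next(self._clock)

    def add(self, region, size_mb, pinned=False):
        self.packages[region] = {"size_mb": size_mb,
                                 "last_access": next(self._clock),
                                 "pinned": pinned}
        self._evict_if_needed()

    def _evict_if_needed(self):
        # Evict the oldest-accessed unpinned package until under budget.
        while sum(p["size_mb"] for p in self.packages.values()) > self.budget_mb:
            candidates = [(p["last_access"], r)
                          for r, p in self.packages.items() if not p["pinned"]]
            if not candidates:
                break  # everything left is pinned
            _, victim = min(candidates)
            del self.packages[victim]

store = OfflinePackageStore(budget_mb=1000)
store.add("home-metro", 400, pinned=True)  # pinned: never evicted
store.add("vacation", 300)
store.add("roadtrip", 500)  # over budget -> evicts "vacation" (oldest unpinned)
print(sorted(store.packages))  # ['home-metro', 'roadtrip']
```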
9.8 Real-Time Traffic Pipeline
50M GPS pings/sec → Kafka → Flink (map match + aggregate) → Redis (live speeds) → routing engine + traffic tiles.
Ingestion. 50 million GPS pings/sec flow into Kafka from multiple sources: active navigators (~15M/sec at 3-5 second intervals), background app users (~10M/sec), fleet vehicles (~5M/sec), and OEM connected car telemetry (~20M/sec). Each ping: { user_id_hash, lat, lng, speed_kmh, heading_degrees, timestamp, source_type }. Kafka is partitioned by S2 cell at level 12 (~4,000 partitions globally). This ensures all pings from the same geographic area land on the same partition, simplifying downstream aggregation.
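The partition-key logic can be sketched as follows. The hash scheme and the example cell id are illustrative; a real producer would set the S2 cell id as the Kafka record key and let the default partitioner hash it:

```python
import hashlib

NUM_PARTITIONS = 4000  # level-12 S2 cells mapped onto Kafka partitions

def partition_for_ping(s2_cell_id: int) -> int:
    """Map a level-12 S2 cell id to a partition (sketch). Every ping in
    the same cell lands on the same partition, so one consumer sees all
    observations for a given geographic area."""
    digest = hashlib.md5(str(s2_cell_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Illustrative ping and cell id (not a real S2 computation).
ping = {"user_id_hash": "u1", "lat": 40.7, "lng": -74.0, "speed_kmh": 52,
        "heading_degrees": 180, "timestamp": 1700000000,
        "source_type": "navigator"}
cell = 0x89C2594000000000
assert partition_for_ping(cell) == partition_for_ping(cell)  # deterministic
print(0 <= partition_for_ping(cell) < NUM_PARTITIONS)  # True
```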
Map matching. Raw GPS coordinates have 5-10m accuracy. A ping near a highway overpass could belong to the highway or the road below it. Map matching determines which road segment the vehicle is actually on.
The standard approach: Hidden Markov Model (HMM).
- Emission probability: How likely is this GPS point given the vehicle is on road segment S? Based on the distance from the GPS point to the segment (Gaussian distribution, sigma = GPS accuracy).
- Transition probability: How likely is the vehicle to move from segment S1 to segment S2 between consecutive pings? Based on the routing distance between the segments vs. the straight-line distance between the GPS points (similar values = plausible transition).
- Viterbi algorithm: Find the most likely sequence of road segments given the sequence of GPS observations. This handles ambiguous cases (overpasses, parallel roads) by considering the full trajectory.
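The three components combine in a standard Viterbi recurrence. A minimal sketch with assumed data shapes (per-point candidate distances and transition plausibilities are taken as precomputed inputs):

```python
import math

def viterbi_match(gps_dists, transition_plaus, sigma=7.5):
    """Minimal HMM map-matching sketch in the Newson & Krumm style
    (data shapes are assumptions for illustration).

    gps_dists[t][s]: distance (m) from GPS point t to candidate segment s.
    transition_plaus[t][s1][s2]: plausibility of moving s1 -> s2 between
    points t and t+1 (e.g. based on route vs. straight-line distance).
    Returns the most likely segment index for each GPS point.
    """
    def emission(d):  # Gaussian log-likelihood, constant factor dropped
        return -(d * d) / (2 * sigma * sigma)

    n = len(gps_dists[0])
    score = [emission(d) for d in gps_dists[0]]  # best log-prob per state
    back = []                                    # backpointers per step
    for t in range(1, len(gps_dists)):
        new_score, ptr = [], []
        for s2 in range(n):
            best = max(range(n), key=lambda s1:
                       score[s1] + math.log(transition_plaus[t - 1][s1][s2]))
            new_score.append(score[best]
                             + math.log(transition_plaus[t - 1][best][s2])
                             + emission(gps_dists[t][s2]))
            ptr.append(best)
        back.append(ptr)
        score = new_score
    # Backtrack from the best final state
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Two candidates (highway above, road below an overpass). Point 1 is
# ambiguous on its own; the trajectory as a whole resolves it.
dists = [[3.0, 25.0], [8.0, 9.0], [4.0, 30.0]]
trans = [[[0.9, 0.1], [0.1, 0.9]]] * 2  # staying on a segment is likelier
print(viterbi_match(dists, trans))  # [0, 0, 0]
```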
Performance optimization: HMM map matching costs ~0.5ms per GPS point. At 50M points/sec, that requires 25,000 CPU cores. Optimization: use lightweight geometric snapping (snap to nearest road within 10m) for straight-road segments where there is no ambiguity. Reserve HMM for complex intersections, overpasses, and highway interchanges. This reduces HMM calls by ~80%.
Speed aggregation. Flink processes map-matched GPS points with a 30-second tumbling window per road segment:
- Collect all matched speed observations for each segment in the window.
- Compute median speed (robust to outliers from GPS noise or stopped vehicles at traffic lights).
- Compute sample count and confidence score (more samples = higher confidence).
- Write to Redis: segment:{id} -> { speed_kmh, samples, confidence, updated_at }
- Publish segment speed updates to the traffic tile builder.
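The window aggregation can be sketched as follows (input/output shapes assumed; the confidence curve is a toy stand-in):

```python
import statistics
import time
from collections import defaultdict

def aggregate_window(matched_pings):
    """Sketch of one 30-second tumbling window.

    matched_pings: [(segment_id, speed_kmh), ...] -- map-matched
    observations collected during the window. Returns the per-segment
    record in the shape written to Redis."""
    by_segment = defaultdict(list)
    for seg, speed in matched_pings:
        by_segment[seg].append(speed)
    out = {}
    for seg, speeds in by_segment.items():
        out[seg] = {
            "speed_kmh": statistics.median(speeds),  # robust to outliers
            "samples": len(speeds),
            # toy confidence curve: rises with sample count, capped at 1.0
            "confidence": min(1.0, len(speeds) / 20),
            "updated_at": int(time.time()),
        }
    return out

window = [("seg-1", 48), ("seg-1", 52), ("seg-1", 3),  # 3 = stopped at a light
          ("seg-2", 95)]
agg = aggregate_window(window)
print(agg["seg-1"]["speed_kmh"], agg["seg-1"]["samples"])  # 48 3
```

The median makes the stopped-at-a-light observation (3 km/h) irrelevant, which a mean would not.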
Incident detection. If the median speed on a segment drops below 20% of its free-flow speed for 2+ consecutive windows (60 seconds) with 5+ samples, an incident is flagged. Incident data is published to the navigation service for real-time alerts to affected navigators.
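The two-window rule reduces to a small amount of logic. A sketch whose thresholds mirror the text (the function shape is otherwise illustrative):

```python
def detect_incident(window_stats, free_flow_kmh,
                    threshold=0.2, windows_required=2, min_samples=5):
    """Sketch of the incident rule: median speed below 20% of free-flow
    for 2+ consecutive 30s windows, each with 5+ samples.
    window_stats: [(median_kmh, sample_count), ...] oldest first."""
    consecutive = 0
    for median_kmh, samples in window_stats:
        if samples >= min_samples and median_kmh < threshold * free_flow_kmh:
            consecutive += 1
            if consecutive >= windows_required:
                return True
        else:
            consecutive = 0  # a healthy or under-sampled window resets the run
    return False

# Free-flow 100 km/h; two consecutive slow, well-sampled windows -> incident.
print(detect_incident([(90, 12), (15, 8), (12, 9)], free_flow_kmh=100))  # True
# Slow but under-sampled windows never trigger (avoids GPS-noise false alarms).
print(detect_incident([(15, 3), (12, 4)], free_flow_kmh=100))  # False
```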
Crowdsourced incident reporting. Beyond automated detection from probe speed anomalies, user-submitted reports (accidents, police presence, road hazards, closures) provide a second signal layer. This is the Waze model: a driver taps "report" in the app, the report is geotagged and categorized, and nearby navigators receive alerts. Reports are validated by proximity (is the reporter on the segment?) and corroboration (multiple independent reports within 5 minutes increase confidence). Confirmed reports feed back into the routing engine as temporary edge weight penalties or blockages via CCH weight overrides, and into the traffic tile overlay as incident markers.
Vehicle GPS traces also power batch intelligence beyond real-time traffic: road discovery, speed limit inference, road condition scoring, and construction zone detection. These batch processes are described in the vehicle probe processing pipeline (Section 9.9).
9.9 Map Data Pipeline: From Raw Sources to Tiles and Routes
Three data sources (satellite, vehicle probes, OSM) are fused daily into tiles and the road graph.
This section describes the end-to-end pipeline that turns raw data sources into the tiles and road graph consumed by the rest of the system. Every section above references data that flows through this pipeline: the road graph (Section 4.1, 9.3), vector tiles (Section 9.1), satellite imagery tiles (Section 4.6), geocoding address database (Section 9.2), and probe-derived intelligence (Section 9.8).
Three data sources:
- Satellite imagery (Maxar 30cm for detail, Planet Labs 3-5m for daily global coverage, Sentinel-2 for free 10m coverage)
- Vehicle probe data (50M navigators contributing GPS traces, speed, and sensor data)
- Community and government data (OSM edits, government road databases, GTFS transit feeds)
Satellite imagery processing pipeline:
- Raw images arrive from vendor APIs (Maxar delivers ortho-ready tiles weekly, Planet delivers daily).
- Standard photogrammetric processing (radiometric correction, orthorectification, pansharpening, mosaic stitching) produces geometrically accurate tiles aligned to the S2 grid.
- ML feature extraction runs on the processed imagery:
- Road detection (U-Net segmentation): identifies road centerlines, compared against the existing graph. New roads are flagged as candidates in road_candidates (Section 7.6).
- Building footprint extraction (Mask R-CNN): detects building outlines for rendering at zoom 15+ and for the geocoding address database (Section 9.2).
- Change detection: pixel-level comparison of current vs. previous imagery. Significant changes (new construction, demolished structures, flooding) flagged for review.
- Processed imagery is cut into z/x/y raster tiles for the satellite view layer (Section 4.6). Metadata stored in satellite_imagery (Section 7.5).
Vehicle probe processing pipeline:
- Raw GPS traces (already ingested for real-time traffic in Section 9.8) are also archived to the probe data warehouse (S3 + Parquet, 90-day retention).
- Nightly batch job: cluster traces by road segment. Compute per-segment metrics stored in probe_aggregates (Section 7.6): daily trace count, p85 speed, braking event count, lateral spread (lane count estimate).
- Road discovery: cluster off-graph traces (traces > 20m from any existing road segment) by spatial proximity. Clusters with 10+ distinct users over 30 days become initial road candidates (requiring satellite confirmation for promotion, or 30+ users to be promoted without satellite).
- Speed limit inference: compare p85 free-flow speed against mapped speed limit per segment. Divergences flagged for review and corrected in the next graph rebuild.
- Road condition scoring: segments with > 5 braking events per 1,000 traces receive a condition penalty applied to routing weights.
Data fusion and validation (daily batch):
- Merge satellite-detected changes, probe-derived updates, and OSM community edits.
- Conflict resolution: probe data wins on road existence (vehicles drove there). Satellite wins on building geometry (higher spatial resolution). OSM wins on metadata (road names, turn restrictions).
- Road candidates require dual confirmation: probe clustering + satellite imagery, or probe clustering + 30+ distinct users.
- Validated changes commit to PostgreSQL (the source of truth for tiles and the road graph).
Downstream triggers:
- Tile invalidation: compute which z/x/y tiles are affected by changes using S2 coverings. Invalidate CDN and regional cache. Tile builder (Section 9.1) regenerates affected tiles from updated PostGIS data.
- Graph rebuild queue: topology changes (new intersections, deleted roads) are queued for the weekly CH preprocessing (Section 12.4). Weight-only changes (speed limits, road conditions) apply immediately via CCH weight updates (Section 9.3).
- Address database updates: building footprint changes from satellite imagery update the geocoding address database (Section 9.2). New buildings get interpolated addresses based on position along the nearest road.
9.10 Full Journey: Sequence Diagram
The following sequence diagram shows the complete user journey from opening the app to active navigation, illustrating how all subsystems interact.
10. Identify Bottlenecks
10.1 Tile CDN Cache Misses at High Zoom Levels
Symptoms: At zoom 18+, the tile space is enormous (~69 billion unique tiles). CDN hit rate drops below 50% for rural or rarely viewed areas. Origin tile servers see request spikes. Tile latency increases from < 20ms (CDN) to 200ms+ (origin).
Mitigation: Pre-warm CDN caches for urban areas (top 500 cities cover 90% of tile traffic). Generate high-zoom tiles lazily for rural areas (generate on first request, cache for 7 days). Use a regional Redis cache tier to absorb repeat requests that miss CDN but hit the same region.
10.2 Routing Server Memory for Global Graph
Symptoms: Each routing server loading the full global CH graph needs ~60 GB of RAM. With 50+ servers per region for redundancy and throughput, that is 3+ TB of duplicated graph data per region.
Mitigation: Regional sharding. Each server loads only its assigned region shard (~5-12 GB). Cross-region routes handled by the transit node overlay (adds ~3ms per region boundary crossing). Memory-mapped files mean the OS only pages in the actively queried portions of the graph; cold regions stay on disk.
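The lazy-paging behavior comes from the adjacency-array layout plus `mmap`: reading one node's edges touches only a few pages. A sketch where the binary layout (node count, offset array with trailing sentinel, flat edge array) is illustrative, not the actual format:

```python
import mmap
import os
import struct
import tempfile

def write_graph(path, offsets, edges):
    """Write the toy layout: node_count (uint32), node_count+1 offsets
    (uint32, last one is a sentinel), then (target, weight) uint32 pairs."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(offsets) - 1))  # offsets end in sentinel
        for off in offsets:
            f.write(struct.pack("<I", off))
        for target, weight in edges:
            f.write(struct.pack("<II", target, weight))

def neighbors(mm, node):
    """Read one node's edge list straight out of the mapping; the OS pages
    in only the touched pages, so cold regions of the graph stay on disk."""
    n = struct.unpack_from("<I", mm, 0)[0]
    edges_base = 4 + 4 * (n + 1)
    start = struct.unpack_from("<I", mm, 4 + 4 * node)[0]
    end = struct.unpack_from("<I", mm, 4 + 4 * (node + 1))[0]
    return [struct.unpack_from("<II", mm, edges_base + 8 * e)
            for e in range(start, end)]

# Toy graph: 0 -> {1 (w=30), 2 (w=70)}, 1 -> {2 (w=25)}, 2 -> {}
path = os.path.join(tempfile.mkdtemp(), "graph.bin")
write_graph(path, offsets=[0, 2, 3, 3], edges=[(1, 30), (2, 70), (2, 25)])
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
print(neighbors(mm, 0))  # [(1, 30), (2, 70)]
```

Because a bidirectional CH query touches a tiny, localized slice of the offset and edge arrays, resident memory stays far below the on-disk graph size for rarely queried regions.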
10.3 Map Matching Compute at 50M Updates/Sec
Symptoms: HMM-based map matching costs ~0.5ms per GPS point. At 50M/sec, pure HMM requires 25,000 CPU cores. Flink task managers hit CPU saturation.
Mitigation: Geometric fast path (Section 9.8). Snap-to-nearest-road for unambiguous straight segments; reserve HMM for intersections and overpasses. Reduces HMM calls by ~80%, cutting CPU requirements to ~5,000 cores.
10.4 Traffic Aggregation Hotspots
Symptoms: A 6-lane highway during rush hour receives thousands of GPS pings per 30-second window. The Flink aggregator for that segment handles disproportionate load compared to a rural road with 1-2 pings per window.
Mitigation: Pre-aggregate by S2 cell in Kafka Streams (lightweight count + sum per segment per cell) before Flink. The Flink aggregator receives pre-aggregated partial results, reducing the number of individual records it processes. Spread hot segments across multiple Flink task slots using a secondary hash key.
10.5 POI Search in Dense Urban Areas
Symptoms: Manhattan has 100,000+ POIs within a 2km radius. An Elasticsearch query must filter, score, and rank all of them in < 100ms. With complex re-ranking (text relevance + distance + rating + popularity), the query can exceed 200ms.
Mitigation: Pre-compute top-N POIs per category per S2 cell at level 14 (~0.3 km² per cell). Store in Redis. Serve these directly for simple "coffee near me" queries without hitting Elasticsearch. Reserve Elasticsearch for complex queries (multi-category, custom radius, full-text search).
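The cache-first lookup is a simple two-path function. A sketch in which the key scheme ("poi:{cell}:{category}") and the stub functions are assumptions; a dict stands in for the Redis client and the Elasticsearch query:

```python
# Precomputed per-cell top-N result sets, refreshed by an offline job.
precomputed = {
    "poi:89c2594:coffee": [
        {"name": "Cafe A", "rating": 4.6},
        {"name": "Cafe B", "rating": 4.4},
    ],
}

def query_search_index(cell_id, category):
    return []  # stand-in for the full Elasticsearch query

def search_pois(cell_id, category, complex_query=False):
    """Serve simple "X near me" queries from the precomputed per-cell
    top-N cache; fall back to the search index for everything else."""
    if not complex_query:
        cached = precomputed.get(f"poi:{cell_id}:{category}")
        if cached is not None:
            return {"source": "cache", "results": cached}
    return {"source": "elasticsearch",
            "results": query_search_index(cell_id, category)}

hit = search_pois("89c2594", "coffee")
print(hit["source"], len(hit["results"]))  # cache 2
```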
10.6 Re-Routing Stampede
Symptoms: A sudden highway incident triggers 50,000+ active navigators on that route to request re-routes within seconds. The routing servers see a 10x spike in query rate.
Mitigation: When an incident is detected, the routing engine proactively computes the optimal detour route for the affected corridor. This pre-computed alternative is cached and served to all affected navigators without individual CH queries. Stagger the re-route push with random jitter (0-5s) to spread the WebSocket message burst.
10.7 Offline Package Generation at Scale
Symptoms: 200+ countries, 1,000+ metro areas, nightly regeneration. Each package requires graph extraction + CH preprocessing + tile extraction. Sequential processing takes hours.
Mitigation: Parallel batch processing on a Flink/Spark cluster. Only rebuild packages for regions with data changes (road network updates, new POIs). Skip unchanged regions. Incremental CH preprocessing (only re-contract nodes in the changed subgraph) reduces preprocessing time by 90% for small updates.
10.8 Cross-Region Route Stitching Latency
Symptoms: A route from Paris to Berlin crosses 3 region boundaries. Each regional CH query takes < 1ms, but the transit node overlay query adds network latency if the overlay service is in a different data center. Total route time increases to 10-20ms.
Mitigation: Co-locate the transit node overlay service in each region's data center. Pre-compute cross-region shortest paths for the top 10,000 most popular origin-destination region pairs. Serve these from a cache, bypassing the overlay computation entirely.
10.9 Satellite Image Processing Backlog
Symptoms: New satellite imagery arrives faster than the processing pipeline can handle (especially after a bulk delivery from Planet Labs covering a full continent). Tiles show outdated imagery weeks after newer captures are available. ML change detection falls behind, delaying road discovery and building footprint updates for geocoding.
Mitigation: Prioritize processing by population density (urban areas first, rural areas can wait). Skip tiles with > 50% cloud cover (wait for a clearer capture). Parallelize processing across an auto-scaling Spark cluster. Track processing lag per region as an SLI with a 7-day SLO for urban and 30-day for rural. During backlog events, the rest of the system continues operating normally: existing tiles, road graph, and address data remain valid. Only the freshness of satellite-derived features degrades.
11. Failure Scenarios
11.1 CDN Edge Node Failure
Impact: Tile requests for the affected edge location route to the next-nearest edge node. Latency increases by 20-50ms for affected users. No data loss.
Recovery: CDN provider's automatic failover detects the unhealthy edge within 10 seconds and redirects traffic. Users experience a brief latency bump during failover. No manual intervention needed.
11.2 Routing Server Crash
Impact: In-flight route requests on the crashed server fail (timeout). The load balancer's health check fails within 10 seconds.
Recovery:
- Load balancer removes the unhealthy instance from the pool.
- Remaining routing servers absorb the traffic. At 10K queries/sec per server with 18 servers, losing 1 server increases load on the remaining 17 by ~6%. Well within capacity.
- Auto-scaler launches a replacement instance. The new server memory-maps the regional graph binary from shared storage (EBS or NFS). Time to ready: ~60 seconds (memory-mapping is lazy; the server starts serving immediately and pages in data on demand).
- Zero data loss. Routes are stateless; the client retries the failed request.
11.3 Kafka Broker Failure (Traffic Pipeline)
Impact: Partitions hosted on the failed broker are temporarily unavailable. Location updates for those partitions queue at the producer. Traffic data for affected S2 cells goes stale (no new speed updates for 15-30 seconds).
Recovery:
- Kafka leader election promotes a replica for each affected partition (10-15 seconds with RF=3).
- Producers flush queued updates to the new leader.
- Flink consumers resume from their last committed offset. Exactly-once semantics preserved via Kafka transactions.
- Traffic data freshness returns to normal within 30 seconds of broker recovery.
Impact on routing: During the 15-30 second gap, the routing engine uses the last known traffic speeds (still cached in Redis with 120s TTL). If the gap exceeds TTL, the engine falls back to historical speed patterns. ETA accuracy degrades slightly for affected road segments.
11.4 Flink Checkpoint Failure
Impact: Flink job restarts from the last successful checkpoint. Some GPS data between the last checkpoint and the failure is reprocessed (idempotent writes to Redis prevent double-counting of speed samples, but segment speed updates may be delayed by the checkpoint interval, typically 30-60 seconds).
Recovery: Flink automatically restores state from the checkpoint stored in S3. Job resumes processing from the last committed Kafka offset. No manual intervention. Traffic data freshness gap equals the checkpoint interval.
11.5 Redis Failure (Real-Time Traffic Cache)
Impact: The routing engine cannot read real-time traffic speeds. Navigation sessions lose live ETA updates. Traffic overlay tiles go stale.
Recovery:
- Redis Sentinel detects the failure and promotes a replica within 10-15 seconds.
- During the gap, the routing engine falls back to historical traffic patterns for ETA calculation. Route quality degrades to "historical accuracy" (within 15% of actual travel time instead of 10%).
- Flink re-populates the new Redis primary with current segment speeds within 30 seconds.
- Navigation sessions resume live ETA updates automatically.
11.6 Elasticsearch Cluster Degradation
Impact: POI search latency increases from < 100ms to 500ms+. Autocomplete becomes sluggish. Some queries time out.
Recovery:
- Elasticsearch self-heals: failed nodes are replaced, shards rebalance.
- During degradation, the POI search service falls back to a pre-computed cache of popular queries (top 100 categories x top 5,000 S2 cells = 500K cached result sets in Redis). Cache coverage handles ~70% of POI queries.
- Uncached queries are retried with a simplified filter (larger S2 cell, fewer scoring factors) to reduce Elasticsearch load.
11.7 GPS Data Quality Degradation
Impact: In tunnels, urban canyons (tall buildings), and parking garages, GPS accuracy drops from 5m to 50-100m. Map matching confidence drops. Traffic speed data becomes noisy (wrong road segment assignment).
Recovery:
- The map matching algorithm discards GPS points with accuracy > 30m (reported by the device's location API).
- For segments with < 3 valid samples in a 30-second window, the traffic pipeline discards the aggregation result and retains the previous window's speed.
- The navigation service detects poor GPS quality (high HDOP or accuracy reported by the device) and switches to dead reckoning: estimate position from last known position + speed + heading. Resume GPS-based positioning when signal quality improves.
- Long tunnel routing uses pre-mapped tunnel geometry to interpolate position.
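Dead reckoning itself is a short computation: advance the last known fix by speed × time along the heading. A sketch using a flat-earth approximation (adequate over tunnel-length distances; the meters-per-degree constants are the usual approximations):

```python
import math

def dead_reckon(lat, lng, speed_kmh, heading_deg, dt_s):
    """Estimate the next position from the last known fix plus speed and
    heading (0 deg = north, 90 deg = east). Flat-earth sketch."""
    dist_m = speed_kmh / 3.6 * dt_s
    heading = math.radians(heading_deg)
    dlat = dist_m * math.cos(heading) / 111_320  # ~meters per degree latitude
    dlng = (dist_m * math.sin(heading)
            / (111_320 * math.cos(math.radians(lat))))  # shrinks with latitude
    return lat + dlat, lng + dlng

# 72 km/h due north for 5 s -> 100 m -> ~0.0009 deg of latitude.
lat2, lng2 = dead_reckon(40.0, -74.0, 72, 0, 5)
print(round(lat2, 4), round(lng2, 4))  # 40.0009 -74.0
```

In practice the client would chain this per tick while GPS is degraded and snap the estimate back onto the pre-mapped tunnel geometry.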
11.8 Full Region Failure
Impact: All services in an entire region (e.g., US-East) are unavailable. Tile serving, routing, traffic, POI search, and navigation sessions for users routed to that region all fail.
Recovery:
- DNS failover redirects traffic to the adjacent region (US-West) within 30-60 seconds.
- Tile CDN continues serving cached tiles globally (CDN edge nodes are independent of origin region).
- Routing servers in the adjacent region can serve routes for the failed region's area by loading the relevant graph shard from S3 (lazy load, ~60 seconds to page in data).
- Traffic data for the failed region goes stale. The adjacent region's routing engine falls back to historical patterns.
- Active navigation sessions disconnect and reconnect to the new region. The client resumes from its last known GPS position.
Time to partial recovery: 30-60 seconds (DNS + CDN). Time to full recovery: 5-10 minutes (graph shard loading in adjacent region).
11.9 Satellite Vendor Data Feed Interruption
Impact: No new satellite imagery from the primary vendor (Maxar). Change detection pipeline stalls. Road discovery from satellite stops. Satellite tile layer shows increasingly stale imagery. Building footprint updates for the geocoding address database pause.
Recovery:
- Fall back to secondary vendor (Planet Labs: lower resolution but higher frequency). Sentinel-2 (free, 10m) provides coarse change detection as a third fallback.
- Existing satellite tiles remain valid and cached. Staleness increases but CDN continues serving.
- Vehicle probe data compensates partially: road discovery continues via probe clustering alone (lower confidence without satellite confirmation, requiring 30+ users instead of 10+).
- ML change detection pauses gracefully and resumes when imagery returns. The nightly data fusion job (Section 9.9) continues processing probe and OSM data normally.
- Surface the affected regions in the observability dashboard. No user-facing impact unless the outage exceeds 60 days for a region (at which point building footprints and land use data become noticeably stale).
12. Deployment Strategy
12.1 Multi-Region Active-Active
Deploy in 3+ regions for global coverage. Each region is self-sufficient: has its own routing servers, traffic pipeline, POI search, and tile origin servers. CDN is global.
US-East: 6 routing servers, 50 Flink task managers, 3-node ES cluster
US-West: 6 routing servers, 40 Flink task managers, 3-node ES cluster
EU-West: 6 routing servers, 40 Flink task managers, 3-node ES cluster
AP-Southeast: 4 routing servers, 30 Flink task managers, 3-node ES cluster
12.2 Rolling Upgrades
- Routing servers: Drain connections (stop accepting new route requests, finish in-flight requests within 5s timeout). Deploy new binary. The new server memory-maps the same graph file. No downtime per-server.
- Flink jobs: Savepoint-based upgrades. Trigger a savepoint, stop the old job, start the new job from the savepoint. Traffic data gap equals savepoint duration (~10 seconds).
- Tile servers and CDN: Blue-green deployment. Deploy new tile server version behind a new target group. Switch the load balancer target once health checks pass.
12.3 Canary Deployment
New routing engine versions are deployed to 5% of traffic (one routing server out of 18) for 30 minutes. Monitor: route latency p99, error rate, ETA accuracy delta. Automated rollback if error rate exceeds baseline by 2x or latency p99 exceeds 200ms.
12.4 Graph Data Deployment
The weekly preprocessing pipeline (Sunday 02:00 UTC) connects the map data pipeline (Section 9.9) to the routing engine (Section 9.3):
- Data collection: Pull validated changes from PostgreSQL: satellite-detected new roads, probe-confirmed road candidates, OSM edits, government data updates, probe-inferred speed limits.
- Topology diff: Compare against the previous graph version. Identify new nodes, deleted edges, reclassified segments.
- CH preprocessing: Recompute node importance (incorporating probe traffic volume as a factor), run contraction, generate shortcut edges. 4-6 hours on a 128-core machine.
- Validation: Sanity check: compute shortest paths for 10,000 known origin-destination pairs. Flag if any path length increases > 20% vs. previous version (indicates data regression).
- Canary deployment: Deploy new graph to 5% of routing servers. Monitor route quality for 30 minutes.
- Full rollout: Deploy to all servers. Zero-downtime swap via memory-mapped file pointer. Old binary retained 24 hours for rollback.
Emergency updates (< 5-minute latency): Road closures from incident detection (probe data) or construction (satellite change detection) are applied as CCH weight overrides without full reprocessing (Section 4.4).
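The validation step reduces to a comparison over known origin-destination pairs. A sketch with hypothetical path-length maps:

```python
def validate_graph(old_paths, new_paths, max_increase=0.20):
    """Sketch of the sanity check: compare shortest-path lengths for known
    origin-destination pairs between graph versions; flag regressions.
    old_paths/new_paths: {(origin, dest): path_length_m}."""
    regressions = []
    for od, old_len in old_paths.items():
        new_len = new_paths.get(od)
        if new_len is None:  # pair became unreachable: always a regression
            regressions.append((od, old_len, None))
        elif new_len > old_len * (1 + max_increase):
            regressions.append((od, old_len, new_len))
    return regressions

# Illustrative lengths in meters for two well-known pairs.
old = {("paris", "berlin"): 1_050_000, ("nyc", "boston"): 346_000}
new = {("paris", "berlin"): 1_052_000, ("nyc", "boston"): 430_000}
bad = validate_graph(old, new)
print(bad)  # [(('nyc', 'boston'), 346000, 430000)] -- >20% longer, flag it
```

Any non-empty result blocks the canary deployment and pages the map-data team; small increases (re-routed around a genuine closure) pass.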
13. Observability
13.1 Key Metrics (SLIs)
Tiles: tile.request.rate, tile.cache.hit_ratio, tile.origin.latency.p99
Routing: route.request.rate, route.latency.p50/p99, route.error.rate
Traffic: location.ingestion.rate, map_match.latency.p99, traffic.staleness.sec
Navigation: active.sessions, reroute.rate, reroute.latency.p99, eta.accuracy.p90
POI: search.request.rate, search.latency.p99, autocomplete.latency.p99
Kafka: consumer.lag (per topic/partition group), produce.error.rate
Flink: checkpoint.duration, checkpoint.failure.rate, backpressure.ratio
13.2 Critical Alerts
| Alert | Threshold | Severity |
|---|---|---|
| Route latency p99 > 200ms | Sustained for 5 min | P1 (page) |
| Tile CDN hit rate < 90% | Sustained for 10 min | P2 (warning) |
| Traffic staleness > 60s | Any region | P1 (page) |
| Kafka consumer lag > 1M messages | Any consumer group | P1 (page) |
| Re-route latency p99 > 500ms | Sustained for 5 min | P1 (page) |
| Map match confidence < 70% | Region-wide average | P2 (warning) |
| ETA accuracy > 15% error (p90) | Any region | P2 (warning) |
13.3 Distributed Tracing
Every route request and navigation session carries a trace ID propagated across services. Spans: Client -> CDN -> Tile Server, Client -> Routing Engine -> Graph Query -> Redis (traffic lookup). Sample at 0.1% for normal traffic, 100% for errors and p99 latency outliers. Trace storage in Grafana Tempo backed by S3.
14. Security
Transport: TLS 1.3 for all client connections (tile serving, routing API, navigation WebSocket). mTLS for internal service-to-service communication (routing server -> Redis, Flink -> Kafka).
Authentication: API keys for third-party integrations (rate-limited per key). OAuth 2.0 + JWT for authenticated user sessions (navigation, saved places, search history). JWT validated at the API gateway, short-lived tokens (15 minutes) with refresh.
Location data privacy: GPS data from navigators is processed for traffic aggregation only. Individual location traces are discarded within 24 hours. Aggregated segment speeds are anonymized through two layers: k-anonymity (a segment speed is only published if >= 5 distinct users contributed data in the 30-second window, k=5) and differential privacy (calibrated Laplace noise is added to aggregated speed values before publishing, giving a formal guarantee that no individual's contribution can be inferred from the output). Navigation history stored on-device, not server-side, unless the user explicitly enables cloud sync.
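The two layers compose as suppress-then-perturb. A sketch in which the epsilon and sensitivity values are illustrative, not the production parameters:

```python
import math
import random

def publish_segment_speed(speeds_kmh, k=5, epsilon=1.0, sensitivity=5.0):
    """Sketch of the two privacy layers:
    1) k-anonymity: suppress the window unless >= k users contributed;
    2) differential privacy: add Laplace(sensitivity/epsilon) noise to
       the aggregate before publishing."""
    if len(speeds_kmh) < k:
        return None  # too few contributors: publish nothing this window
    true_mean = sum(speeds_kmh) / len(speeds_kmh)
    # Laplace sample via inverse CDF with u ~ Uniform(-0.5, 0.5)
    u = random.random() - 0.5
    u = min(max(u, -0.499999999), 0.499999999)  # guard the log's domain
    scale = sensitivity / epsilon
    noise = -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_mean + noise

print(publish_segment_speed([50, 52, 49]))  # None -- fails k-anonymity
noisy = publish_segment_speed([50, 52, 49, 51, 48])
print(noisy is not None)  # True: mean plus calibrated noise
```

Suppression handles the small-sample case where noise alone would either destroy utility or leak too much; noise handles the large-sample case where suppression would be overkill.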
Map data licensing: Road data from OpenStreetMap (ODbL license) and proprietary data providers requires attribution. Tile responses include attribution metadata. The tile rendering client displays required attributions.
Abuse prevention: Rate limiting on routing API (100 requests/sec per API key, prevent graph scraping), tile API (1,000 tiles/sec per IP, prevent bulk download), and search API (50 requests/sec per user). Geo-blocking for sanctioned regions per compliance requirements.
15. Building This at Startup Scale
The architecture above is designed for 1 billion DAU. A startup serving a single country (1-10M DAU) can use the same patterns with dramatically simpler infrastructure.
| Component | Google Scale | Startup Scale |
|---|---|---|
| Routing | CCH + 50 regional shards | OSRM (open-source CH) on a single 32GB server |
| Geospatial index | S2 Cells + custom sharding | PostGIS with GIST indexes |
| Tile serving | Custom pipeline + CloudFront | OpenMapTiles + Cloudflare CDN |
| Traffic ingestion | Kafka (50M/sec) + Flink | Redis Streams or Kafka (single broker) |
| Traffic aggregation | Flink cluster (5,000+ cores) | Simple cron job aggregating GPS traces every 60s |
| POI search | Elasticsearch (10+ nodes) | PostgreSQL full-text search + PostGIS |
| ETA | GNN model | Speed-limit baseline + historical averages |
| Map data | Satellite + probes + OSM pipeline | OSM data only (updated weekly via osmium) |
| Offline maps | Custom packages per region | MBTiles files served from S3 |
| Infrastructure cost | Millions/month | $2-10K/month |
Key simplifications:
- Skip Kafka entirely. At < 1M GPS updates/sec, write directly to Redis. A single Redis instance handles 100K writes/sec with pipelining; at higher volumes, batch writes or add a second instance.
- Use OSRM instead of building a custom routing engine. OSRM implements CH, handles 10K routes/sec on a single server, and is free.
- PostgreSQL does everything. POI search (full-text + PostGIS), tile metadata, traffic history, geocoding, all in one database until you hit 10M+ records per table.
- Batch traffic instead of streaming. Aggregate GPS traces every 60 seconds with a cron job instead of a Flink cluster. Latency goes from 30s to 90s, acceptable for most users.
- OpenMapTiles for tile generation. Open-source pipeline that converts OSM data to vector tiles. No satellite imagery pipeline needed.
- A single server can handle surprising scale. A 64GB server running OSRM + PostgreSQL + Redis + Nginx can serve 50K DAU (single-region, simplified feature set) with tile caching, routing, and basic traffic. Horizontal scaling is needed only when approaching 1M DAU.
The progression path: start with OSRM + PostgreSQL + Redis on a single server. Add Elasticsearch when POI search latency matters. Add Kafka + Flink when traffic freshness below 60 seconds matters. Add regional sharding when crossing 10M DAU. The architecture in this post is the destination; most teams should not start there.
Explore the Technologies
Core Technologies
| Technology | Role in This System | Learn More |
|---|---|---|
| Contraction Hierarchies | Sub-millisecond routing via CCH with real-time traffic weight updates | Contraction Hierarchies |
| Google S2 Geometry | Geospatial indexing for POI search, traffic aggregation, tile partitioning, data sharding | Google S2 |
| H3 | Demand heat maps and traffic visualization grids | H3 |
| PostGIS | POI storage with complex spatial queries, tile generation source data | PostGIS |
| Redis | Real-time traffic speed cache, location tracking, POI query cache, navigation session state | Redis |
| Apache Kafka | Real-time traffic ingestion pipeline, 50M location updates/sec (all sources) | Kafka |
| Apache Flink | Traffic speed aggregation, map matching, incident detection | Flink |
| Elasticsearch | POI text search with geospatial filtering and autocomplete | Elasticsearch |
| PostgreSQL | POI source of truth, tile metadata, geocoding database, traffic history | PostgreSQL |
| Maxar / Planet Labs | Satellite imagery for map data, building footprint detection, road discovery, change detection | External |
Infrastructure Patterns
| Pattern | Role in This System | Learn More |
|---|---|---|
| CDN and Edge Computing | 20B tile requests/day served from edge, 95%+ cache hit rate | CDN and Edge Computing |
| Message Queues and Event Streaming | Kafka topic design for location updates partitioned by S2 cell | Event Streaming |
| Caching Strategies | Multi-tier tile caching (CDN / regional Redis / origin), traffic speed cache | Caching Strategies |
| Database Sharding | Road graph regional sharding, POI index partitioning by S2 cell | Database Sharding |
| Load Balancer | Route request distribution across regional routing servers | Load Balancer |
| Kubernetes | Routing engine pods, Flink job deployment, tile server pods, auto-scaling | Kubernetes Architecture |
| Distributed Tracing | Request tracing across routing engine, traffic pipeline, tile serving | Distributed Tracing |
Further Reading
Routing & Geospatial:
- Contraction Hierarchies (Geisberger et al., 2008) — The original paper describing CH preprocessing and bidirectional query
- Customizable Contraction Hierarchies (Dibbelt et al., 2016) — Handling real-time weight updates without full reprocessing
- Customizable Route Planning (Delling et al., 2011) — Cell-based graph partitioning with arbitrary metric customization, used by Bing Maps
- RAPTOR: Round-Based Public Transit Routing (Delling et al., 2015) — Fast multi-modal transit routing
- Google S2 Geometry Library — Open-source spherical geometry with Hilbert curve spatial indexing
- H3 Hexagonal Indexing (Uber) — Hexagonal hierarchical geospatial indexing for analytics and visualization
- Hidden Markov Map Matching (Newson & Krumm, 2009) — The standard approach for GPS-to-road-segment matching
ETA & ML:
- ETA Prediction with Graph Neural Networks in Google Maps (Derrow-Pinion et al., 2021) — DeepMind's GNN model for traffic-aware ETA, published in Nature
- DeepETA (Uber, 2022) — Linear transformer architecture for ETA prediction with geospatial embeddings
Tile Serving & Data Formats:
- Mapbox Vector Tile Specification — The MVT encoding format for compact tile delivery
- GTFS Specification — Standard format for public transit schedules and real-time delays
- SpaceNet Challenge — Open datasets for road and building extraction from satellite imagery
Open-Source Tools (Section 15):
- OSRM (Project OSRM) — Open-source routing engine implementing CH, 10K routes/sec on a single server
- Valhalla Routing Engine (Mapbox) — Open-source routing engine using tiled road graph with hierarchical routing
- OpenMapTiles — Open-source vector tile generation from OpenStreetMap data
- osmium — Fast tool for working with OpenStreetMap data files
- MBTiles Specification — SQLite-based format for storing tilesets
Google Infrastructure Papers:
- Google Bigtable (Chang et al., 2006) — The distributed storage system underlying much of Google Maps data. Open-source lineage: HBase, then Cassandra, then ScyllaDB
- MillWheel (Akidau et al., 2013) — Google's stream processing that predated and inspired Apache Beam and Dataflow