H3
Uber's hexagonal spatial index that makes geospatial aggregation and visualization actually intuitive
Why Uber Built This
Uber needed to answer questions like "how many drivers are available in this neighborhood right now?" and "what is the surge pricing for this area?" thousands of times per second. The natural approach is to divide the city into zones and aggregate supply/demand per zone. But which zones?
Zip codes are irregular and inconsistent across countries. Admin boundaries change. Arbitrary grids have edge effects where a boundary cuts through a natural neighborhood. And square grids have the adjacency problem: diagonal neighbors are about 1.41 times (√2) farther away than edge neighbors, which creates artifacts in smoothing and interpolation.
Uber built H3 specifically for this. Hexagonal grids provide uniform neighbor distances, consistent area per cell, and a clean hierarchical structure. When a user opens the Uber app, the backend computes the H3 hex at resolution 7 for the user's location, looks up the precomputed supply/demand ratio for that hex, and returns the surge multiplier. The whole lookup is a hash map access on a 64-bit key.
H3 was open-sourced in 2018 and has since been adopted by Foursquare, CARTO, Snowflake, Databricks, and others for geospatial analytics.
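The adjacency argument earlier in this section can be checked numerically. The sketch below is plain planar geometry, not the H3 library: it assumes unit spacing between cell centers and uses axial `(q, r)` coordinates for the hexagons, a common convention that is an assumption here rather than H3's internal layout.

```python
import math

# Offsets to the 8 neighbors of a square cell (4 edge + 4 corner neighbors).
SQUARE_NEIGHBORS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]

# Offsets to the 6 neighbors of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_center(q, r):
    """Planar center of a hexagon with unit center-to-center spacing."""
    return (q + r / 2.0, r * math.sqrt(3) / 2.0)


def neighbor_distances_square():
    """Distinct center-to-center distances from a square cell to its neighbors."""
    return sorted({round(math.hypot(dx, dy), 6) for dx, dy in SQUARE_NEIGHBORS})


def neighbor_distances_hex():
    """Distinct center-to-center distances from a hexagon to its neighbors."""
    return sorted(
        {round(math.hypot(*hex_center(q, r)), 6) for q, r in HEX_NEIGHBORS}
    )


print("square:", neighbor_distances_square())  # two distinct distances
print("hex:", neighbor_distances_hex())        # a single distance
```

The square grid yields two distinct neighbor distances while the hexagonal grid yields one, which is the property that keeps smoothing kernels symmetric.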
How the Grid Works
H3 starts with an icosahedron, a 20-faced polyhedron inscribed in a sphere. Each triangular face is then subdivided into hexagons. The subdivision uses an aperture-7 scheme whose grid orientation alternates between Class II and Class III at successive resolutions, which means each hex at resolution N contains approximately 7 hexes at resolution N+1.
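Why aperture 7? Hexagonal grids can also be built with aperture 3 or aperture 4, but in those schemes the new child centers fall on the parent's vertices (aperture 3) or edge midpoints (aperture 4), so most children are shared between several parents and there is no clean tree. Aperture 7 gives every parent one centered child plus six surrounding children, so each child has exactly one parent. The tradeoffs are that each resolution is rotated by about 19.1° relative to the previous one, and that the parent-child relationship is approximate, not exact. The seven children cover roughly the same area as the parent, but parts of the outer children spill outside the parent's boundary. For aggregation this is a rounding error. For exact containment queries, it is a real limitation.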
At resolution 0, there are 122 base cells (110 hexagons + 12 pentagons). Each resolution level multiplies the cell count by ~7:
| Resolution | Avg Hex Area | Avg Edge Length | Typical Use |
|---|---|---|---|
| 0 | ~4.3M km² | ~1,100 km | Global |
| 4 | ~1,770 km² | ~22 km | Regional |
| 7 | ~5.2 km² | ~1.2 km | City/neighborhood |
| 9 | ~105,000 m² | ~174 m | Block level |
| 11 | ~2,150 m² | ~25 m | Building level |
| 15 | ~0.9 m² | ~0.5 m | Sub-meter |
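The cell counts behind this table can be reproduced with a short calculation. The sketch below uses the closed form 2 + 120 × 7^r for the number of cells at resolution r and divides an approximate Earth surface area by it. The surface-area constant is an assumption, and the result is the average over all cells including pentagons, so it differs slightly from the hexagon-only averages in the table at coarse resolutions.

```python
# Approximate total surface area of Earth in km^2 (assumed constant).
EARTH_SURFACE_KM2 = 510_065_621.7


def cell_count(resolution):
    """Total number of H3 cells at a resolution (12 of them are pentagons)."""
    return 2 + 120 * 7 ** resolution


def average_cell_area_km2(resolution):
    """Average cell area, pentagons included, assuming cells cover the globe."""
    return EARTH_SURFACE_KM2 / cell_count(resolution)


for res in (0, 4, 7, 9, 11, 15):
    print(res, cell_count(res), f"{average_cell_area_km2(res):.6g} km^2")
```

At resolution 0 this gives 122 cells, matching the base-cell count above, and each further resolution multiplies the count by roughly 7.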
Analytics Patterns
The most common H3 pattern is aggregate and visualize. Convert each data point to an H3 index at the chosen resolution, group by index, and compute aggregates (count, average, sum, percentile). The result is a hex grid that renders directly on a map.
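A minimal sketch of the aggregate step follows. The indexer is passed in as a callable so the example runs without the H3 library. `toy_cell` is a hypothetical stand-in that snaps points to a square lat/lng grid, not hexagons. With h3-py v4 installed, `h3.latlng_to_cell` takes the same `(lat, lng, resolution)` arguments and could be passed instead.

```python
import math
from collections import defaultdict
from statistics import mean


def toy_cell(lat, lng, resolution):
    """Hypothetical stand-in indexer: a square lat/lng grid, NOT real H3 cells."""
    step = 1.0 / (2 ** resolution)
    return (math.floor(lat / step), math.floor(lng / step))


def aggregate_by_cell(rows, to_cell, resolution):
    """Group (lat, lng, value) rows by cell and compute count and average."""
    buckets = defaultdict(list)
    for lat, lng, value in rows:
        buckets[to_cell(lat, lng, resolution)].append(value)
    return {
        cell: {"count": len(values), "avg": mean(values)}
        for cell, values in buckets.items()
    }


rows = [
    (40.7101, -74.0001, 10.0),
    (40.7102, -74.0002, 20.0),
    (34.0500, -118.2500, 5.0),
]
summary = aggregate_by_cell(rows, toy_cell, resolution=7)
```

The output is a plain dictionary keyed by cell ID, which is the shape most map-rendering layers expect for a hex heat map.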
For demand forecasting, Uber computes features per H3 hex: historical ride requests, time of day, day of week, weather, events. A model predicts demand per hex for the next 15 minutes. The hex grid acts as the spatial unit for both feature computation and prediction output. Resolution 7 works well because it is small enough to capture local demand variation but large enough that each hex has sufficient historical data.
For coverage analysis, convert the service area boundary to a set of H3 hexes using polyfill. Convert asset locations (stores, warehouses, drivers) to hexes. For each hex, compute the minimum distance to the nearest asset. Hexes beyond a threshold are "uncovered." This produces a clean visualization of coverage gaps.
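The coverage computation can be sketched as a multi-source breadth-first search over the grid. This is an illustration on planar axial `(q, r)` coordinates rather than real H3 indexes, and it measures distance in grid steps through the service area instead of geodesic distance; both are simplifying assumptions. Assets outside the service area are ignored.

```python
from collections import deque

# Offsets to the 6 neighbors of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def steps_to_nearest_asset(service_cells, asset_cells):
    """Grid steps from each reachable service cell to its nearest asset cell."""
    service = set(service_cells)
    dist = {cell: 0 for cell in asset_cells if cell in service}
    queue = deque(dist)
    while queue:
        q, r = queue.popleft()
        for dq, dr in HEX_NEIGHBORS:
            nxt = (q + dq, r + dr)
            if nxt in service and nxt not in dist:
                dist[nxt] = dist[(q, r)] + 1
                queue.append(nxt)
    return dist


def uncovered_cells(service_cells, asset_cells, max_steps):
    """Service cells farther than max_steps from every asset (or unreachable)."""
    dist = steps_to_nearest_asset(service_cells, asset_cells)
    return {c for c in service_cells if dist.get(c, float("inf")) > max_steps}


# A strip of six cells with a single asset at one end.
strip = [(q, 0) for q in range(6)]
gaps = uncovered_cells(strip, [(0, 0)], max_steps=2)
```

Multiplying the step count by the center-to-center spacing at the chosen resolution gives an approximate distance threshold.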
Flow analysis uses H3 pairs. For each trip or movement, record (origin_hex, destination_hex). Aggregating these pairs produces a flow matrix. The top-N flows between hexes can be visualized as lines on a map and used to identify corridors and plan infrastructure. Uber uses this for driver positioning and route optimization.
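As a rough sketch of the flow aggregation, the function below counts origin-destination pairs and returns the busiest ones. The cell labels in the sample data are hypothetical placeholders, not real H3 indexes, and the `undirected` option is an added convenience for treating A→B and B→A as one corridor.

```python
from collections import Counter


def top_flows(trips, n, undirected=False):
    """Return the n most frequent (origin_cell, destination_cell) pairs."""
    counts = Counter()
    for origin, destination in trips:
        pair = (origin, destination)
        if undirected:
            # Treat both directions of travel as the same corridor.
            pair = tuple(sorted(pair))
        counts[pair] += 1
    return counts.most_common(n)


trips = [
    ("hex_a", "hex_b"),
    ("hex_a", "hex_b"),
    ("hex_b", "hex_a"),
    ("hex_a", "hex_c"),
    ("hex_a", "hex_b"),
]
busiest = top_flows(trips, 1)
```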
H3 in Data Warehouses
H3 is supported in several modern data warehouses. Snowflake has built-in H3 functions: H3_LATLNG_TO_CELL, H3_CELL_TO_PARENT, H3_GRID_DISK. Databricks ships built-in H3 SQL functions and also supports H3 through the Databricks Labs Mosaic library. BigQuery does not have native H3 functions, but they are available through add-ons such as the CARTO Analytics Toolbox. ClickHouse has geoToH3 and related functions.
This means spatial aggregation queries run directly in SQL:
```sql
SELECT h3_latlng_to_cell(lat, lng, 7) AS hex,
       COUNT(*) AS order_count,
       AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY hex
ORDER BY order_count DESC;
```
No special spatial indexes, no geometry types, no PostGIS. Just a function call that converts lat/lng to a 64-bit integer, followed by a standard GROUP BY. This is why H3 has become so popular in analytics teams. It turns spatial questions into regular SQL.
Performance and Memory
H3 index computation from lat/lng takes about 200-400 nanoseconds, similar to S2. The k-ring function for k=1 (7 cells) takes about 500 nanoseconds. For k=5 (91 cells), it takes about 5 microseconds. These are fast enough to compute per-row in a streaming pipeline.
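The cell counts quoted for k=1 and k=5 follow from the closed form 3k(k+1) + 1. The sketch below checks that formula by enumerating a k-ring in planar axial `(q, r)` coordinates. It is not the H3 implementation, and it ignores pentagons, near which a real disk contains fewer cells.

```python
def k_ring_size(k):
    """Number of cells within k grid steps of a hexagon (center included)."""
    return 3 * k * (k + 1) + 1


def k_ring_axial(center, k):
    """All axial (q, r) cells within k grid steps of center, on a flat hex grid."""
    cq, cr = center
    cells = []
    for dq in range(-k, k + 1):
        # Bounds on dr keep the hex distance from the center at most k.
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((cq + dq, cr + dr))
    return cells


for k in (1, 2, 5):
    print(k, len(k_ring_axial((0, 0), k)), k_ring_size(k))  # 7, 19, 91
```

Because the size grows quadratically in k, the per-call timings above scale roughly with the number of returned cells.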
Memory per hex index is 8 bytes (64-bit integer). Pre-computing hex indexes for 1 billion points at 3 resolutions (for hierarchical drill-down) adds 1 billion × 3 × 8 bytes = 24 GB of storage. In practice, the hex is usually computed at query time or only one resolution is stored.
For polyfill (converting a polygon to a set of hexes, named polygonToCells in H3 v4), a polygon covering a major city at resolution 9 produces roughly 5,000-20,000 hexes. Polyfill cost grows roughly with the number of candidate cells that must be tested multiplied by the number of polygon vertices, so both finer resolutions and more detailed boundaries slow it down. A city-level polyfill at resolution 9 takes about 10-50 milliseconds.
When to Pick H3 vs S2
The short answer: H3 for analytics, S2 for indexing.
If the primary question is "aggregate data by geographic area for dashboards, ML features, or heat maps," H3 is the better choice. Hexagons look better on maps, the aperture-7 hierarchy is intuitive for drill-down, and the k-ring function makes neighbor-based operations simple.
If the primary question is "find all records near this point using a database index," S2 is the better choice. S2 cell IDs are ordered along a Hilbert curve, which makes range scans efficient. H3 IDs do not have this property.
Many systems use both. Uber uses H3 for analytics and demand forecasting, but uses S2-inspired approaches for the actual driver/rider matching that needs low-latency database lookups.
Pros
- • Hexagons tile a plane with uniform adjacency. Every hex has exactly 6 neighbors, same distance from center to center
- • Great for visualization and aggregation. Hex grids look clean on maps and avoid the visual bias of square grids
- • 16 resolution levels from continent-scale down to ~1 m² per hex
- • Hierarchical structure with predictable parent-child relationships
- • Bindings for Python, JavaScript, Java, Go, Rust, and more. Well-maintained.
Cons
- • Not a perfect tiling on a sphere. Uses 12 pentagons (unavoidable from topology) which can cause edge cases
- • Parent-child mapping is not exact. A parent hex does not perfectly contain its 7 children due to aperture-7 subdivision
- • Hex IDs are 64-bit, but their numeric order preserves less spatial locality than S2's Hilbert-curve ordering, which makes range scan queries less efficient
- • Less suited for point-in-polygon or containment queries compared to S2
- • Community is strong but smaller than PostGIS or S2 ecosystems
When to use
- • You need to aggregate data by geographic area for analytics or visualization
- • Heat maps, demand forecasting, or coverage analysis are core features
- • Your team wants a simpler mental model than S2's quadtree cells
- • Neighbor traversal and adjacency queries matter (routing, flow analysis)
When NOT to use
- • Point-in-polygon containment checks (S2 or PostGIS are better)
- • You need range-scan-friendly IDs for database spatial indexing (S2 is stronger here)
- • Exact spatial containment where parent must perfectly contain children
- • Your workload is purely distance-based queries (find nearest K points)
Key Points
- •H3 uses an icosahedron (20-faced polyhedron) as its base projection, not a cube like S2. Each face is subdivided into hexagons using an aperture-7 scheme, meaning each parent hex contains roughly 7 children. This is not exact, which is a deliberate tradeoff for the benefits of hexagonal tiling.
- •Among the three regular polygons that tile the plane (triangles, squares, hexagons), only hexagons have uniform adjacency. Every hex has exactly 6 neighbors, all sharing an edge, and the distance from any hex center to each neighbor's center is the same. Squares have 4 edge-neighbors and 4 corner-neighbors at different distances. Triangles are even worse, with 3 edge-neighbors and 9 vertex-only neighbors. This uniformity makes aggregation, smoothing, and neighbor traversal much cleaner.
- •Resolution 7 is the sweet spot for city-level analytics. Each hex is about 5.2 km², roughly the size of a neighborhood. Resolution 9 (~105,000 m²) works for block-level analysis. Resolution 11 (~2,150 m²) covers building-level. There are 16 resolutions total (0-15), and area decreases by a factor of ~7 at each step.
- •The k-ring function (named gridDisk or grid_disk in H3 v4) returns all hexagons within k steps of a center hex. k_ring(center, 1) returns 7 hexes (center + 6 neighbors). k_ring(center, 2) returns 19 hexes. In general it returns 3k(k+1) + 1 hexes, so the cost grows as O(k²). This is the workhorse for proximity queries in H3, and the quadratic growth is usually fine because k is small.
- •H3 indexes are 64-bit integers, but unlike S2 cell IDs, they are not ordered along a space-filling curve that maximizes locality. Adjacent hexes can have very different numeric IDs. This means H3 is not as well-suited for database range scans as S2. It excels instead at in-memory operations: aggregation, visualization, and analytics.
- •The 12 pentagons are a mathematical necessity. It is impossible to tile a sphere with only hexagons (a consequence of Euler's polyhedron formula). H3 handles this with 12 pentagonal cells at each resolution, centered on the vertices of the icosahedron. H3 orients the icosahedron so that all 12 vertices fall over ocean, so most applications never encounter them.
Common Mistakes
- ✗Using H3 for database spatial indexing like S2. H3 indexes are not ordered for range scans. For 'find all points within 5 km' as a database query, S2's cell IDs with B-tree range scans will outperform scanning H3 hex IDs. H3 works best when data is loaded into memory and aggregated by hex.
- ✗Ignoring the aperture-7 imperfection. A parent hex does not perfectly contain its 7 children. Some child hex area spills outside the parent boundary. For exact containment (all points in child are guaranteed to be in parent), this becomes a real problem. For aggregation and visualization, it is usually fine.
- ✗Picking resolution without thinking about data density. Resolution 9 has ~4.8 billion cells worldwide, well over a billion of them on land. If the dataset has 10,000 points spread across a country, most hexes will be empty and the aggregation is meaningless. Choose a resolution where each hex has a statistically meaningful number of data points.
- ✗Treating k-ring as a distance function. k_ring(center, 2) returns all hexes within 2 grid steps, but the geographic distance this covers is not constant. Cell size at a given resolution varies with the cell's position on the icosahedron face (not with latitude, as it would on a Mercator-style grid), and a k-ring is hexagon-shaped rather than circular, so cells on the outer ring sit at different distances from the center. For precise distance, compute geodesic distance between hex centers.
- ✗Not using compact representations for large regions. To represent 'all of Manhattan,' storing every resolution-9 hex individually wastes space. H3's compact function replaces groups of 7 children with their parent hex, reducing the cell count significantly. Always compact before storing or transmitting large hex sets.
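The compaction idea from the last item can be illustrated on a toy hierarchy. In the sketch below a cell is a tuple of a base-cell label followed by one digit (0-6) per resolution, which loosely mirrors how H3 indexes encode their ancestry. It is a hypothetical representation, not the real 64-bit layout, and it ignores pentagons. With h3-py v4, the library call for this is `h3.compact_cells`.

```python
from collections import defaultdict


def parent(cell):
    """Parent of a toy cell: drop the last resolution digit."""
    return cell[:-1]


def children(cell):
    """The 7 children of a toy cell: append one digit 0-6."""
    return [cell + (digit,) for digit in range(7)]


def compact(cells):
    """Repeatedly replace complete groups of 7 siblings with their parent."""
    current = set(cells)
    changed = True
    while changed:
        changed = False
        by_parent = defaultdict(set)
        for cell in current:
            if len(cell) > 1:  # base cells (length 1) have no parent
                by_parent[parent(cell)].add(cell)
        for p, siblings in by_parent.items():
            if len(siblings) == 7:
                current -= siblings
                current.add(p)
                changed = True
    return current


# Seven complete siblings plus one stray cell collapse to two cells.
region = children(("base", 0)) + [("base", 1, 3)]
print(len(region), "->", len(compact(region)))
```

The loop runs until no complete sibling group remains, so fully covered areas collapse through several resolutions in one call.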