gRPC
Google's RPC framework that actually delivers on the 'high performance' promise, built on HTTP/2 and Protocol Buffers
Why It Exists
REST with JSON won because it is simple, human-readable, and works everywhere. Fair enough. But once a system has dozens of services calling each other thousands of times per second, the cracks show fast. JSON parsing is 10-100x slower than binary deserialization. HTTP/1.1 needs a connection per request (or painful connection pooling). There is no real standard for streaming. And API contracts are informal at best. An OpenAPI spec is documentation, not enforcement.
gRPC came out of Google in 2015, and it is basically the open-source version of Stubby, the internal RPC system Google had been running for over a decade. HTTP/2 provides multiplexed connections and native streaming. Protocol Buffers deliver fast binary serialization with strongly typed, versioned schemas. Code generation produces client and server stubs in whatever language is needed, so nobody is hand-rolling API clients. The net result: 10-100x more efficient than REST/JSON for internal communication. That is not a marketing number. Profile it and the difference is obvious.
How It Works
Protocol Buffers (.proto files): Service interfaces are defined in .proto files that specify message types (request/response schemas) and service methods. Run the protoc compiler and it spits out client stubs and server interfaces in the target language. A Go service and a Python service communicate perfectly because both are generated from the same .proto definition. Schemas are versioned using field numbers. Each field gets a unique number that identifies it in the binary format, so fields can be added or deprecated without breaking anything.
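For a concrete picture, here is a minimal sketch of what such a definition might look like (package, service, and field names are all hypothetical):

```proto
syntax = "proto3";

package pricing.v1;

// Request/response schemas. Field numbers identify each field on the
// wire; once published they must never be reused or renumbered.
message PriceRequest {
  string symbol = 1;
}

message PriceResponse {
  string symbol = 1;
  double price  = 2;
  int64  ts_ms  = 3;
}

// protoc generates a client stub and a server interface from this.
service PricingService {
  rpc GetPrice(PriceRequest) returns (PriceResponse);
}
```

Running protoc with the Go plugin and the Python plugin against this one file gives both sides of that Go-to-Python pairing the same contract.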
HTTP/2 Transport: gRPC runs on HTTP/2. Each RPC call maps to an HTTP/2 stream within a single TCP connection. Multiple concurrent RPCs multiplex over that same connection without HTTP-level head-of-line blocking (unlike HTTP/1.1 pipelining, which never really worked), though head-of-line blocking at the TCP layer remains. HPACK header compression reduces per-request overhead. The binary framing layer handles message delimiting within streams.
Four RPC Patterns: Unary is the standard request-response, like REST. Server Streaming is one request, a stream of responses back (think price feeds, log tailing). Client Streaming sends a stream of requests and gets one response (uploading chunks, aggregating sensor readings). Bidirectional Streaming lets both sides stream at the same time (chat, gaming, real-time collaboration). Most teams start with unary and add streaming later when the use case demands it.
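As a sketch of server streaming in practice, suppose the hypothetical PricingService above gained a method `rpc StreamPrices(PriceRequest) returns (stream PriceResponse);`. A Go handler might look like this, where `pb` stands in for the generated package and `s.quotes` is an assumed price feed:

```go
// StreamPrices pushes quotes to the client until the feed closes or the
// client goes away. pb.PricingService_StreamPricesServer is the stream
// type protoc-gen-go-grpc would generate for this method.
func (s *server) StreamPrices(req *pb.PriceRequest, stream pb.PricingService_StreamPricesServer) error {
	for quote := range s.quotes(req.Symbol) {
		// Send writes one message onto the HTTP/2 stream. It blocks once the
		// client's flow-control window fills, which is the backpressure signal.
		if err := stream.Send(&pb.PriceResponse{Symbol: req.Symbol, Price: quote}); err != nil {
			return err // client cancelled, disconnected, or hit its deadline
		}
	}
	return nil // returning nil closes the stream with status OK
}
```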
Architecture Deep Dive
Serialization Performance: Protocol Buffers use a tag-value binary encoding, with a length prefix added for length-delimited fields like strings and bytes (tag-length-value). Tags, which pack the field number and wire type together, are encoded as varints (1-5 bytes), lengths are varints, and values go in their native binary representation. The int64 value 1234567890 varint-encodes into 5 bytes, not the 10 ASCII characters of "1234567890" (a fixed64 always takes exactly 8). A message with a 64-bit integer, a 32-bit float, and a 100-character string runs about 120 bytes in Protobuf versus 200+ in JSON. Deserialization is little more than pointer arithmetic: read the tag, read the length, advance the pointer. That is why it is 10-100x faster than JSON parsing.
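To make the wire format concrete, this small Go program hand-encodes a single varint field the way Protobuf lays it out (field number and value are arbitrary):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A Protobuf tag packs the field number and wire type into one varint:
	// tag = field_number<<3 | wire_type. Here: field 3, wire type 0 (varint).
	buf := []byte{3<<3 | 0}

	// The int64 value 1234567890 varint-encodes into 5 bytes, versus
	// 10 ASCII characters in JSON.
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], 1234567890)
	buf = append(buf, tmp[:n]...)

	fmt.Printf("% x  (1 tag byte + %d value bytes)\n", buf, n)
}
```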
Connection Management: A gRPC Channel represents a logical connection to a server. It handles DNS resolution, connection establishment, TLS handshake, and reconnection. Under the hood, a channel can maintain multiple HTTP/2 connections (subchannels) for load balancing. The channel's load balancing policy (pick_first, round_robin, or something custom) picks which subchannel handles each call. For client-side load balancing, the client resolves the service name to multiple addresses and spreads calls across them. This works well but adds operational complexity. Understand the connection layer, or mysterious load imbalances will take hours to debug.
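Here is a minimal grpc-go sketch of client-side load balancing via the service config; the target name is made up, and it assumes DNS resolves to multiple backend addresses:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"dns:///pricing.internal:50051", // hypothetical headless service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// round_robin spreads RPCs across every resolved subchannel; the
		// default pick_first pins all traffic to a single connection.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	_ = conn // generated stubs are constructed from conn
}
```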
Interceptors and Middleware: gRPC interceptors wrap every RPC call. In Go and Java they are called interceptors; other languages call them middleware. Unary interceptors handle single request-response calls. Stream interceptors handle streaming calls. The common ones: authentication (validate JWT tokens), logging (method, duration, status), metrics (Prometheus histogram of RPC durations), tracing (inject/extract OpenTelemetry span context), and retry (automatically retry on retriable status codes). Keep the interceptor chain short. Each one adds latency, and debugging a ten-deep interceptor chain is nobody's idea of fun.
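A bare-bones unary logging interceptor in Go shows the shape; the auth, metrics, and tracing variants all wrap the handler the same way:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/status"
)

// loggingInterceptor logs the method, duration, and status code of every
// unary RPC that passes through the server.
func loggingInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req) // the real handler, or the next interceptor
	log.Printf("method=%s duration=%s code=%s",
		info.FullMethod, time.Since(start), status.Code(err))
	return resp, err
}

func main() {
	// Registered once at construction; ChainUnaryInterceptor runs them in order.
	srv := grpc.NewServer(grpc.ChainUnaryInterceptor(loggingInterceptor))
	_ = srv // register services and call srv.Serve(listener) from here
}
```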
gRPC-Web: Browsers do not expose raw HTTP/2 framing to JavaScript, nor the HTTP trailers gRPC uses to carry status codes. gRPC-Web works around this with a proxy (usually Envoy) that translates the browser's gRPC-Web requests into the backend's native HTTP/2 gRPC protocol. The client uses a JavaScript or TypeScript library that serializes Protobuf messages and talks to the proxy. This lets web apps call gRPC services, but there is a real limitation: client and bidirectional streaming do not work. Only unary and server streaming are available. Full bidirectional from a browser requires WebSockets.
Netflix uses gRPC for inter-service communication across their microservices platform. It handles the scale they need in a way REST could not. But be realistic: most teams are not Netflix. The strongest argument for gRPC is not that big companies use it. It is that typed contracts and code generation eliminate an entire class of integration bugs that REST teams fight constantly.
Production Best Practices
Use gRPC's standard health check protocol (grpc.health.v1.Health). Set deadlines on every single RPC call. An RPC without a deadline can hang forever if the server stalls, and that will cascade through the whole system. Configure keepalive pings to catch dead connections early (the GRPC_ARG_KEEPALIVE_TIME_MS channel argument in C-core implementations, keepalive.ClientParameters in Go; 60 seconds is a reasonable starting point). Monitor RPC error rates by method and status code. UNAVAILABLE means the server is unreachable. DEADLINE_EXCEEDED means it is too slow. RESOURCE_EXHAUSTED means a quota or limit was hit, rate limiting being the usual suspect. Each one points to a different fix, so do not lump them together in a single "errors" dashboard.
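A grpc-go sketch pulling the deadline and keepalive advice together (target name and durations are illustrative):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Keepalive pings surface dead connections instead of letting calls
	// hang on a half-open TCP socket.
	conn, err := grpc.Dial("pricing.internal:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    60 * time.Second, // ping after 60s with no activity
			Timeout: 10 * time.Second, // drop the connection if the ping is not acked
		}),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Give every RPC a deadline. A context passed down the call chain
	// propagates the remaining time to downstream services automatically.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	_ = ctx // pass ctx as the first argument of every generated stub call
}
```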
Pros
- • 10-100x faster serialization than JSON thanks to Protocol Buffers binary format
- • HTTP/2 multiplexing kills head-of-line blocking and opens the door to streaming
- • Strongly typed contracts with code generation for 12+ languages
- • Four communication patterns: unary, server streaming, client streaming, bidirectional
- • Built-in deadline propagation, cancellation, and metadata passing
Cons
- • Binary format is not human-readable. You need tooling just to inspect payloads.
- • Browser support requires a gRPC-Web proxy since browsers don't expose raw HTTP/2
- • Load balancing gets tricky. HTTP/2 connection reuse means L4 balancers won't cut it.
- • Proto schema evolution demands real discipline around backward compatibility
- • Smaller ecosystem of tooling and middleware compared to REST/JSON
When to use
- • Internal service-to-service calls where latency is on the critical path
- • You need streaming: real-time updates, log tailing, event feeds
- • Polyglot microservices that benefit from generated, typed client/server code
- • High-throughput APIs where JSON serialization shows up in your flame graphs
When NOT to use
- • Public APIs consumed by web browsers (just use REST or GraphQL)
- • Simple CRUD services where shipping fast matters more than shaving milliseconds
- • Teams that have never touched Protocol Buffers or schema management
- • Environments where HTTP/2 is blocked or unsupported by network intermediaries
Key Points
- • HTTP/2 multiplexing sends multiple gRPC calls over a single TCP connection. This eliminates the connection-per-request overhead of HTTP/1.1 and cuts latency by skipping repeated TLS handshakes.
- • Protocol Buffers use a binary wire format with field numbers instead of field names. A message with 10 fields takes 50-100 bytes in Protobuf vs. 500-1000 bytes in JSON. Deserialization is basically pointer arithmetic instead of JSON string parsing.
- • Deadline propagation chains timeouts across services automatically. If Service A sets a 5-second deadline calling Service B, and B calls C after 2 seconds, C gets a 3-second deadline. This stops cascading slow requests from eating all available resources.
- • Server reflection lets clients discover available services and methods at runtime without needing the .proto files. This is essential for debugging tools like grpcurl and for building generic gRPC proxies.
- • gRPC interceptors (middleware) run on every call for cross-cutting concerns. Authentication, logging, metrics, tracing, rate limiting: all interceptors, all outside the business logic.
Common Mistakes
- ✗ Using a single gRPC channel for all calls. Yes, HTTP/2 multiplexes, but one channel means one TCP connection. For high-throughput services, create a channel pool (4-8 channels) to actually use multiple TCP connections and avoid transport-layer head-of-line blocking.
- ✗ Not handling backpressure in streaming RPCs. A fast server streaming to a slow client fills the HTTP/2 flow control window. Without explicit flow control, the server buffers without limit and eventually OOMs.
- ✗ Breaking backward compatibility in proto schemas. Removing fields, renaming them, changing field numbers, or switching field types will break existing clients. Deprecate old fields, reserve their numbers, and add new ones instead. Always.
- ✗ Ignoring gRPC-specific load balancing. Unlike HTTP/1.1 where each request opens a fresh connection, gRPC reuses connections. L4 load balancers distribute connections, not requests. Use L7 balancers (Envoy) or client-side load balancing for even request distribution.
- ✗ Not setting max message size limits. The default is 4MB. A response larger than the limit does not get truncated; the RPC fails with RESOURCE_EXHAUSTED. Set limits based on the use case (see the sketch below), but know that very large messages (100MB+) defeat the whole point of streaming.
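A grpc-go sketch of raising the limits on both ends (the 16MB figure is illustrative, not a recommendation):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const maxMsg = 16 * 1024 * 1024 // 16MB, up from the 4MB default

	// Server side: receive and send limits are set separately.
	srv := grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsg),
		grpc.MaxSendMsgSize(maxMsg),
	)
	_ = srv

	// Client side: set as default call options so every stub call gets them.
	conn, err := grpc.Dial("pricing.internal:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMsg),
			grpc.MaxCallSendMsgSize(maxMsg),
		),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
}
```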