gRPC
Google's RPC framework that actually delivers on the 'high performance' promise, built on HTTP/2 and Protocol Buffers
Why It Exists
REST with JSON won because it is simple, human-readable, and works everywhere. Fair enough. But once a system has dozens of services calling each other thousands of times per second, the cracks show fast. JSON parsing is 10-100x slower than binary deserialization. HTTP/1.1 needs a connection per request (or painful connection pooling). There is no real standard for streaming. And API contracts are informal at best. An OpenAPI spec is documentation, not enforcement.
gRPC came out of Google in 2015, and it is basically the open-source version of Stubby, the internal RPC system Google had been running for over a decade. HTTP/2 provides multiplexed connections and native streaming. Protocol Buffers deliver fast binary serialization with strongly typed, versioned schemas. Code generation produces client and server stubs in whatever language is needed, so nobody is hand-rolling API clients. The net result: 10-100x more efficient than REST/JSON for internal communication. That is not a marketing number. Profile it and the difference is obvious.
How It Works
Protocol Buffers (.proto files): Service interfaces are defined in .proto files that specify message types (request/response schemas) and service methods. Run the protoc compiler and it spits out client stubs and server interfaces in the target language. A Go service and a Python service communicate perfectly because both are generated from the same .proto definition. Schemas are versioned using field numbers. Each field gets a unique number that identifies it in the binary format, so fields can be added or deprecated without breaking anything.
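For a concrete picture, here is a minimal sketch of what such a definition might look like (package, service, and field names are all hypothetical):

```proto
syntax = "proto3";

package pricing.v1;

// Request/response schemas. Field numbers identify each field on the
// wire; once published they must never be reused or renumbered.
message PriceRequest {
  string symbol = 1;
}

message PriceResponse {
  string symbol = 1;
  double price  = 2;
  int64  ts_ms  = 3;
}

// protoc generates a client stub and a server interface from this.
service PricingService {
  rpc GetPrice(PriceRequest) returns (PriceResponse);
}
```

Running protoc with the Go plugin and the Python plugin against this one file gives both sides of that Go-to-Python pairing the same contract.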
HTTP/2 Transport: gRPC runs on HTTP/2. Each RPC call maps to an HTTP/2 stream within a single TCP connection. Multiple concurrent RPCs multiplex over that same connection without HTTP-level head-of-line blocking (unlike HTTP/1.1 pipelining, which never really worked), though head-of-line blocking at the TCP layer remains. HPACK header compression reduces per-request overhead. The binary framing layer handles message delimiting within streams.
Four RPC Patterns: Unary is the standard request-response, like REST. Server Streaming is one request, a stream of responses back (think price feeds, log tailing). Client Streaming sends a stream of requests and gets one response (uploading chunks, aggregating sensor readings). Bidirectional Streaming lets both sides stream at the same time (chat, gaming, real-time collaboration). Most teams start with unary and add streaming later when the use case demands it.
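As a sketch of server streaming in practice, suppose the hypothetical PricingService above gained a method `rpc StreamPrices(PriceRequest) returns (stream PriceResponse);`. A Go handler might look like this, where `pb` stands in for the generated package and `s.quotes` is an assumed price feed:

```go
// StreamPrices pushes quotes to the client until the feed closes or the
// client goes away. pb.PricingService_StreamPricesServer is the stream
// type protoc-gen-go-grpc would generate for this method.
func (s *server) StreamPrices(req *pb.PriceRequest, stream pb.PricingService_StreamPricesServer) error {
	for quote := range s.quotes(req.Symbol) {
		// Send writes one message onto the HTTP/2 stream. It blocks once the
		// client's flow-control window fills, which is the backpressure signal.
		if err := stream.Send(&pb.PriceResponse{Symbol: req.Symbol, Price: quote}); err != nil {
			return err // client cancelled, disconnected, or hit its deadline
		}
	}
	return nil // returning nil closes the stream with status OK
}
```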
Architecture Deep Dive
Serialization Performance: Protocol Buffers use a tag-value binary encoding, with a length prefix added for length-delimited fields like strings and bytes (tag-length-value). Tags, which pack the field number and wire type together, are encoded as varints (1-5 bytes), lengths are varints, and values go in their native binary representation. The int64 value 1234567890 varint-encodes into 5 bytes, not the 10 ASCII characters of "1234567890" (a fixed64 always takes exactly 8). A message with a 64-bit integer, a 32-bit float, and a 100-character string runs about 120 bytes in Protobuf versus 200+ in JSON. Deserialization is little more than pointer arithmetic: read the tag, read the length, advance the pointer. That is why it is 10-100x faster than JSON parsing.
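To make the wire format concrete, this small Go program hand-encodes a single varint field the way Protobuf lays it out (field number and value are arbitrary):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A Protobuf tag packs the field number and wire type into one varint:
	// tag = field_number<<3 | wire_type. Here: field 3, wire type 0 (varint).
	buf := []byte{3<<3 | 0}

	// The int64 value 1234567890 varint-encodes into 5 bytes, versus
	// 10 ASCII characters in JSON.
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], 1234567890)
	buf = append(buf, tmp[:n]...)

	fmt.Printf("% x  (1 tag byte + %d value bytes)\n", buf, n)
}
```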
Connection Management: A gRPC Channel represents a logical connection to a server. It handles DNS resolution, connection establishment, TLS handshake, and reconnection. Under the hood, a channel can maintain multiple HTTP/2 connections (subchannels) for load balancing. The channel's load balancing policy (pick_first, round_robin, or something custom) picks which subchannel handles each call. For client-side load balancing, the client resolves the service name to multiple addresses and spreads calls across them. This works well but adds operational complexity. Understand the connection layer, or mysterious load imbalances will take hours to debug.
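Here is a minimal grpc-go sketch of client-side load balancing via the service config; the target name is made up, and it assumes DNS resolves to multiple backend addresses:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"dns:///pricing.internal:50051", // hypothetical headless service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// round_robin spreads RPCs across every resolved subchannel; the
		// default pick_first pins all traffic to a single connection.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	_ = conn // generated stubs are constructed from conn
}
```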
Interceptors and Middleware: gRPC interceptors wrap every RPC call. In Go and Java they are called interceptors; other languages call them middleware. Unary interceptors handle single request-response calls. Stream interceptors handle streaming calls. The common ones: authentication (validate JWT tokens), logging (method, duration, status), metrics (Prometheus histogram of RPC durations), tracing (inject/extract OpenTelemetry span context), and retry (automatically retry on retriable status codes). Keep the interceptor chain short. Each one adds latency, and debugging a ten-deep interceptor chain is nobody's idea of fun.
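A bare-bones unary logging interceptor in Go shows the shape; the auth, metrics, and tracing variants all wrap the handler the same way:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/status"
)

// loggingInterceptor logs the method, duration, and status code of every
// unary RPC that passes through the server.
func loggingInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req) // the real handler, or the next interceptor
	log.Printf("method=%s duration=%s code=%s",
		info.FullMethod, time.Since(start), status.Code(err))
	return resp, err
}

func main() {
	// Registered once at construction; ChainUnaryInterceptor runs them in order.
	srv := grpc.NewServer(grpc.ChainUnaryInterceptor(loggingInterceptor))
	_ = srv // register services and call srv.Serve(listener) from here
}
```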
gRPC-Web: Browsers do not expose raw HTTP/2 framing to JavaScript, nor the HTTP trailers gRPC uses to carry status codes. gRPC-Web works around this with a proxy (usually Envoy) that translates the browser's gRPC-Web requests into the backend's native HTTP/2 gRPC protocol. The client uses a JavaScript or TypeScript library that serializes Protobuf messages and talks to the proxy. This lets web apps call gRPC services, but there is a real limitation: client and bidirectional streaming do not work. Only unary and server streaming are available. Full bidirectional from a browser requires WebSockets.
Netflix uses gRPC for inter-service communication across their microservices platform. It handles the scale they need in a way REST could not. But be realistic: most teams are not Netflix. The strongest argument for gRPC is not that big companies use it. It is that typed contracts and code generation eliminate an entire class of integration bugs that REST teams fight constantly.
Production Best Practices
Use gRPC's standard health check protocol (grpc.health.v1.Health). Set deadlines on every single RPC call. An RPC without a deadline can hang forever if the server stalls, and that will cascade through the whole system. Configure keepalive pings to catch dead connections early (the GRPC_ARG_KEEPALIVE_TIME_MS channel argument in C-core implementations, keepalive.ClientParameters in Go; 60 seconds is a reasonable starting point). Monitor RPC error rates by method and status code. UNAVAILABLE means the server is unreachable. DEADLINE_EXCEEDED means it is too slow. RESOURCE_EXHAUSTED means a quota or limit was hit, rate limiting being the usual suspect. Each one points to a different fix, so do not lump them together in a single "errors" dashboard.
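A grpc-go sketch pulling the deadline and keepalive advice together (target name and durations are illustrative):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Keepalive pings surface dead connections instead of letting calls
	// hang on a half-open TCP socket.
	conn, err := grpc.Dial("pricing.internal:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    60 * time.Second, // ping after 60s with no activity
			Timeout: 10 * time.Second, // drop the connection if the ping is not acked
		}),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Give every RPC a deadline. A context passed down the call chain
	// propagates the remaining time to downstream services automatically.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	_ = ctx // pass ctx as the first argument of every generated stub call
}
```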
Pros
- • 10-100x faster serialization than JSON thanks to Protocol Buffers binary format
- • HTTP/2 multiplexing kills head-of-line blocking and opens the door to streaming
- • Strongly typed contracts with code generation for 12+ languages
- • Four communication patterns: unary, server streaming, client streaming, bidirectional
- • Built-in deadline propagation, cancellation, and metadata passing
Cons
- • Binary format is not human-readable. You need tooling just to inspect payloads.
- • Browser support requires a gRPC-Web proxy since browsers don't expose raw HTTP/2
- • Load balancing gets tricky. HTTP/2 connection reuse means L4 balancers won't cut it.
- • Proto schema evolution demands real discipline around backward compatibility
- • Smaller ecosystem of tooling and middleware compared to REST/JSON
When to use
- • Internal service-to-service calls where latency is on the critical path
- • You need streaming: real-time updates, log tailing, event feeds
- • Polyglot microservices that benefit from generated, typed client/server code
- • High-throughput APIs where JSON serialization shows up in your flame graphs
When NOT to use
- • Public APIs consumed by web browsers (just use REST or GraphQL)
- • Simple CRUD services where shipping fast matters more than shaving milliseconds
- • Teams that have never touched Protocol Buffers or schema management
- • Environments where HTTP/2 is blocked or unsupported by network intermediaries
Key Points
- • HTTP/2 multiplexing sends multiple gRPC calls over a single TCP connection. This eliminates the connection-per-request overhead of HTTP/1.1 and cuts latency by skipping repeated TLS handshakes.
- • Protocol Buffers use a binary wire format with field numbers instead of field names. A message with 10 fields takes 50-100 bytes in Protobuf vs. 500-1000 bytes in JSON. Deserialization is basically pointer arithmetic instead of JSON string parsing.
- • Deadline propagation chains timeouts across services automatically. If Service A sets a 5-second deadline calling Service B, and B calls C after 2 seconds, C gets a 3-second deadline. This stops cascading slow requests from eating all available resources.
- • Server reflection lets clients discover available services and methods at runtime without needing the .proto files. This is essential for debugging tools like grpcurl and for building generic gRPC proxies.
- • gRPC interceptors (middleware) run on every call for cross-cutting concerns. Authentication, logging, metrics, tracing, rate limiting: all interceptors, all outside the business logic.
Common Mistakes
- ✗ Using a single gRPC channel for all calls. Yes, HTTP/2 multiplexes, but one channel means one TCP connection. For high-throughput services, create a channel pool (4-8 channels) to actually use multiple TCP connections and avoid transport-layer head-of-line blocking.
- ✗ Not handling backpressure in streaming RPCs. A fast server streaming to a slow client fills the HTTP/2 flow control window. Without explicit flow control, the server buffers without limit and eventually OOMs.
- ✗ Breaking backward compatibility in proto schemas. Removing fields, renaming them, changing field numbers, or switching field types will break existing clients. Deprecate old fields, reserve their numbers, and add new ones instead. Always.
- ✗ Ignoring gRPC-specific load balancing. Unlike HTTP/1.1 where each request opens a fresh connection, gRPC reuses connections. L4 load balancers distribute connections, not requests. Use L7 balancers (Envoy) or client-side load balancing for even request distribution.
- ✗ Not setting max message size limits. The default is 4MB. A response larger than the limit does not get truncated; the RPC fails with RESOURCE_EXHAUSTED. Set limits based on the use case (see the sketch below), but know that very large messages (100MB+) defeat the whole point of streaming.
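A grpc-go sketch of raising the limits on both ends (the 16MB figure is illustrative, not a recommendation):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const maxMsg = 16 * 1024 * 1024 // 16MB, up from the 4MB default

	// Server side: receive and send limits are set separately.
	srv := grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsg),
		grpc.MaxSendMsgSize(maxMsg),
	)
	_ = srv

	// Client side: set as default call options so every stub call gets them.
	conn, err := grpc.Dial("pricing.internal:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMsg),
			grpc.MaxCallSendMsgSize(maxMsg),
		),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
}
```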