Kong
Cloud-native API gateway built on NGINX
Why It Exists
Once a monolith breaks into microservices, every single service needs the same boilerplate: authentication, rate limiting, logging, request transformation, circuit breaking. Copy-pasting that logic across 30 services is a recipe for inconsistency and a maintenance headache that only grows with each new service.
The API gateway pattern solves this by centralizing those cross-cutting concerns at a single entry point. Kong is one of the better implementations of that pattern. It is built on OpenResty (NGINX + LuaJIT), so NGINX's proven performance comes for free, and it layers on a plugin architecture that lets you configure all that cross-cutting behavior declaratively instead of coding it into each service.
Before Kong, the options were not great. The choices were building custom gateway logic on raw NGINX (brittle, painful to maintain) or buying a heavyweight enterprise gateway like Oracle API Gateway or IBM DataPower. Those were expensive and a terrible fit for cloud-native deployments. Kong filled a real gap here.
How It Works Internally
Kong is NGINX extended with OpenResty, which embeds LuaJIT into NGINX's event loop. When a request comes in, it flows through NGINX's standard processing phases, but Kong hooks into specific phases to run Lua-based plugin logic.
Here is the request lifecycle. NGINX accepts the connection and parses HTTP headers. Kong's router (a radix tree built from all configured routes) matches the request to a service + route pair based on host, path, methods, and headers. That match determines which plugins fire for this request. Kong then walks the plugin chain through NGINX phases in a defined order:
- certificate phase: Plugins can modify TLS handshake behavior (e.g., dynamic SSL cert selection per SNI).
- rewrite phase: Modify the request before routing (e.g., path transformation).
- access phase: This is where the heavy lifting happens. Authentication (JWT validation, OAuth token introspection, API key lookup) and authorization (ACL checks, rate limiting) all run here. If auth fails, Kong short-circuits and returns 401/403 without ever touching the upstream.
- header_filter phase: Modify response headers before sending to the client.
- body_filter phase: Modify the response body (e.g., XML to JSON transformation).
- log phase: Asynchronous logging to external systems (Datadog, Splunk, Kafka).
Plugin configuration lives in the control plane (PostgreSQL, Cassandra, or declarative YAML) and gets cached in shared memory on each data plane node. When someone updates a plugin via the admin API, the change propagates to all data plane nodes within seconds, either through database polling or hybrid mode push. Each data plane node keeps a local cache of the full configuration in LMDB, so config lookups are sub-millisecond and never hit the database on the hot path.
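To make the service + route + plugin model concrete, here is a minimal sketch of the declarative configuration that DB-less mode (or decK) consumes. The service name, upstream URL, host, and paths are placeholders; the field names follow Kong's 3.x declarative format:

```yaml
_format_version: "3.0"

services:
  - name: orders                       # placeholder service name
    url: http://orders.internal:8080   # placeholder upstream
    routes:
      - name: orders-route
        hosts:
          - api.example.com
        paths:
          - /orders
        methods:
          - GET
          - POST
    plugins:
      - name: key-auth                 # fires in the access phase for this service
```

The router matches incoming requests against hosts, paths, and methods to select the service + route pair, and the plugins attached to that pair then run in their respective phases.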
Worth calling out the rate-limiting plugin specifically, because people get this wrong constantly. It uses a sliding window counter algorithm. In local policy mode, counters live in NGINX shared memory. That is fast (no network I/O) but not coordinated across Kong nodes. In redis policy mode, counters live in Redis. That provides coordination across all nodes but adds 0.5-1ms of latency per rate-limit check. There is also a cluster policy mode that uses the Kong database, but it is deprecated for performance reasons. For accurate global rate limiting across multiple nodes, Redis is the only real option.
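To see the two policies side by side, here is a hedged sketch using the plugin's classic flat config fields (route names and the Redis host are placeholders; recent Kong releases also offer a nested config.redis block, so check your version's schema):

```yaml
plugins:
  # Per-node counters in NGINX shared memory: fastest option, but each
  # Kong node enforces the limit independently.
  - name: rate-limiting
    route: internal-route            # placeholder route name
    config:
      minute: 600
      policy: local

  # Counters in Redis: coordinated across all Kong nodes, at the cost
  # of ~0.5-1ms of network round-trip per rate-limit check.
  - name: rate-limiting
    route: partner-route             # placeholder route name
    config:
      minute: 600
      policy: redis
      redis_host: redis.internal     # placeholder
      redis_port: 6379
```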
Production Architecture
There are three deployment modes, and the choice depends on operational maturity:
DB-less mode: Kong loads configuration from a declarative YAML file at startup. No database. Configuration changes mean regenerating the YAML and triggering a reload. This works extremely well for Kubernetes, where the Kong Ingress Controller watches Ingress resources along with Kong's own CRDs (KongPlugin, KongConsumer, and friends), translates them into Kong declarative config, and pushes it to the data plane. For Kubernetes deployments, start here. Seriously.
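As a sketch of what that looks like in practice: a plugin declared as a CRD and attached to an Ingress via annotation. The names and backend service are placeholders; the konghq.com/plugins annotation key and the KongPlugin schema are real:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: orders-rate-limit
plugin: rate-limiting
config:
  minute: 600
  policy: local
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders
  annotations:
    konghq.com/plugins: orders-rate-limit   # attach the plugin to this Ingress
spec:
  ingressClassName: kong
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders               # placeholder backend service
                port:
                  number: 8080
```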
Hybrid mode: The control plane (admin API + database) runs separately from data plane nodes. Data planes connect to the control plane over mTLS and receive configuration updates via push. They cache the full config locally in LMDB, so they keep working if the control plane goes down (they just cannot receive new config updates). This is the right call for multi-cluster or multi-region setups where data planes run close to users and the control plane lives in a central management cluster.
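The wiring looks roughly like this as container environment variables (Kong reads KONG_-prefixed env vars in place of kong.conf entries; the hostnames, port, and cert paths below are placeholder assumptions):

```yaml
# Control plane container:
env:
  - name: KONG_ROLE
    value: control_plane
  - name: KONG_CLUSTER_CERT
    value: /certs/cluster.crt              # shared mTLS cert, placeholder path
  - name: KONG_CLUSTER_CERT_KEY
    value: /certs/cluster.key
---
# Data plane container:
env:
  - name: KONG_ROLE
    value: data_plane
  - name: KONG_DATABASE
    value: "off"                           # data planes never touch the database
  - name: KONG_CLUSTER_CONTROL_PLANE
    value: kong-cp.example.internal:8005   # placeholder control plane address
  - name: KONG_CLUSTER_CERT
    value: /certs/cluster.crt
  - name: KONG_CLUSTER_CERT_KEY
    value: /certs/cluster.key
```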
Traditional mode: Each Kong node connects directly to PostgreSQL/Cassandra for configuration. This was the original model; avoid it in production now. The database sits in the critical path for configuration reads (yes, config is cached, but still) and becomes a scaling bottleneck.
For high availability, put 3+ data plane nodes behind a cloud load balancer or DNS round-robin. Each node is stateless, pulling configuration from the control plane or declarative config, so any node can handle any request. Scale data planes horizontally based on request volume. Expect ~10,000-30,000 requests/sec per node depending on how many plugins are stacked.
Redis is a critical dependency in production. It backs cluster-wide rate limiting, response caching, and session storage. Run Redis as a 3-node Sentinel cluster or Redis Cluster for automatic failover. Pay attention to Redis latency, because it directly affects Kong's per-request latency whenever rate limiting or caching plugins are active.
Monitoring needs to cover three layers: NGINX-level metrics (connections, requests/sec, latency), Kong-level metrics (per-route, per-service, per-plugin latency), and upstream metrics (backend response time, error rates). The Prometheus plugin exposes everything at /metrics. Set alerts for: p99 latency exceeding the SLA (typically 50ms for gateway overhead), 5xx rate above 1% (gateway-generated errors), and rate-limit Redis connection errors.
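As a starting point, those alerts might look like the following Prometheus rules. Treat this as a hedged sketch: the metric names assume a recent Kong Prometheus plugin (kong_kong_latency_ms as the gateway-only latency histogram, kong_http_requests_total for status codes), and older plugin versions expose different names:

```yaml
groups:
  - name: kong-gateway
    rules:
      - alert: KongGatewayLatencyHigh
        # p99 of Kong's own processing latency over 5m exceeds the 50ms SLA
        expr: histogram_quantile(0.99, sum(rate(kong_kong_latency_ms_bucket[5m])) by (le)) > 50
        for: 10m
      - alert: KongErrorRateHigh
        # Gateway-level 5xx responses above 1% of all requests
        expr: |
          sum(rate(kong_http_requests_total{code=~"5.."}[5m]))
            / sum(rate(kong_http_requests_total[5m])) > 0.01
        for: 5m
```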
Decision Criteria
| Criteria | Kong | AWS API Gateway | Envoy | Apigee | Traefik |
|---|---|---|---|---|---|
| Deployment | Self-hosted or Kong Konnect (SaaS) | Fully managed (AWS) | Self-hosted (often via Istio) | Google-managed SaaS | Self-hosted |
| Performance | ~10K-30K req/sec per node | ~10K req/sec (hard throttle) | ~20K-50K req/sec per node | Varies (managed) | ~10K-20K req/sec per node |
| Plugin/extension model | Lua, Go, Python, JS plugins | Lambda authorizers, VTL transforms | C++ filters, Lua, WASM | Java, JS policies | Middleware (Go) |
| Rate limiting | Built-in (local, Redis, cluster) | Built-in (per-stage, per-API) | External (rate limit service) | Built-in | Built-in (basic) |
| Auth support | JWT, OAuth2, OIDC, API Key, LDAP, mTLS | IAM, Cognito, Lambda authorizer | External auth service (ext_authz) | OAuth, SAML, API Key | BasicAuth, ForwardAuth |
| Service discovery | DNS, Consul, Kubernetes | AWS service integrations | EDS (Envoy Discovery Service) | Target servers | Kubernetes, Consul, Docker |
| Configuration | Admin API, declarative YAML, CRDs | AWS Console, CloudFormation, CDK | xDS API (control plane) | Edge UI, API | File, Docker labels, K8s CRDs |
| Cost | Free (OSS), Enterprise license | Pay per request ($3.50/million) | Free (OSS) | Subscription-based | Free (OSS), Enterprise |
| Best for | Multi-cloud API gateway, K8s ingress | AWS-native, serverless backends | Service mesh, L4/L7 proxy | Enterprise API management | Simple K8s ingress, Docker |
Capacity Planning
Data plane sizing: Each Kong data plane node handles ~10,000-30,000 requests/sec depending on plugin count and complexity. With 2 plugins (auth + rate limit): ~20,000 req/sec. With 6 plugins: ~10,000 req/sec. Each request adds ~1-3ms of gateway latency from plugin processing. Size data plane pods at 2-4 CPU cores and 2-4 GB RAM.
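In Kubernetes terms, that sizing maps to a resource block like this on the proxy container (numbers taken straight from the guidance above; treat them as a starting point and tune after load testing):

```yaml
resources:
  requests:
    cpu: "2"        # lower bound of the 2-4 core guidance
    memory: 2Gi
  limits:
    cpu: "4"
    memory: 4Gi
```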
Plugin latency budget: JWT validation runs ~0.3ms. OAuth token introspection with a remote call: ~5-50ms (big range, depends on the IdP). Rate limit with local policy: ~0.1ms. Rate limit with Redis: ~0.5-1ms. Request transformation: ~0.2ms. Logging (async, batched): ~0.1ms. Add up all active plugins and that is the total chain latency. Budget 5ms total for the gateway layer. Go past that and the gateway itself becomes a bottleneck.
Redis sizing for rate limiting: Each rate-limit counter takes about 100 bytes. For 100,000 unique consumers with 10 rate-limit windows each, that requires ~100 MB. At 20,000 req/sec with Redis-backed rate limiting, Redis handles 20,000 GET+INCR operations/sec. A single Redis node handles that without breaking a sweat, but the network round-trips still add 0.5-1ms per request. For sub-millisecond rate limiting, use local policy and accept that limits are per-node.
Route table sizing: Kong's radix tree router handles 10,000+ routes with sub-millisecond matching. The catch: route rebuild time on configuration change is O(n log n). With 10,000 routes, a config reload takes ~500ms, and new connections may see slightly higher latency during that window. With more than 5,000 routes, consider splitting them across multiple data plane clusters.
PostgreSQL sizing (traditional/hybrid mode): The configuration database is tiny. Even a 1,000-route deployment with 50 plugins generates less than 100 MB of data. Each control plane node opens a connection pool (default 64 connections). In hybrid mode with 100 data plane nodes polling every 5 seconds, the control plane database handles ~20 queries/sec. That is nothing. Size PostgreSQL for availability (primary + replica), not capacity.
Failure Scenarios
Scenario 1: Redis Failure Causing Rate Limit Bypass
Trigger: The Redis instance backing cluster-wide rate limiting becomes unavailable. Maybe the primary fails and Sentinel has not promoted a replica yet, or a network partition isolates Redis from Kong nodes.
Impact: Kong's rate-limiting plugin is configured with policy=redis. When Redis is unreachable, behavior depends on the fault_tolerant setting (default: true). With fault_tolerant=true, Kong lets the request through without rate limiting, effectively disabling it for the entire outage. A malicious client or a misbehaving partner can now hammer the backends without any throttle. If the backends do not have their own protection, this cascades into a full backend outage. With fault_tolerant=false, Kong rejects all requests with 500 errors, which means the gateway is completely down.
Neither option is great. That is the core tradeoff in the design.
Detection: Monitor Redis connectivity from Kong nodes. Alert on kong_datastore_reachable dropping to 0. Track the kong_rate_limiting_plugin_errors counter. Watch backend request rates, because a sudden 10x spike means rate limiting has failed open.
Recovery: Run Redis with Sentinel (3-node minimum) or Redis Cluster for automatic failover. Set reasonable timeouts: connect_timeout at 1-2 seconds, send_timeout and read_timeout at 1 second, so Kong does not hang waiting on a dead Redis. For defense-in-depth, configure NGINX-level limit_req as a backstop that kicks in even if the Lua plugin layer fails. Consider using policy=local for non-critical rate limits and only using Redis-backed rate limiting where cross-node coordination is truly needed (e.g., billing-tier API limits).
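Those settings live on the plugin config. A hedged sketch using the classic flat schema (field names changed in recent releases, which nest Redis settings under config.redis, so check your version; the host is a placeholder):

```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 600
      policy: redis
      redis_host: redis.internal   # placeholder
      redis_port: 6379
      redis_timeout: 1000          # ms; the classic schema uses a single Redis timeout
      fault_tolerant: true         # fail open: requests pass unthrottled when Redis is down
```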
Scenario 2: Plugin Configuration Error Causing Global Outage
Trigger: Someone pushes a configuration change via the Admin API that enables a plugin globally (all routes) with a broken config. Maybe a JWT plugin references a non-existent RSA public key, or a request-transformer plugin has a malformed Lua template.
Impact: Every single request on every route now hits the broken plugin. The JWT plugin returns 401 for everything because it cannot validate tokens without the key. The request-transformer throws a Lua error and Kong returns 500 everywhere. Because the config was applied globally, there are no unaffected routes. The entire API surface is down. And here is the painful part: data plane nodes cache the configuration, so restarting them does not fix anything. They just reload the bad config from cache.
Detection: Watch the global 5xx rate. Alert on kong_http_requests_total{code="500"} or code="401" spiking across all services at the same time. Track configuration change events through the Admin API audit log.
Recovery: Immediately revert the config change via the Admin API: delete the global plugin or update it with the correct configuration. For DB-less mode, roll back the declarative YAML to the previous version and trigger a reload. For hybrid mode, fix the control plane config and wait for data planes to pick up the change (default 5-second poll interval). Prevention is the real answer here. Set up a CI/CD pipeline for Kong configuration with a staging environment. Use deck diff (decK CLI) to preview changes before applying. Never apply plugins globally in production without testing on a single route first. Turn on Admin API RBAC to prevent unauthorized configuration changes.
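A minimal sketch of that pipeline as a GitHub Actions workflow. The job layout and secret names are assumptions, and it presumes decK is already installed on the runner; deck diff and deck sync are the real decK commands (namespaced as deck gateway diff/sync in newer decK releases):

```yaml
name: kong-config
on:
  pull_request:                  # preview changes on PRs
  push:
    branches: [main]             # apply on merge
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Preview changes against staging
        run: deck diff --state kong.yaml --kong-addr ${{ secrets.STAGING_ADMIN_API }}
      - name: Apply to production (merge to main only)
        if: github.ref == 'refs/heads/main'
        run: deck sync --state kong.yaml --kong-addr ${{ secrets.PROD_ADMIN_API }}
```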
Scenario 3: Control Plane Database Failure in Hybrid Mode
Trigger: The PostgreSQL database backing the Kong control plane goes down. Disk failure, primary crash with no automatic failover, or a botched schema migration corrupts the data.
Impact: The control plane can no longer push configuration updates. But here is the good news: Kong data plane nodes in hybrid mode cache the full configuration locally in LMDB. Existing data plane nodes keep proxying traffic with their last-known configuration, so the API gateway stays up for all existing routes and plugins. What is lost: the ability to make any configuration changes (new routes, updated plugins, new consumers), the Admin API returns errors, and new data plane nodes cannot bootstrap because they have no cached config to start with. If the outage drags on for hours, the team is flying blind operationally. There is no way to respond to incidents by updating routing or rate limits.
Detection: Monitor PostgreSQL replication lag and availability. Alert on the Kong control plane health endpoint returning errors. Track the kong_data_plane_config_hash metric. If a configuration change has been pushed but data planes keep reporting the old hash, the control plane is not delivering updates.
Recovery: Restore PostgreSQL from the latest backup or promote a standby replica. Kong's control plane reconnects automatically and resumes pushing configuration to data planes. Data planes detect the reconnection and verify their cached config against the control plane. For resilience: run PostgreSQL with streaming replication and automatic failover (Patroni, RDS Multi-AZ, or Cloud SQL HA). Store Kong configuration as code (decK YAML in Git) so the full configuration can be reconstructed from scratch if the database is unrecoverable. For emergency changes during a control plane outage, use kong config db_import on data plane nodes to load configuration directly from a declarative YAML file.
Pros
- Built on NGINX, so you inherit its battle-tested performance
- Rich plugin ecosystem (100+ plugins)
- Supports declarative and database-backed configuration
- Kubernetes-native with Ingress Controller
- Open-source core with enterprise features
Cons
- Adds latency compared to running NGINX directly
- Plugin ecosystem quality is all over the place
- Enterprise features require a paid license
- Configuration gets messy at scale
- Database dependency (Postgres/Cassandra) for some modes
When to use
- You need a centralized API gateway with auth and rate limiting
- You're managing multiple APIs/microservices behind one entry point
- Kubernetes environments needing an ingress controller
- You want plugin-based extensibility without writing custom code
When NOT to use
- Simple reverse proxy needs (just use NGINX directly)
- Ultra-low latency where every microsecond counts
- Tight budget constraints (enterprise features are paid)
- Simple static site serving
Key Points
- Kong's data plane is OpenResty (NGINX + LuaJIT), and it runs plugin logic at specific NGINX processing phases. The plugin chain adds 1-3ms latency per request depending on plugin count and complexity.
- DB-less mode with declarative YAML eliminates the database dependency. This makes Kong deployable as an immutable, GitOps-friendly sidecar or ingress, which matters a lot for Kubernetes-native setups.
- The rate-limiting plugin uses a sliding window counter stored in shared memory (local) or Redis (cluster-wide). Local mode is faster but not coordinated across Kong nodes.
- Kong's router uses a radix tree with O(log n) route matching, handling 10,000+ routes with sub-millisecond matching overhead. But route priority rules with complex path patterns can produce surprises.
- Hybrid mode separates control plane (admin API, database) from data plane (proxy nodes), letting data planes run in untrusted networks and receive config via encrypted channels.
Common Mistakes
- ✗ Running Kong in database mode (PostgreSQL) for Kubernetes deployments instead of DB-less or hybrid mode. The database becomes a single point of failure for config updates and adds operational overhead.
- ✗ Stacking too many plugins (8+) on a single route without profiling. Each plugin adds Lua execution time; 10 plugins at 0.5ms each adds 5ms per request, and that compounds fast under high concurrency.
- ✗ Using the local rate-limiting policy in a multi-node deployment. Each Kong node keeps its own counter, so a 100 req/sec limit becomes 100 × N req/sec across N nodes.
- ✗ Not setting upstream keepalive connections, which forces Kong to open a new TCP connection per request to backends (the same ephemeral port exhaustion problem that occurs with raw NGINX).
- ✗ Deploying Kong Ingress Controller without resource limits, letting a misconfigured route or plugin eat unbounded memory and crash the pod.