Serverless & FaaS
Why It Exists
Running servers is annoying: you plan capacity, patch operating systems, configure autoscaling, and keep things running around the clock even when nobody is hitting the service. Serverless flips that on its head: the cloud provider owns all of the infrastructure, and the bill only reflects actual execution time.
For event-driven, bursty, or intermittent workloads, this is a great deal. A webhook handler that fires 100 times a day costs fractions of a cent on Lambda. The same thing running in an always-on container costs roughly $15/month. That gap matters with dozens of small services.
How It Works
Execution Model
When a function is invoked, the provider spins up an execution environment (a lightweight microVM or V8 isolate), loads the code, starts the runtime, and runs the handler. After execution, the provider freezes that environment and may reuse it for the next call. That is a warm start. If no warm environment is available, the result is a cold start, which is where the 100ms-10s latency hit comes from. Python and Node.js cold-start quickly. Java and C# with heavy dependency trees? Not so much.
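A minimal Python sketch of what that means for handler code (the config values are hypothetical): anything at module scope runs once per cold start and is then reused by every warm invocation the environment serves.

```python
import json
import time

# Module scope: runs once per cold start, then the results are reused for
# every warm invocation served by this execution environment.
_cold_start_at = time.time()
CONFIG = {"table": "orders"}  # hypothetical: load config, open SDK clients, etc.

def handler(event, context):
    # Handler scope: runs on every invocation, warm or cold.
    age = time.time() - _cold_start_at  # grows as the environment is reused
    return {
        "statusCode": 200,
        "body": json.dumps({"environment_age_seconds": round(age, 3)}),
    }
```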
Cold Start Mechanics
Cold start latency has a few components: microVM boot (~50ms on Firecracker), runtime initialization (~50-200ms), dependency loading (this varies wildly, a 50MB deployment package adds seconds), and application setup (database connections, config loading). Provisioned concurrency pre-warms N environments to skip all of that for latency-sensitive paths. The trade-off is paying for those warm environments sitting idle.
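As a sketch of that trade-off, provisioned concurrency is configured per published version or alias; the snippet below assumes a hypothetical function named checkout-api with a live alias and uses the boto3 Lambda client.

```python
import boto3

# Pre-warm 25 execution environments for the "live" alias. These skip cold
# start entirely, but are billed even while sitting idle.
lambda_client = boto3.client("lambda")

lambda_client.put_provisioned_concurrency_config(
    FunctionName="checkout-api",         # hypothetical function name
    Qualifier="live",                    # must be a published version or alias
    ProvisionedConcurrentExecutions=25,  # size this to the expected traffic floor
)
```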
Concurrency Model
Each function instance handles exactly one request at a time (the Lambda model). Concurrent requests spin up concurrent instances, up to the account-level limit (default 1,000 on Lambda). So 1,000 simultaneous requests means 1,000 function instances.
Here is the part people miss: this concurrency model can crush the downstream database. A spike to 1,000 concurrent Lambda invocations opens 1,000 database connections, and most databases will choke on that. Use connection pooling (RDS Proxy, PgBouncer) or put a queue in front to buffer the load. I have seen this take down production systems more than once.
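A sketch of the code-level mitigation, assuming psycopg2 is packaged with the function and DATABASE_URL points at RDS Proxy or PgBouncer rather than the database directly:

```python
import os
import psycopg2  # assumption: bundled in the deployment package or a layer

# One connection per execution environment, opened at module scope and reused
# across warm invocations. With 1,000 concurrent instances this is still up to
# 1,000 connections, which is why the DSN should point at a pooler, not the DB.
_conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var

def handler(event, context):
    with _conn.cursor() as cur:
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0] == 1}
```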
Cost Crossover Analysis
Serverless pricing follows a straightforward formula: a per-invocation fee plus compute billed as memory × duration (GB-seconds). At low volume, it beats everything. But the crossover point where containers become cheaper comes faster than most people expect.
| Metric | Serverless Cheaper | Containers Cheaper |
|---|---|---|
| Invocations/month | < 10M | > 10M |
| Avg duration | < 500ms | > 500ms |
| Steady-state concurrency | < 50 | > 50 |
| Utilization pattern | Bursty, unpredictable | Steady, predictable |
Here is a concrete example. At sustained 100 concurrent Lambda executions (1GB memory, 200ms avg), the monthly bill lands around $5,200. An equivalent Fargate deployment runs roughly $500. That is a 10x difference, and it catches teams off guard.
Architectural Patterns
- Fan-out: An S3 event triggers a Lambda for each uploaded file (image resize, video transcode). Inherently parallel (sketched after this list).
- Step Functions: Orchestrate multi-step workflows with branching, retries, and error handling. The state machine model works well for approval workflows and ETL pipelines.
- Event-driven: EventBridge routes domain events to specific functions. Producers and consumers are fully decoupled.
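A minimal sketch of the fan-out pattern mentioned above, assuming an S3 ObjectCreated trigger and a hypothetical, distinct output bucket named thumbnails-output:

```python
import urllib.parse
import boto3

# Each s3:ObjectCreated event invokes one handler instance, so N uploads
# fan out into N parallel executions with no orchestration code.
s3 = boto3.client("s3")

def resize(image_bytes: bytes) -> bytes:
    # Placeholder for real image processing (e.g. Pillow); pass-through here.
    return image_bytes

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        s3.put_object(
            Bucket="thumbnails-output",  # hypothetical, distinct from the input bucket
            Key=key,
            Body=resize(obj["Body"].read()),
        )
```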
Production Considerations
- Observability: Send structured logs to CloudWatch, use X-Ray for distributed tracing, and track custom metrics for business KPIs. Track cold starts as a separate metric. Skipping this means no visibility into why latency is spiking.
- Deployment packaging: Keep packages small. Use Lambda layers for shared dependencies. For larger applications, container image support (up to 10GB) gets around the 250MB zip limit.
- Error handling: Set up dead-letter queues (DLQ) for async invocations. Without a DLQ, failed events just disappear after retry exhaustion. Nobody will know they existed.
- VPC considerations: Putting Lambda in a VPC adds cold start latency because of ENI attachment. Only do it when private resource access is actually needed. Use VPC endpoints for AWS service calls.
- Idempotency: Functions may run more than once (at-least-once delivery). Build handlers to be idempotent using deduplication keys or conditional writes. This is not optional.
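A sketch of the idempotency point above, using a conditional write to a hypothetical DynamoDB table named idempotency-keys to claim each event exactly once before doing any side effects:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def process_order(event):
    # Placeholder for the real business logic.
    pass

def handler(event, context):
    key = event["detail"]["orderId"]  # hypothetical dedup key from the event payload
    try:
        # Conditional write fails if another invocation already claimed this key.
        dynamodb.put_item(
            TableName="idempotency-keys",
            Item={"pk": {"S": key}},
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"status": "duplicate, skipped"}  # at-least-once retry; no-op
        raise
    process_order(event)  # side effects run only after the key is claimed
    return {"status": "processed"}
```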
Failure Scenarios
Scenario 1: Downstream Database Connection Exhaustion. A traffic spike triggers 2,000 concurrent Lambda invocations, each opening a database connection. The RDS instance (default max 1,000 connections on db.r5.xlarge) starts rejecting new connections. Functions fail, retries make it worse, and the DLQ fills up. Meanwhile, non-serverless services sharing the same database also lose connectivity. Everything goes sideways at once. Detection: DatabaseConnections CloudWatch metric hits max, Lambda Errors rate spikes, RDS CPUUtilization may actually be low (this is connection exhaustion, not CPU). Recovery: immediately reduce Lambda reserved concurrency to throttle invocations. Long-term, deploy RDS Proxy (pools ~1,000 Lambda connections into ~50 database connections) or move to DynamoDB for serverless-native workloads.
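The immediate throttle step can be a one-liner with the boto3 Lambda client; the function name and limit below are illustrative:

```python
import boto3

# Emergency brake: cap the function at 50 concurrent executions so the
# database can drain its connection backlog. Setting 0 stops all invocations.
boto3.client("lambda").put_function_concurrency(
    FunctionName="order-processor",    # hypothetical function name
    ReservedConcurrentExecutions=50,
)
```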
Scenario 2: Cold Start Cascade After Deployment. A new Lambda version gets deployed, which invalidates all warm execution environments at once. The next traffic wave hits 100% cold starts. For Java functions running Spring Boot (8-15 second cold starts), P99 latency spikes to 15s, API Gateway returns 504 timeouts, and upstream clients retry, doubling the load. Detection: Init Duration in Lambda logs exceeds baseline by 10x, API Gateway 5xx error rate spikes. Recovery: use provisioned concurrency matching the expected traffic floor. Deploy with Lambda aliases and weighted routing (shift 10% of traffic first, wait, then push to 100%). Prevention: Cloudflare Workers sidestep this entirely since V8 isolates cold-start in under 5ms.
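A sketch of the weighted-routing recovery step, assuming a live alias currently on version 41 and a newly published version 42 (the version numbers are illustrative):

```python
import boto3

# Keep 90% of traffic on the already-warm version and send 10% to the new
# one, so cold starts are spread out instead of hitting every request at once.
boto3.client("lambda").update_alias(
    FunctionName="checkout-api",  # hypothetical function name
    Name="live",
    FunctionVersion="41",         # current (warm) version keeps 90%
    RoutingConfig={"AdditionalVersionWeights": {"42": 0.10}},
)
```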
Scenario 3: Recursive Invocation Loop. A Lambda function triggered by an S3 event writes output back to the same bucket, re-triggering itself in an infinite loop. Invocations grow exponentially (1, 2, 4, 8...), hit the concurrency limit within minutes, and run up thousands of dollars in cost before anyone notices. Detection: ConcurrentExecutions climbs to account limit, Throttles metric rises, unexpected cost spike in Cost Explorer. Recovery: set reserved concurrency to 0 (this immediately stops all invocations), then fix the trigger filter. Prevention: always use distinct input and output buckets, set Lambda reserved concurrency limits, and deploy cost anomaly detection alerts. AWS documented this pattern after it burned multiple customers.
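A cheap in-handler guard for the recursive-trigger case, assuming output is written under a distinct processed/ prefix in the same bucket (distinct buckets remain the better fix):

```python
import boto3

OUTPUT_PREFIX = "processed/"  # hypothetical output prefix
s3 = boto3.client("s3")

def transform(data: bytes) -> bytes:
    # Placeholder for the real processing step.
    return data

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own output re-triggered us; break the loop here
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=bucket, Key=OUTPUT_PREFIX + key, Body=transform(body))
```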
Capacity Planning
| Dimension | Threshold / Guideline | Real-World Reference |
|---|---|---|
| Concurrent executions | 1,000 default; request increase to 10K+ for production | iRobot: 30K concurrent Lambda executions for IoT fleet |
| Memory allocation | 128MB-10GB; CPU scales proportionally (1 vCPU at 1,769MB) | Optimize by profiling: 256MB often 2x faster than 128MB for same cost |
| Execution duration | 15 min max (Lambda); target < 30s for API workloads | Fender Digital: 60ms median Lambda duration for API tier |
| Deployment package | 50MB zipped / 250MB unzipped (or 10GB container image) | Keep < 10MB for Node.js functions to minimize cold start |
| Cost crossover | ~$15K/month Lambda spend, start evaluating containers | Capital One: migrated steady-state workloads to ECS at $50K/mo Lambda spend |
| Burst concurrency | 3,000 instant then +500/min (Lambda) | Design for the 500/min ramp. Pre-warm if bursts exceed 3K. |
Key formula: Monthly cost = invocations × $0.20 per 1M requests + GB-seconds × $0.0000166667. For 10M invocations/month at 256MB and 200ms average: $2.00 + (10M × 0.256GB × 0.2s × $0.0000166667) = $2.00 + $8.53 = $10.53/month. Compare that to Fargate: a single 0.25 vCPU / 0.5GB task running 24/7 costs about $9.50/month. The crossover shifts with utilization. Serverless wins below roughly 30% utilization. Containers win above that.
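The arithmetic is easy to script; this sketch hard-codes the list prices used above (assumed US-East, on-demand):

```python
REQUEST_PRICE_PER_MILLION = 0.20   # $ per 1M invocations
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second

def lambda_monthly_cost(invocations: int, memory_gb: float, avg_duration_s: float) -> float:
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    compute_cost = invocations * memory_gb * avg_duration_s * GB_SECOND_PRICE
    return request_cost + compute_cost

# 10M invocations/month at 256MB and 200ms average, as in the example above:
print(round(lambda_monthly_cost(10_000_000, 0.256, 0.2), 2))  # -> 10.53
```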
Architecture Decision Record
ADR: Serverless vs Containers vs VMs
Context: Picking the compute model for a new workload. Revisit this decision once traffic patterns become clear.
| Criteria (Weight) | Serverless (Lambda) | Containers (ECS/K8s) | VMs (EC2/GCE) |
|---|---|---|---|
| Time to production (20%) | Hours (write function, deploy) | Days (Dockerize, configure orchestrator) | Days-weeks (AMIs, ASGs, config mgmt) |
| Cost at low traffic (20%) | Near-zero (pay per invocation) | $15-50/mo minimum (always-on task) | $30-100/mo minimum (smallest instance) |
| Cost at high traffic (15%) | Expensive ($5K+/mo at sustained 100 concurrency) | Efficient ($500/mo equivalent workload) | Most efficient at > 70% utilization |
| Latency control (15%) | Limited (cold starts, no connection pooling) | Full (persistent processes, warm connections) | Full |
| Operational overhead (15%) | None (no infra management) | Medium (cluster ops, image pipeline) | High (OS patching, capacity planning) |
| Vendor lock-in (10%) | High (event source bindings are provider-specific) | Low (OCI standard, portable across clouds) | Low (standard OS, portable) |
| Team size required (5%) | 1-2 developers | 3-5 with platform engineer | 5+ with SRE/ops |
Decision guidance: Start with serverless for greenfield projects where the traffic shape is unknown. It removes both cost risk and operational burden early on. Move to containers when monthly Lambda spend crosses $5K, when P99 latency requirements drop below 50ms, or when persistent connections are needed (WebSockets, gRPC streaming). Use VMs for workloads that need specific kernel modules, GPU access, or compliance rules that prohibit shared tenancy. In practice, most mature organizations run all three models at the same time: serverless for event processing, containers for APIs, VMs for legacy workloads. That is fine. Pick the right tool for each job and do not try to force everything into one model.
Key Points
- Run code without managing servers. The cloud provider handles provisioning, scaling, and patching.
- Pay-per-invocation pricing means zero cost at zero traffic, but it gets expensive fast at sustained high throughput.
- Cold start latency (100ms-10s) is the biggest trade-off. Provisioned concurrency helps, but costs money.
- Great for event-driven, bursty, short-lived workloads. Poor fit for long-running processes.
- Vendor lock-in is real. Lambda, Cloud Functions, and Azure Functions each have different APIs and limits.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| AWS Lambda | Managed | Broadest event source integration, mature ecosystem | Small-Enterprise |
| Cloudflare Workers | Managed | Edge execution, V8 isolates, sub-ms cold start | Small-Enterprise |
| Google Cloud Functions | Managed | GCP integration, Cloud Run for containers | Small-Enterprise |
| Knative | Open Source | Serverless on Kubernetes, no vendor lock-in | Medium-Enterprise |
Common Mistakes
- Using serverless for latency-sensitive synchronous APIs without provisioned concurrency
- Ignoring concurrent execution limits, then getting throttled when traffic spikes
- Building monolithic functions that do too much. Each function should do one thing.
- Forgetting cold start impact on P99 latency. That 1% of requests can be 10x slower than normal.
- Skipping timeout and memory limits. Runaway functions will burn through the budget.