Serverless & FaaS
Why It Exists
Running servers is annoying: you plan capacity, patch operating systems, configure autoscaling, and keep things running around the clock even when nobody is hitting the service. Serverless flips that on its head: the cloud provider owns all of the infrastructure, and the bill only reflects actual execution time.
For event-driven, bursty, or intermittent workloads, this is a great deal. A webhook handler that fires 100 times a day costs fractions of a cent on Lambda. The same thing running in an always-on container costs roughly $15/month. That gap matters with dozens of small services.
How It Works
Execution Model
When a function is invoked, the provider spins up an execution environment (a lightweight microVM or V8 isolate), loads the code, starts the runtime, and runs the handler. After execution, the provider freezes that environment and may reuse it for the next call. That is a warm start. If no warm environment is available, the result is a cold start, which is where the 100ms-10s latency hit comes from. Python and Node.js cold-start quickly. Java and C# with heavy dependency trees? Not so much.
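A minimal Python sketch of what that means for handler code (the config values are hypothetical): anything at module scope runs once per cold start and is then reused by every warm invocation the environment serves.

```python
import json
import time

# Module scope: runs once per cold start, then the results are reused for
# every warm invocation served by this execution environment.
_cold_start_at = time.time()
CONFIG = {"table": "orders"}  # hypothetical: load config, open SDK clients, etc.

def handler(event, context):
    # Handler scope: runs on every invocation, warm or cold.
    age = time.time() - _cold_start_at  # grows as the environment is reused
    return {
        "statusCode": 200,
        "body": json.dumps({"environment_age_seconds": round(age, 3)}),
    }
```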
Cold Start Mechanics
Cold start latency has a few components: microVM boot (~50ms on Firecracker), runtime initialization (~50-200ms), dependency loading (this varies wildly, a 50MB deployment package adds seconds), and application setup (database connections, config loading). Provisioned concurrency pre-warms N environments to skip all of that for latency-sensitive paths. The trade-off is paying for those warm environments sitting idle.
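As a sketch of that trade-off, provisioned concurrency is configured per published version or alias; the snippet below assumes a hypothetical function named checkout-api with a live alias and uses the boto3 Lambda client.

```python
import boto3

# Pre-warm 25 execution environments for the "live" alias. These skip cold
# start entirely, but are billed even while sitting idle.
lambda_client = boto3.client("lambda")

lambda_client.put_provisioned_concurrency_config(
    FunctionName="checkout-api",         # hypothetical function name
    Qualifier="live",                    # must be a published version or alias
    ProvisionedConcurrentExecutions=25,  # size this to the expected traffic floor
)
```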
Concurrency Model
Each function instance handles exactly one request at a time (the Lambda model). Concurrent requests spin up concurrent instances, up to the account-level limit (default 1,000 on Lambda). So 1,000 simultaneous requests means 1,000 function instances.
Here is the part people miss: this concurrency model can crush the downstream database. A spike to 1,000 concurrent Lambda invocations opens 1,000 database connections, and most databases will choke on that. Use connection pooling (RDS Proxy, PgBouncer) or put a queue in front to buffer the load. I have seen this take down production systems more than once.
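A sketch of the code-level mitigation, assuming psycopg2 is packaged with the function and DATABASE_URL points at RDS Proxy or PgBouncer rather than the database directly:

```python
import os
import psycopg2  # assumption: bundled in the deployment package or a layer

# One connection per execution environment, opened at module scope and reused
# across warm invocations. With 1,000 concurrent instances this is still up to
# 1,000 connections, which is why the DSN should point at a pooler, not the DB.
_conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var

def handler(event, context):
    with _conn.cursor() as cur:
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0] == 1}
```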
Cost Crossover Analysis
Serverless pricing follows a straightforward formula: a per-invocation fee plus compute billed as memory × duration (GB-seconds). At low volume, it beats everything. But the crossover point where containers become cheaper comes faster than most people expect.
| Metric | Serverless Cheaper | Containers Cheaper |
|---|---|---|
| Invocations/month | < 10M | > 10M |
| Avg duration | < 500ms | > 500ms |
| Steady-state concurrency | < 50 | > 50 |
| Utilization pattern | Bursty, unpredictable | Steady, predictable |
Here is a concrete example. At sustained 100 concurrent Lambda executions (1GB memory, 200ms avg), the monthly bill lands around $5,200. An equivalent Fargate deployment runs roughly $500. That is a 10x difference, and it catches teams off guard.
Architectural Patterns
- Fan-out: An S3 event triggers a Lambda for each uploaded file (image resize, video transcode). Inherently parallel (sketched after this list).
- Step Functions: Orchestrate multi-step workflows with branching, retries, and error handling. The state machine model works well for approval workflows and ETL pipelines.
- Event-driven: EventBridge routes domain events to specific functions. Producers and consumers are fully decoupled.
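A minimal sketch of the fan-out pattern mentioned above, assuming an S3 ObjectCreated trigger and a hypothetical, distinct output bucket named thumbnails-output:

```python
import urllib.parse
import boto3

# Each s3:ObjectCreated event invokes one handler instance, so N uploads
# fan out into N parallel executions with no orchestration code.
s3 = boto3.client("s3")

def resize(image_bytes: bytes) -> bytes:
    # Placeholder for real image processing (e.g. Pillow); pass-through here.
    return image_bytes

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        s3.put_object(
            Bucket="thumbnails-output",  # hypothetical, distinct from the input bucket
            Key=key,
            Body=resize(obj["Body"].read()),
        )
```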
Production Considerations
- Observability: Send structured logs to CloudWatch, use X-Ray for distributed tracing, and track custom metrics for business KPIs. Track cold starts as a separate metric. Skipping this means no visibility into why latency is spiking.
- Deployment packaging: Keep packages small. Use Lambda layers for shared dependencies. For larger applications, container image support (up to 10GB) gets around the 250MB zip limit.
- Error handling: Set up dead-letter queues (DLQ) for async invocations. Without a DLQ, failed events just disappear after retry exhaustion. Nobody will know they existed.
- VPC considerations: Putting Lambda in a VPC adds cold start latency because of ENI attachment. Only do it when private resource access is actually needed. Use VPC endpoints for AWS service calls.
- Idempotency: Functions may run more than once (at-least-once delivery). Build handlers to be idempotent using deduplication keys or conditional writes. This is not optional.
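A sketch of the idempotency point above, using a conditional write to a hypothetical DynamoDB table named idempotency-keys to claim each event exactly once before doing any side effects:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def process_order(event):
    # Placeholder for the real business logic.
    pass

def handler(event, context):
    key = event["detail"]["orderId"]  # hypothetical dedup key from the event payload
    try:
        # Conditional write fails if another invocation already claimed this key.
        dynamodb.put_item(
            TableName="idempotency-keys",
            Item={"pk": {"S": key}},
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"status": "duplicate, skipped"}  # at-least-once retry; no-op
        raise
    process_order(event)  # side effects run only after the key is claimed
    return {"status": "processed"}
```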
Failure Scenarios
Scenario 1: Downstream Database Connection Exhaustion. A traffic spike triggers 2,000 concurrent Lambda invocations, each opening a database connection. The RDS instance (default max 1,000 connections on db.r5.xlarge) starts rejecting new connections. Functions fail, retries make it worse, and the DLQ fills up. Meanwhile, non-serverless services sharing the same database also lose connectivity. Everything goes sideways at once. Detection: DatabaseConnections CloudWatch metric hits max, Lambda Errors rate spikes, RDS CPUUtilization may actually be low (this is connection exhaustion, not CPU). Recovery: immediately reduce Lambda reserved concurrency to throttle invocations. Long-term, deploy RDS Proxy (pools ~1,000 Lambda connections into ~50 database connections) or move to DynamoDB for serverless-native workloads.
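The immediate throttle step can be a one-liner with the boto3 Lambda client; the function name and limit below are illustrative:

```python
import boto3

# Emergency brake: cap the function at 50 concurrent executions so the
# database can drain its connection backlog. Setting 0 stops all invocations.
boto3.client("lambda").put_function_concurrency(
    FunctionName="order-processor",    # hypothetical function name
    ReservedConcurrentExecutions=50,
)
```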
Scenario 2: Cold Start Cascade After Deployment. A new Lambda version gets deployed, which invalidates all warm execution environments at once. The next traffic wave hits 100% cold starts. For Java functions running Spring Boot (8-15 second cold starts), P99 latency spikes to 15s, API Gateway returns 504 timeouts, and upstream clients retry, doubling the load. Detection: Init Duration in Lambda logs exceeds baseline by 10x, API Gateway 5xx error rate spikes. Recovery: use provisioned concurrency matching the expected traffic floor. Deploy with Lambda aliases and weighted routing (shift 10% of traffic first, wait, then push to 100%). Prevention: Cloudflare Workers sidestep this entirely since V8 isolates cold-start in under 5ms.
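A sketch of the weighted-routing recovery step, assuming a live alias currently on version 41 and a newly published version 42 (the version numbers are illustrative):

```python
import boto3

# Keep 90% of traffic on the already-warm version and send 10% to the new
# one, so cold starts are spread out instead of hitting every request at once.
boto3.client("lambda").update_alias(
    FunctionName="checkout-api",  # hypothetical function name
    Name="live",
    FunctionVersion="41",         # current (warm) version keeps 90%
    RoutingConfig={"AdditionalVersionWeights": {"42": 0.10}},
)
```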
Scenario 3: Recursive Invocation Loop. A Lambda function triggered by an S3 event writes output back to the same bucket, re-triggering itself in an infinite loop. Invocations grow exponentially (1, 2, 4, 8...), hit the concurrency limit within minutes, and run up thousands of dollars in cost before anyone notices. Detection: ConcurrentExecutions climbs to account limit, Throttles metric rises, unexpected cost spike in Cost Explorer. Recovery: set reserved concurrency to 0 (this immediately stops all invocations), then fix the trigger filter. Prevention: always use distinct input and output buckets, set Lambda reserved concurrency limits, and deploy cost anomaly detection alerts. AWS documented this pattern after it burned multiple customers.
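A cheap in-handler guard for the recursive-trigger case, assuming output is written under a distinct processed/ prefix in the same bucket (distinct buckets remain the better fix):

```python
import boto3

OUTPUT_PREFIX = "processed/"  # hypothetical output prefix
s3 = boto3.client("s3")

def transform(data: bytes) -> bytes:
    # Placeholder for the real processing step.
    return data

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own output re-triggered us; break the loop here
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=bucket, Key=OUTPUT_PREFIX + key, Body=transform(body))
```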
Capacity Planning
| Dimension | Threshold / Guideline | Real-World Reference |
|---|---|---|
| Concurrent executions | 1,000 default; request increase to 10K+ for production | iRobot: 30K concurrent Lambda executions for IoT fleet |
| Memory allocation | 128MB-10GB; CPU scales proportionally (1 vCPU at 1,769MB) | Optimize by profiling: 256MB often 2x faster than 128MB for same cost |
| Execution duration | 15 min max (Lambda); target < 30s for API workloads | Fender Digital: 60ms median Lambda duration for API tier |
| Deployment package | 50MB zipped / 250MB unzipped (or 10GB container image) | Keep < 10MB for Node.js functions to minimize cold start |
| Cost crossover | ~$15K/month Lambda spend, start evaluating containers | Capital One: migrated steady-state workloads to ECS at $50K/mo Lambda spend |
| Burst concurrency | 3,000 instant then +500/min (Lambda) | Design for the 500/min ramp. Pre-warm if bursts exceed 3K. |
Key formula: Monthly cost = invocations × $0.20 per 1M requests + GB-seconds × $0.0000166667. For 10M invocations/month at 256MB and 200ms average: $2.00 + (10M × 0.256GB × 0.2s × $0.0000166667) = $2.00 + $8.53 = $10.53/month. Compare that to Fargate: a single 0.25 vCPU / 0.5GB task running 24/7 costs about $9.50/month. The crossover shifts with utilization. Serverless wins below roughly 30% utilization. Containers win above that.
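The arithmetic is easy to script; this sketch hard-codes the list prices used above (assumed US-East, on-demand):

```python
REQUEST_PRICE_PER_MILLION = 0.20   # $ per 1M invocations
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second

def lambda_monthly_cost(invocations: int, memory_gb: float, avg_duration_s: float) -> float:
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    compute_cost = invocations * memory_gb * avg_duration_s * GB_SECOND_PRICE
    return request_cost + compute_cost

# 10M invocations/month at 256MB and 200ms average, as in the example above:
print(round(lambda_monthly_cost(10_000_000, 0.256, 0.2), 2))  # -> 10.53
```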
Architecture Decision Record
ADR: Serverless vs Containers vs VMs
Context: Picking the compute model for a new workload. Revisit this decision once traffic patterns become clear.
| Criteria (Weight) | Serverless (Lambda) | Containers (ECS/K8s) | VMs (EC2/GCE) |
|---|---|---|---|
| Time to production (20%) | Hours (write function, deploy) | Days (Dockerize, configure orchestrator) | Days-weeks (AMIs, ASGs, config mgmt) |
| Cost at low traffic (20%) | Near-zero (pay per invocation) | $15-50/mo minimum (always-on task) | $30-100/mo minimum (smallest instance) |
| Cost at high traffic (15%) | Expensive ($5K+/mo at sustained 100 concurrency) | Efficient ($500/mo equivalent workload) | Most efficient at > 70% utilization |
| Latency control (15%) | Limited (cold starts, no connection pooling) | Full (persistent processes, warm connections) | Full |
| Operational overhead (15%) | None (no infra management) | Medium (cluster ops, image pipeline) | High (OS patching, capacity planning) |
| Vendor lock-in (10%) | High (event source bindings are provider-specific) | Low (OCI standard, portable across clouds) | Low (standard OS, portable) |
| Team size required (5%) | 1-2 developers | 3-5 with platform engineer | 5+ with SRE/ops |
Decision guidance: Start with serverless for greenfield projects where the traffic shape is unknown. It removes both cost risk and operational burden early on. Move to containers when monthly Lambda spend crosses $5K, when P99 latency requirements drop below 50ms, or when persistent connections are needed (WebSockets, gRPC streaming). Use VMs for workloads that need specific kernel modules, GPU access, or compliance rules that prohibit shared tenancy. In practice, most mature organizations run all three models at the same time: serverless for event processing, containers for APIs, VMs for legacy workloads. That is fine. Pick the right tool for each job and do not try to force everything into one model.
Key Points
- Run code without managing servers. The cloud provider handles provisioning, scaling, and patching.
- Pay-per-invocation pricing means zero cost at zero traffic, but it gets expensive fast at sustained high throughput.
- Cold start latency (100ms-10s) is the biggest trade-off. Provisioned concurrency helps, but costs money.
- Great for event-driven, bursty, short-lived workloads. Poor fit for long-running processes.
- Vendor lock-in is real. Lambda, Cloud Functions, and Azure Functions each have different APIs and limits.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| AWS Lambda | Managed | Broadest event source integration, mature ecosystem | Small-Enterprise |
| Cloudflare Workers | Managed | Edge execution, V8 isolates, sub-ms cold start | Small-Enterprise |
| Google Cloud Functions | Managed | GCP integration, Cloud Run for containers | Small-Enterprise |
| Knative | Open Source | Serverless on Kubernetes, no vendor lock-in | Medium-Enterprise |
Common Mistakes
- Using serverless for latency-sensitive synchronous APIs without provisioned concurrency
- Ignoring concurrent execution limits, then getting throttled when traffic spikes
- Building monolithic functions that do too much. Each function should do one thing.
- Forgetting cold start impact on P99 latency. That 1% of requests can be 10x slower than normal.
- Skipping timeout and memory limits. Runaway functions will burn through the budget.