Capacity Planning Metrics
Establishing Utilization Baselines
You cannot plan capacity without knowing your current consumption. Measure CPU, memory, disk I/O, and network utilization across all services at the 95th percentile (P95) over a rolling 30-day window. P95 captures your real peak usage without being distorted by one-off spikes.
Break baselines down by service, not just by cluster or account. Two services on the same cluster might have opposite utilization profiles. Service A runs at 70% CPU during business hours and 10% at night. Service B runs batch jobs at 90% CPU overnight and sits idle during the day. Cluster-level averages would show 40-50% and look fine, masking the fact that both services are under-provisioned for their peak.
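The per-service breakdown above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the service names and utilization readings are hypothetical, and the nearest-rank P95 stands in for whatever percentile your metrics system computes.

```python
"""Sketch: per-service P95 utilization baselines vs. a cluster-level average."""
import math

def p95(values):
    """95th percentile via the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed rank
    return ordered[rank - 1]

# Hypothetical hourly CPU readings for two services sharing one cluster:
# Service A is busy by day and idle at night; Service B runs batch overnight.
samples = {
    ("service-a", "cpu"): [70] * 12 + [10] * 12,
    ("service-b", "cpu"): [10] * 12 + [90] * 12,
}

for (service, metric), readings in samples.items():
    print(f"{service} {metric} P95: {p95(readings)}%")

# The cluster-level mean lands in the "looks fine" 40-50% band,
# hiding the fact that both services peak far higher.
all_readings = [r for v in samples.values() for r in v]
print(f"cluster mean: {sum(all_readings) / len(all_readings):.0f}%")
```

Running this shows per-service P95 values of 70% and 90% while the cluster mean sits at 45%, which is exactly the masking effect the text describes.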
Headroom and Buffer Calculations
The standard formula: provision for P95 peak utilization plus a 30% buffer. That buffer covers unexpected traffic spikes and degradation during incident response. For services with auto-scaling, it also covers scale-up lag (typically 2-5 minutes for cloud VMs, 30-60 seconds for containers).
If your P95 CPU utilization is 65%, provision for roughly 85% of total capacity. That gives you the 30% buffer relative to your peak. For memory, be more conservative. Memory pressure causes more catastrophic failures than CPU pressure because out-of-memory kills are sudden while CPU saturation degrades gradually.
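The headroom formula reduces to a one-liner. A minimal sketch, reproducing the 65% → ~85% example from the text; the larger memory buffer shown is an illustrative choice, not a number from the source.

```python
def required_capacity(p95_utilization, buffer=0.30):
    """Capacity target: P95 peak utilization plus a safety buffer (default 30%)."""
    return p95_utilization * (1 + buffer)

# From the text: 65% P95 CPU -> provision for roughly 85% of total capacity.
print(f"CPU target: {required_capacity(0.65):.3f}")     # 0.845, i.e. ~85%
# Memory warrants a larger buffer (value is illustrative) because
# OOM kills are sudden, while CPU saturation degrades gradually.
print(f"memory target: {required_capacity(0.65, buffer=0.50):.3f}")
```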
Capacity Cliff Detection
A capacity cliff is the point where adding more load causes disproportionate degradation. Databases hit cliffs when connection pools max out. Load balancers hit cliffs when they can't distribute fast enough. Network links hit cliffs when bandwidth saturates and packet loss starts.
Identify your binding constraint for each service. Run load tests regularly (monthly for critical services) and record the inflection point where latency starts climbing non-linearly. Your capacity plan should ensure you never operate above 70% of that inflection point in production.
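Finding the inflection point in load-test results can be automated with a crude slope check. A sketch under stated assumptions: `find_inflection`, the latency figures, and the 2x growth-factor threshold are all hypothetical; real load-test tooling will give you richer curves.

```python
def find_inflection(points, growth_factor=2.0):
    """Return the load level where latency growth turns non-linear.

    `points` is a list of (load, latency_ms) pairs from a load test,
    sorted by load. We flag the first step where the marginal latency
    increase per unit of load jumps by more than `growth_factor` over
    the previous step -- a crude but serviceable cliff detector.
    """
    prev_slope = None
    for (l0, t0), (l1, t1) in zip(points, points[1:]):
        slope = (t1 - t0) / (l1 - l0)
        if prev_slope and slope / prev_slope > growth_factor:
            return l0  # latency started climbing non-linearly here
        prev_slope = slope
    return None

# Hypothetical load test: latency is flat until ~8k RPS, then takes off.
results = [(2000, 40), (4000, 45), (6000, 50), (8000, 55), (10000, 120), (12000, 400)]
cliff = find_inflection(results)
print(f"inflection at {cliff} RPS; keep production under {int(cliff * 0.7)} RPS")
```

The final line applies the 70%-of-inflection rule from the text: an 8,000 RPS cliff means staying under 5,600 RPS in production.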
Forecasting Growth
Pull 6-12 months of historical traffic data and fit a trend. Linear growth is the simplest model and works for mature products; steady percentage growth compounds, so model it exponentially. If your traffic is growing 8% month-over-month, compound that rate forward and calculate when you'll hit your capacity cliff.
But watch for non-linear patterns. Seasonal products spike around holidays. B2B products spike around end-of-quarter. Marketing campaigns create step-function jumps. Your forecast model needs to account for these patterns, or you'll be scrambling to provision capacity at the worst possible time.
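The compounding-growth runway calculation is a closed-form expression. A minimal sketch assuming steady month-over-month growth (it deliberately ignores the seasonal and step-function patterns the text warns about); the load numbers are hypothetical.

```python
import math

def months_to_cliff(current_load, cliff_load, monthly_growth=0.08):
    """Months until compounding growth reaches the capacity cliff.

    Solves current_load * (1 + g)^n = cliff_load for n.
    """
    return math.log(cliff_load / current_load) / math.log(1 + monthly_growth)

# At 8% month-over-month, 5,000 RPS reaches an 8,000 RPS cliff in ~6 months.
print(f"{months_to_cliff(5000, 8000):.1f} months of runway")
```

Seasonal spikes or a marketing-driven step change can consume that runway much earlier, which is why the trend fit is a floor on urgency, not a guarantee.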
Cost Per Transaction
Raw infrastructure cost is meaningless without context. A cluster that costs $50,000/month serving 100 million requests has a cost per request of $0.0005. That same cost serving 1 million requests is $0.05 per request, 100x more expensive per unit of work.
Track cost per transaction (or cost per request, or cost per active user) over time. This metric normalizes spend against usage and reveals efficiency trends. If cost per transaction is rising while traffic grows, your infrastructure isn't scaling efficiently. If it's falling, your investments in optimization and rightsizing are paying off.
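The unit-economics check described above is simple enough to express directly. The monthly (cost, transaction) series below is hypothetical, chosen to show the "traffic grows but efficiency degrades" case.

```python
def cost_per_transaction(monthly_cost_usd, transactions):
    return monthly_cost_usd / transactions

# Hypothetical monthly series: traffic triples, but spend quadruples,
# so the per-unit cost ends higher than it started -- an efficiency red flag.
series = [(50_000, 100_000_000), (90_000, 200_000_000), (200_000, 300_000_000)]
unit_costs = [cost_per_transaction(cost, tx) for cost, tx in series]
print(unit_costs)
print("efficiency degrading" if unit_costs[-1] > unit_costs[0] else "scaling efficiently")
```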
Rightsizing as Continuous Practice
Cloud provider tools (AWS Compute Optimizer, GCP Recommender, Azure Advisor) analyze utilization data and suggest instance type changes. These recommendations typically save 20-40% on compute spend with no performance impact, because most instances are over-provisioned from initial sizing.
Make rightsizing a monthly review, not a one-time project. Usage patterns change as features evolve. An instance that was correctly sized six months ago might be running at 15% utilization now because the workload shifted to a different service. Automate the recommendation pipeline and put rightsizing actions into a regular sprint cadence.
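The monthly review can be backed by an automated flagging pass over utilization data. A sketch, not a real cloud-provider API: the instance IDs, utilization figures, and the 20%/85% thresholds are illustrative assumptions, not AWS/GCP/Azure defaults.

```python
def flag_for_rightsizing(instances, low=0.20, high=0.85):
    """Flag instances whose P95 utilization falls outside a healthy band.

    `instances` maps an instance ID to its P95 CPU utilization (0-1).
    Anything under `low` is a downsizing candidate; anything over `high`
    needs more headroom. Thresholds here are illustrative.
    """
    report = {}
    for instance_id, util in instances.items():
        if util < low:
            report[instance_id] = "downsize"
        elif util > high:
            report[instance_id] = "upsize"
    return report

# The 15%-utilization instance from the text would be caught here.
fleet = {"web-1": 0.15, "web-2": 0.62, "batch-1": 0.91}
print(flag_for_rightsizing(fleet))  # {'web-1': 'downsize', 'batch-1': 'upsize'}
```

Feeding a report like this into a regular sprint cadence turns rightsizing from a one-time project into the continuous practice the text recommends.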
Key Points
- Resource utilization baselines (CPU, memory, storage) measured at P95 over 30 days give you the true usage picture
- Headroom calculation: provision for P95 usage + 30% buffer to handle traffic spikes without degradation
- Cost per transaction normalizes infrastructure spend against actual business value delivered
- Capacity cliffs happen when a single resource hits its limit; identify the binding constraint before it breaks
- Rightsizing recommendations based on utilization data typically save 20-40% on compute spend
Common Mistakes
- Planning capacity based on average utilization instead of peak utilization, which causes outages during traffic spikes
- Forecasting growth linearly when your traffic pattern is actually seasonal or event-driven
- Over-provisioning everything by 3x "just in case" without tracking actual utilization to validate the buffer
- Treating capacity planning as a quarterly exercise instead of continuous monitoring with automated alerts