AI Cost & Unit Economics — Cost & FinOps
Difficulty: Advanced. Audience: VP of Engineering.
Key Points for AI Cost & Unit Economics
- AI unit economics differ fundamentally from traditional SaaS. Inference costs scale linearly with usage, with none of the economies of scale that flatten most infrastructure spend.
- Track cost per inference, cost per AI-enabled feature, and cost per user. These three metrics give you the full picture from infrastructure to business.
- Model selection is an economic decision as much as a technical one. A fine-tuned smaller model at 1/15th the cost often outperforms a frontier model for specific tasks.
- Token optimization is the AI equivalent of database query optimization. Reducing prompt length, caching common queries, and batching requests can cut costs 60-80%.
- Build dashboards that connect AI spend directly to business outcomes. 'We spent $45K on inference this month and it resolved 12,000 support tickets' is a defensible number.
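The three metrics above can be sketched as a back-of-the-envelope calculator. All prices and volumes below are illustrative assumptions, not real provider rates; the $45K / 12,000-ticket figures come from the example in the key points.

```python
# Illustrative AI unit-economics calculator. Token prices and volumes
# are made-up assumptions for the sketch, not real provider rates.

def cost_per_inference(prompt_tokens, completion_tokens,
                       price_in_per_1k, price_out_per_1k):
    """Cost of a single model call, given per-1K-token prices."""
    return ((prompt_tokens / 1000) * price_in_per_1k
            + (completion_tokens / 1000) * price_out_per_1k)

def feature_unit_economics(monthly_spend, outcomes, active_users):
    """Cost per business outcome and per user for one AI feature."""
    return {
        "cost_per_outcome": monthly_spend / outcomes,
        "cost_per_user": monthly_spend / active_users,
    }

if __name__ == "__main__":
    # e.g. one support-bot call: 1,200 prompt tokens, 300 completion tokens
    call = cost_per_inference(1200, 300,
                              price_in_per_1k=0.003, price_out_per_1k=0.006)
    print(f"cost per inference: ${call:.4f}")

    # The example from the key points: $45K resolving 12,000 tickets
    econ = feature_unit_economics(45_000, outcomes=12_000, active_users=30_000)
    print(f"cost per resolved ticket: ${econ['cost_per_outcome']:.2f}")
```

The point of the second function is the dashboard-ready sentence: spend divided by a business outcome, not spend alone.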
Common Mistakes with AI Cost & Unit Economics
- Not tracking AI costs at the feature level. A single monthly AWS bill tells you nothing about which AI features are worth keeping.
- Ignoring the variable cost structure when setting pricing. Traditional SaaS has near-zero marginal cost per user, but AI features have real per-request costs.
- Optimizing only for accuracy without considering cost. A 2% accuracy improvement that triples your inference bill is rarely worth it.
- Failing to forecast how AI costs will grow as your user base grows. Linear cost scaling can destroy margins at scale if you don't plan for it.
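The forecasting mistake can be made concrete with a margin projection. A minimal sketch, assuming an illustrative $20/user subscription, 500 AI requests per user per month, and $0.01 per request: unlike near-zero-marginal-cost SaaS, where gross margin approaches 100% as fixed costs amortize, the per-request cost caps margin well below that no matter how large the user base gets.

```python
# Sketch of gross margin under linear AI cost scaling. Price, request
# volume, and per-request cost are illustrative assumptions.

def gross_margin(users, price_per_user, requests_per_user,
                 cost_per_request, fixed_costs):
    revenue = users * price_per_user
    ai_cost = users * requests_per_user * cost_per_request  # scales linearly
    return (revenue - ai_cost - fixed_costs) / revenue

if __name__ == "__main__":
    for users in (1_000, 10_000, 100_000):
        m = gross_margin(users, price_per_user=20, requests_per_user=500,
                         cost_per_request=0.01, fixed_costs=10_000)
        # Margin asymptotes to 1 - (500 * 0.01) / 20 = 75%, never higher
        print(f"{users:>7} users -> gross margin {m:.0%}")
```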
Related to AI Cost & Unit Economics
Cloud Cost Optimization, FinOps Practices
AI System Quality & Reliability Metrics — Reliability Metrics
Difficulty: Advanced. Audience: Engineering Manager.
Key Points for AI System Quality & Reliability Metrics
- Traditional reliability metrics like uptime, latency, and error rate are necessary but insufficient for AI systems. Your service can be 100% available while producing wrong answers.
- Define SLOs for AI quality: accuracy thresholds, hallucination rates, and confidence calibration. These deserve the same rigor as your infrastructure SLOs.
- Data drift monitoring is the leading indicator of quality degradation. By the time accuracy drops, the underlying data distribution has already shifted.
- Human evaluation sampling is essential and should happen weekly. Automated metrics catch known failure modes, but humans catch the ones you haven't thought of yet.
- AI system reliability is the product of three factors: infrastructure reliability, data quality, and model quality. A weakness in any one of them brings down the whole system.
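Data drift monitoring, flagged above as the leading indicator, is often implemented with the Population Stability Index. A minimal sketch; the four equal bins are an assumption, and the conventional rule of thumb reads under 0.1 as stable, 0.1-0.25 as moderate drift, and above 0.25 as significant.

```python
import math

# Population Stability Index (PSI), a common data-drift statistic:
# compares the binned production distribution against a baseline.

def psi(expected_pct, actual_pct, eps=1e-6):
    """expected_pct / actual_pct: lists of bin fractions that sum to ~1."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)  # guard against log(0)
        total += (a - e) * math.log(a / e)
    return total

if __name__ == "__main__":
    baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training time
    today    = [0.40, 0.30, 0.20, 0.10]   # distribution in production
    print(f"PSI = {psi(baseline, today):.3f}")  # lands in the moderate-drift band
```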
Common Mistakes with AI System Quality & Reliability Metrics
- Only monitoring infrastructure metrics while the AI layer quietly produces wrong answers. A 200 OK response that contains a hallucinated answer is worse than a 500 error.
- Using only offline evaluation metrics without monitoring production performance. A model that scores 95% on your test set can score 80% on real traffic.
- Not establishing quality baselines before deploying a new model. Without a baseline, you cannot tell whether a new version is better or worse.
- Setting accuracy targets without understanding the business impact of different error types. A false positive in fraud detection has a very different cost than a false negative.
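The last mistake above — ignoring that error types cost different amounts — can be checked with simple expected-cost arithmetic. The dollar figures below are illustrative assumptions for a fraud-detection scenario, not real data.

```python
# Expected business cost of a classifier's errors, weighting false
# positives and false negatives differently. Dollar costs are illustrative.

def expected_error_cost(fp, fn, cost_fp, cost_fn):
    """Total cost given counts of false positives and false negatives."""
    return fp * cost_fp + fn * cost_fn

if __name__ == "__main__":
    # Assumption: a blocked legitimate payment (FP) costs ~$15 in support
    # and churn risk; a missed fraud (FN) costs a ~$400 chargeback.
    model_a = expected_error_cost(fp=500, fn=20, cost_fp=15, cost_fn=400)
    model_b = expected_error_cost(fp=200, fn=60, cost_fp=15, cost_fn=400)
    # Model B makes half as many total errors yet costs the business more.
    print(f"model A: ${model_a}, model B: ${model_b}")
```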
Related to AI System Quality & Reliability Metrics
SLO, SLA & SLI Budgeting, DORA Metrics Deep Dive
Cloud Cost Optimization — Cost & FinOps
Difficulty: Intermediate. Audience: VP of Engineering.
Key Points for Cloud Cost Optimization
- Compute typically accounts for 60-70% of cloud spend — right-sizing instances is the highest-leverage optimization
- Reserved Instances and Savings Plans can reduce compute costs by 30-60% with 1-3 year commitments
- Spot/preemptible instances offer 60-90% discounts for fault-tolerant workloads like batch processing and CI/CD
- Storage lifecycle policies automatically move infrequently accessed data to cheaper tiers, saving 40-80%
- Cost allocation tags are foundational — you cannot optimize what you cannot attribute to a team or service
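The Reserved Instance trade-off in the key points reduces to a break-even utilization check. A minimal sketch with illustrative hourly rates (roughly a 40% discount, in the range quoted above), not real cloud prices.

```python
# Break-even check: 1-year commitment vs on-demand. Hourly rates are
# illustrative assumptions, not real cloud pricing.

HOURS_PER_YEAR = 8_760

def annual_cost_on_demand(rate_per_hour, utilization):
    """On-demand: you pay only for the hours you actually run."""
    return rate_per_hour * HOURS_PER_YEAR * utilization

def annual_cost_reserved(rate_per_hour):
    """Reserved/committed: you pay for every hour, used or not."""
    return rate_per_hour * HOURS_PER_YEAR

def breakeven_utilization(od_rate, reserved_rate):
    """Utilization above which the commitment is cheaper than on-demand."""
    return reserved_rate / od_rate

if __name__ == "__main__":
    od, ri = 0.10, 0.06  # assumed rates: RI at a 40% discount
    print(f"commitment pays off above "
          f"{breakeven_utilization(od, ri):.0%} utilization")
```

This is also why right-sizing must come before buying reservations: committing to an oversized instance locks in the waste for the full term.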
Common Mistakes with Cloud Cost Optimization
- Over-provisioning resources out of fear — most production instances run at 10-20% CPU utilization
- Buying Reserved Instances before right-sizing, locking in waste for 1-3 years
- Ignoring data transfer costs, which can silently become 10-15% of your total bill
- Treating cost optimization as a one-time project instead of an ongoing practice
Related to Cloud Cost Optimization
FinOps Practices, SLO, SLA & SLI Budgeting
DORA Metrics Deep Dive — Delivery Metrics
Difficulty: Intermediate. Audience: Engineering Manager.
Key Points for DORA Metrics Deep Dive
- Four key metrics: deployment frequency, lead time for changes, change failure rate, time to restore
- Elite performers deploy on demand, with lead times under a day and change failure rates around 5% (exact thresholds vary by DORA report year)
- DORA metrics measure team capability, not individual performance
- Improving deployment frequency usually improves all four metrics simultaneously
- Measure trends over time, not absolute values — context matters more than benchmarks
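All four metrics can be computed from deployment records your pipeline already has. A minimal sketch; the record shape (commit_time, deploy_time, failed, restore_minutes) is an assumption about what your CI/CD system can export, not a standard schema.

```python
from datetime import datetime, timedelta
from statistics import median

# Computing the four DORA metrics from a list of deployment records.
# The record fields are an illustrative assumption about pipeline exports.

def dora_metrics(deploys, period_days):
    lead_times_h = [(d["deploy_time"] - d["commit_time"]).total_seconds() / 3600
                    for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_h": median(lead_times_h),
        "change_failure_rate": len(failures) / len(deploys),
        "median_restore_min": (median(d["restore_minutes"] for d in failures)
                               if failures else 0.0),
    }

if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    sample = [
        {"commit_time": t0, "deploy_time": t0 + timedelta(hours=2),
         "failed": False, "restore_minutes": 0},
        {"commit_time": t0, "deploy_time": t0 + timedelta(hours=4),
         "failed": True, "restore_minutes": 30},
    ]
    print(dora_metrics(sample, period_days=7))
```

Medians are used deliberately: a single slow hotfix should not swing the trend line you are tracking week over week.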
Common Mistakes with DORA Metrics Deep Dive
- Using DORA metrics to compare unrelated teams with different contexts and codebases
- Optimizing for deployment frequency without investing in automated testing
- Measuring at the organization level instead of the team level where it's actionable
- Treating DORA as a goal rather than a diagnostic tool
Related to DORA Metrics Deep Dive
SPACE Framework, Engineering Productivity Measurement
Engineering Productivity Measurement — Productivity Measurement
Difficulty: Expert. Audience: CTO.
Key Points for Engineering Productivity Measurement
- Developer productivity is multidimensional — no single metric captures it, and attempting to creates perverse incentives
- Combine system metrics (CI/CD data, code review stats) with developer surveys (satisfaction, friction points) for a complete picture
- Proxy measures like PR cycle time and build reliability correlate with productivity but do not define it
- McKinsey's 2023 developer productivity framework was widely criticized for over-indexing on activity metrics
- The best productivity investment is usually removing friction (faster builds, fewer meetings, better tooling) rather than measuring output
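Pairing system metrics with survey data, as recommended above, can be as simple as reporting them side by side per team. A minimal sketch; the particular signals (PR cycle time, a 1-5 satisfaction score) are illustrative choices, not a prescribed set.

```python
from statistics import median

# One team's snapshot: a system metric (PR cycle time) next to a survey
# signal. Which signals to use is an assumption for this sketch.

def team_snapshot(pr_cycle_hours, survey_scores):
    """pr_cycle_hours: cycle times in hours; survey_scores: 1-5 satisfaction."""
    ranked = sorted(pr_cycle_hours)
    return {
        "median_pr_cycle_h": median(ranked),
        "p90_pr_cycle_h": ranked[int(len(ranked) * 0.9)],  # rough nearest-rank p90
        "satisfaction": sum(survey_scores) / len(survey_scores),
    }

if __name__ == "__main__":
    print(team_snapshot([1, 2, 3, 4, 5, 6, 7, 8, 9, 48], [4, 4, 5]))
```

Note what the snapshot deliberately omits: no per-developer breakdown, and no single composite "productivity score" to game.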
Common Mistakes with Engineering Productivity Measurement
- Measuring lines of code, commit counts, or story points as productivity indicators — all are trivially gameable
- Building elaborate dashboards before understanding what questions you are trying to answer
- Comparing productivity across teams without accounting for codebase age, technical debt, and domain complexity
- Treating developer experience improvements as overhead rather than productivity multipliers
Related to Engineering Productivity Measurement
DORA Metrics Deep Dive, SPACE Framework
FinOps Practices — Cost & FinOps
Difficulty: Advanced. Audience: VP of Engineering.
Key Points for FinOps Practices
- FinOps is a cultural practice, not a tool — it makes cost a first-class engineering concern alongside performance and reliability
- Chargeback/showback models attribute cloud spend to the teams consuming it, creating accountability
- Unit economics (cost per transaction, cost per user) are more actionable than raw spend numbers
- FinOps maturity progresses through Inform, Optimize, and Operate phases — crawl before you run
- Cross-functional FinOps teams include engineering, finance, and product to balance cost against business value
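The showback model above starts as a tag-keyed aggregation over billing line items. A minimal sketch; the tag key "team" and the line-item shape are assumptions, and untagged spend is surfaced rather than hidden because it is itself a finding.

```python
from collections import defaultdict

# Minimal showback: attribute billing line items to teams via cost
# allocation tags. Tag key and item shape are illustrative assumptions.

def showback(line_items, tag_key="team", default="untagged"):
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, default)
        totals[owner] += item["cost"]
    return dict(totals)

if __name__ == "__main__":
    bill = [
        {"cost": 1200.0, "tags": {"team": "payments"}},
        {"cost": 300.0,  "tags": {"team": "search"}},
        {"cost": 450.0,  "tags": {}},  # untagged spend: a gap to close
    ]
    print(showback(bill))
```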
Common Mistakes with FinOps Practices
- Making FinOps purely a finance initiative — without engineering ownership, cost optimization recommendations get ignored
- Implementing chargeback without giving teams the tooling or autonomy to actually reduce their costs
- Focusing only on rate optimization (reservations, discounts) while ignoring usage optimization (right-sizing, waste elimination)
- Setting cost reduction targets without connecting them to business metrics — saving money by degrading user experience is not optimization
Related to FinOps Practices
Cloud Cost Optimization, Engineering Productivity Measurement
SLO, SLA & SLI Budgeting — Reliability Metrics
Difficulty: Advanced. Audience: Platform Team.
Key Points for SLO, SLA & SLI Budgeting
- SLIs are the measurements, SLOs are the targets, SLAs are the contracts — do not confuse them
- Error budgets quantify how much unreliability you can tolerate before pausing feature work
- A 99.9% SLO allows 43.2 minutes of downtime per 30-day month — know your budget in real time
- Burn rate alerts detect when you are consuming error budget faster than expected
- SLOs should be set based on user expectations, not on what your system currently achieves
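The budget and burn-rate arithmetic above is short enough to keep in a helper. A minimal sketch assuming a 30-day rolling window; the interpretation of burn rate (1.0 = consuming budget exactly on pace) follows common SRE guidance.

```python
# Error-budget and burn-rate arithmetic for a 30-day window.

def error_budget_minutes(slo, window_days=30):
    """Allowed downtime for the window, in minutes."""
    return (1 - slo) * window_days * 24 * 60

def burn_rate(bad_fraction, slo):
    """Budget consumption speed: 1.0 means exactly on budget for the window."""
    return bad_fraction / (1 - slo)

if __name__ == "__main__":
    print(f"{error_budget_minutes(0.999):.1f} min/month at 99.9%")
    # 0.5% of requests failing against a 99.9% SLO burns budget 5x too fast:
    # at that rate the whole month's budget is gone in roughly six days.
    print(f"burn rate: {burn_rate(0.005, 0.999):.1f}x")
```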
Common Mistakes with SLO, SLA & SLI Budgeting
- Setting SLOs at 99.99% when your users would be perfectly happy with 99.9% — wasting engineering effort
- Defining SLIs that do not reflect actual user experience (measuring server uptime instead of request success rate)
- Having SLOs without error budget policies — the budget is meaningless if nobody acts when it is exhausted
- Treating SLAs and SLOs as the same thing — SLAs have financial penalties, SLOs are internal targets
Related to SLO, SLA & SLI Budgeting
DORA Metrics Deep Dive, Cloud Cost Optimization
SPACE Framework — Productivity Measurement
Difficulty: Advanced. Audience: VP of Engineering.
Key Points for SPACE Framework
- Five dimensions: Satisfaction & well-being, Performance, Activity, Communication & collaboration, Efficiency & flow
- No single metric captures developer productivity — SPACE requires measuring across multiple dimensions
- Satisfaction surveys are a leading indicator; declining satisfaction predicts future attrition and velocity drops
- Activity metrics (PRs, commits) are only valid when combined with outcome metrics to avoid Goodhart's Law
- Developed by Nicole Forsgren, Margaret-Anne Storey, and others at Microsoft Research and GitHub
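A team-level SPACE report can simply lay the dimensions side by side rather than collapsing them into one score — collapsing is exactly what the multidimensional framing warns against. A minimal sketch; the example signals per dimension are illustrative assumptions, not part of the framework.

```python
# Per-team SPACE scorecard sketch: one signal per dimension, reported
# side by side. The chosen signals are illustrative assumptions.

SPACE_DIMENSIONS = ["satisfaction", "performance", "activity",
                    "communication", "efficiency"]

def space_scorecard(signals):
    """signals: dict of dimension -> measured value; flags unmeasured ones."""
    return {dim: signals.get(dim, "NOT MEASURED") for dim in SPACE_DIMENSIONS}

if __name__ == "__main__":
    team = space_scorecard({
        "satisfaction": 4.1,   # quarterly survey average, 1-5
        "performance": 0.97,   # change success rate
        "activity": 38,        # PRs merged this month
        "efficiency": 5.5,     # median PR cycle time, hours
    })
    print(team)  # "communication" reads NOT MEASURED — a gap made visible
```

Surfacing the unmeasured dimension is the guard against the cherry-picking mistake listed below.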
Common Mistakes with SPACE Framework
- Cherry-picking only the Activity dimension because it is easiest to measure automatically
- Running satisfaction surveys but never acting on the results, creating survey fatigue
- Measuring individual developers instead of teams — SPACE explicitly warns against this
- Treating SPACE as a replacement for DORA instead of a complementary framework
Related to SPACE Framework
DORA Metrics Deep Dive, Engineering Productivity Measurement