Code Review Metrics
Why Review Metrics Matter
Code review is where most teams leak lead time. You can have a blazing CI/CD pipeline, but if PRs sit in a queue for two days waiting on a reviewer, none of that matters. Measuring the review process gives you visibility into a bottleneck that most teams feel but never quantify.
The core metrics to track: time-to-first-review (how long before someone looks at the PR), review cycle time (open to merge), number of review rounds, and reviewer load distribution. These four together paint a picture of your review process health.
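If your Git host exports PR data, all four are straightforward to compute. Below is a minimal sketch in Python, assuming you have already fetched PR records into a list; the `PullRequest` fields are illustrative stand-ins for whatever your API actually returns, not any specific endpoint's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import median

@dataclass
class PullRequest:
    # Illustrative record; adapt field names to your Git host's API.
    author: str
    opened_at: datetime
    first_review_at: datetime | None
    merged_at: datetime | None
    review_rounds: int
    reviewers: list[str] = field(default_factory=list)

def review_metrics(prs: list[PullRequest]) -> dict:
    reviewed = [p for p in prs if p.first_review_at]
    merged = [p for p in prs if p.merged_at]
    loads: dict[str, int] = {}
    for p in prs:
        for r in p.reviewers:
            loads[r] = loads.get(r, 0) + 1
    return {
        # Hours until someone first looked at the PR.
        "median_time_to_first_review_h": median(
            (p.first_review_at - p.opened_at).total_seconds() / 3600
            for p in reviewed
        ),
        # Open-to-merge cycle time in hours.
        "median_cycle_time_h": median(
            (p.merged_at - p.opened_at).total_seconds() / 3600
            for p in merged
        ),
        "median_review_rounds": median(p.review_rounds for p in merged),
        # Reviews handled per reviewer, for load-distribution checks.
        "reviews_per_reviewer": loads,
    }
```

Medians are deliberate here: a handful of stale PRs will drag an average far above what the team actually experiences.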
What Good Looks Like
Google's engineering practices research found that PRs under 400 lines of changed code get reviewed 40% faster and produce fewer post-merge defects. That number keeps showing up in other studies too. When a PR crosses 400 lines, reviewers start skimming. They miss things. Defect escape rate goes up.
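One cheap way to act on this is a CI guard that warns (or fails) when a branch exceeds the threshold. A sketch, assuming the comparison base is `origin/main` and using the 400-line figure from above:

```python
import subprocess
import sys

# Threshold from the research cited above; tune it for your team.
MAX_CHANGED_LINES = 400

def changed_lines(base: str = "origin/main") -> int:
    """Sum insertions and deletions on this branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        # Binary files show "-" for counts; skip rather than guess.
        if added != "-":
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > MAX_CHANGED_LINES:
        print(f"PR changes {n} lines (limit {MAX_CHANGED_LINES}); consider splitting it.")
        sys.exit(1)
```

Starting with a warning rather than a hard failure avoids punishing legitimately large changes like generated code or bulk renames.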
Time-to-first-review should be under 4 hours for most teams. Under 2 hours is excellent. If you're consistently above 8 hours, you have a structural problem, usually too few reviewers or PRs that are too large for anyone to pick up quickly.
Review cycle time (from opening the PR to merging) should track closely with your DORA lead time metric. For teams targeting elite performance, aim for same-day merge on most PRs. Two rounds of review is typical. If you're averaging three or more rounds, your team may need clearer coding standards or better upfront design discussions.
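Review rounds rarely arrive as a ready-made field, but you can approximate them from review events. The sketch below assumes review objects shaped like GitHub's `pulls/{number}/reviews` responses (each carries a `state` field), counting one initial round plus one per "changes requested" verdict:

```python
from collections import Counter

def review_rounds(review_events: list[dict]) -> int:
    """Approximate rounds for one PR: the initial round, plus one
    round per CHANGES_REQUESTED verdict."""
    states = Counter(event["state"] for event in review_events)
    return 1 + states["CHANGES_REQUESTED"]

rounds = review_rounds([{"state": "CHANGES_REQUESTED"}, {"state": "APPROVED"}])
print(rounds)  # -> 2
```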
Reviewer Load Balancing
Pull the data on who reviews what. In most teams, 20% of engineers handle 80% of reviews. That concentration creates bottlenecks and burns people out. Use round-robin assignment tools (GitHub's CODEOWNERS with team rotation, or tools like PullApprove) to spread the load.
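The assignment tools implement the rotation for you, but the logic is simple enough to show in miniature. A hypothetical round-robin picker that skips the PR author, roughly what GitHub's team auto-assignment does:

```python
from itertools import cycle

def make_round_robin(team: list[str]):
    """Rotating reviewer assignment that never assigns the author."""
    rotation = cycle(team)
    def assign(author: str) -> str:
        for _ in range(len(team)):
            reviewer = next(rotation)
            if reviewer != author:
                return reviewer
        raise ValueError("no eligible reviewer besides the author")
    return assign

assign = make_round_robin(["ana", "ben", "chen"])
print(assign("ben"))  # -> "ana"
print(assign("ana"))  # -> "ben"
```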
Track reviews-per-engineer-per-week. If someone is doing more than 10 substantial reviews a week on top of their own work, they're spending 30-40% of their time reviewing. That might be fine for a tech lead, but it's not sustainable for an IC who also has feature work.
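A sketch of that tracking, assuming you can export `(reviewer, submitted_at)` pairs from your Git host. Both the over-10-per-week flag and the top-20% concentration check are a few lines each:

```python
from collections import Counter
from datetime import datetime

def weekly_load(reviews: list[tuple[str, datetime]]) -> None:
    """Print reviews per engineer per ISO week, flagging anyone over
    the ~10/week line discussed above. Input shape is illustrative."""
    per_week: Counter = Counter()
    for reviewer, ts in reviews:
        cal = ts.isocalendar()
        per_week[(f"{cal.year}-W{cal.week:02d}", reviewer)] += 1
    for (week, reviewer), n in sorted(per_week.items()):
        flag = "  <- over the sustainability line" if n > 10 else ""
        print(f"{week}  {reviewer:<12}{n:>3}{flag}")

def top_reviewer_share(reviews: list[tuple[str, datetime]]) -> float:
    """Share of all reviews handled by the busiest 20% of reviewers;
    a value near 0.8 confirms the concentration problem."""
    counts = sorted(Counter(r for r, _ in reviews).values(), reverse=True)
    if not counts:
        return 0.0
    top = max(1, round(len(counts) * 0.2))
    return sum(counts[:top]) / sum(counts)
```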
Automated vs Human Review
Linters, formatters, type checkers, and security scanners should catch everything mechanical. When 30% of review comments are about formatting or import ordering, that's a sign your CI pipeline has gaps. Every nitpick comment that could have been automated is wasted reviewer attention.
Measure the ratio of automated findings to human findings over time. As you improve your tooling, human reviewers should be spending more of their time on architecture decisions, edge case handling, and knowledge transfer. That's the kind of review that actually prevents production incidents.
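Human findings are harder to classify than automated ones, but even a crude keyword heuristic over review comments gives you a trend line. The patterns below are illustrative; tune them to your stack:

```python
import re

# Rough markers of "this should have been caught by tooling".
MECHANICAL = re.compile(
    r"\b(format|formatting|lint|whitespace|import order|trailing|"
    r"indent|typo|naming convention)\b",
    re.IGNORECASE,
)

def mechanical_share(comments: list[str]) -> float:
    """Fraction of human review comments a linter or formatter could
    have made instead. This number should trend down over time."""
    if not comments:
        return 0.0
    hits = sum(1 for c in comments if MECHANICAL.search(c))
    return hits / len(comments)

share = mechanical_share([
    "Please fix the import order here.",
    "This misses the retry edge case when the queue is empty.",
])
print(f"{share:.0%} of comments were mechanical")  # -> 50%
```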
Building Review Culture
Dashboards help, but don't put individual review speed on a leaderboard. That creates pressure to approve without reading. Instead, share team-level trends in retros. Celebrate improvements in cycle time. When someone writes a particularly thorough review that catches a real issue, call that out. The goal is a culture where review is valued work, not an interruption.
Key Points
- Time-to-first-review is the single highest-leverage metric for unblocking developer flow
- Google's research shows PRs under 400 lines get reviewed faster and have fewer defects
- Reviewer load balancing prevents bottlenecks and reduces burnout on senior engineers
- Review cycle time (open to merge) is a strong predictor of overall lead time for changes
- Automated checks should handle style and formatting so human reviewers focus on logic and design
Common Mistakes
- Mandating review speed targets without addressing root causes like PR size or reviewer capacity
- Counting approvals without measuring review depth, which rewards rubber-stamping
- Ignoring reviewer load distribution, letting the same two people review everything
- Treating review time as idle time rather than recognizing it as skilled engineering work