The impact of code review coverage and code review participation on software quality (2014)

Turrets being accepted or rejected on an assembly line

Code reviews are often used as a way to make sure that bad code doesn’t make it into public releases. Not all code reviews are equally effective for that purpose, however, as this study by McIntosh, Kamei, Adams, and Hassan shows.

The following summary is loosely based on Code reviewing reviewed: recommendations for improving the efficiency and effectiveness of modern code reviews, an essay that I originally wrote for the Open University of the Netherlands.

Why it matters

It is generally believed that conducting code reviews can help catch bugs before they make it to users. Plenty of evidence exists that formal code inspections (an older code reviewing method that involved in-person meetings between the author and the reviewer(s), often with checklists to guarantee a base level of review quality) can be very effective. Less is known about the effectiveness of modern code reviews however, where reviewers can – but do not necessarily have to – comment on the author’s code. It’s likely though that bugs are overlooked if not all code is properly reviewed or discussed.

How the study was conducted

There are a few important concepts in this study:

  1. code review coverage: the proportion of a release’s changes that were reviewed
  2. code review participation: the degree to which reviewers actively discussed the changes
  3. post-release defects: bugs that surface after a release

In order to determine the effect of code review coverage and code review participation on the number of post-release defects, the authors combine data from the version control system and code review tools from three open source projects: Qt, VTK, and ITK.

The version control system shows which commits were included in each release, while the code review tool stores links between commits and reviews. This makes it possible to determine how well the code for a release was reviewed.
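The coverage computation can be sketched roughly as follows. This is an illustration, not the authors’ actual tooling: the function name and the input shapes (a list of release commits, a set of commit IDs with a linked review) are assumptions.

```python
# Hypothetical sketch of computing code review coverage for a release,
# assuming we already know which commits belong to the release (from the
# version control system) and which commits are linked to a review
# (from the code review tool).

def review_coverage(release_commits, reviewed_commits):
    """Fraction of a release's commits that went through review."""
    if not release_commits:
        return 0.0
    reviewed = sum(1 for c in release_commits if c in reviewed_commits)
    return reviewed / len(release_commits)

# Example: 3 of the 4 commits in the release were linked to a review.
coverage = review_coverage(
    ["a1", "b2", "c3", "d4"],   # commits included in the release
    {"a1", "b2", "c3", "e5"},   # commits that have a linked review
)
print(coverage)  # 0.75
```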

The number of post-release defects can be deduced from the number of bug fix commits (commits with a message containing words like “bug” or “fix”) that are made after the release. (This approach possibly results in some false positives, e.g. fixes for bugs that were introduced in earlier versions, but it’s probably still a decent approximation.)

Reviews aren’t the only factor known to influence the number of bugs. That’s why a number of commonly used software quality metrics (e.g. component size in LOC and cyclomatic complexity) are included as control metrics.

The review, bug, and control metrics are used to construct a model that predicts the number of expected post-release defects based on how thoroughly the code was reviewed.

What discoveries were made

The results are pretty much exactly what you’d expect: having high degrees of code review coverage and participation helps lower the number of post-release defects.

For code review coverage, the authors found that with a coverage below 29% at least one bug is to be expected, although for one of the projects even a coverage below 60% already results in at least one bug. Of course, full coverage does not guarantee defect-free software: there are other factors (not within the scope of the study) that also affect the number of bugs after a new release.

The findings were much more conclusive for code review participation: components that have a high level of participation tend to have a lower number of bugs after a release. The opposite is also true: components with a low level of participation tend to have a higher number of bugs.

The important bits

To minimise the number of post-release defects, all code should be

  1. carefully reviewed (quality over quantity!)
  2. discussed by and with reviewers, and
  3. approved by a reviewer (i.e. someone who is not the author)

before it is included in a new release.