Code reviews are often used as a way to make sure that bad code doesn’t make it into public releases. Not all code reviews are equally effective for that purpose, however, as this study by McIntosh, Kamei, Adams, and Hassan shows.
The following summary is loosely based on the second half of Code reviewing reviewed: recommendations for improving the efficiency and effectiveness of modern code reviews, an essay that I originally wrote for the Open University of the Netherlands.
Why it matters
It is generally believed that conducting code reviews can help catch bugs before they make it to users. Plenty of evidence exists that they can be very effective. Less is known about the effectiveness of modern code reviews however, where reviewers can – but do not necessarily have to – comment on the author’s code. It’s likely though that bugs are overlooked if not all code is properly reviewed or discussed.
How the study was conducted
There are a few important concepts in this study:
Code: here, this specifically refers to new code that is introduced in a newly released version;
Code review coverage: the proportion of code that is associated with a code review;
Code review participation: it’s possible that code for which a review has been requested was approved only by the author, hastily reviewed, or approved by reviewers without any discussion whatsoever. Participation is the proportion of code for which the reviews did not have these characteristics.
Post-release defects: the number of bugs that pop up after a newly released version.
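To make the two review metrics concrete, here is a small sketch of how they could be computed for a set of commits. The data model and field names (review_id, approver, discussion_comments, and so on) are illustrative assumptions, not the study’s actual schema:

```python
# Illustrative sketch: computing code review coverage and participation.
# The commit records and their field names are hypothetical.

def review_coverage(commits):
    """Proportion of commits that are associated with a code review."""
    reviewed = [c for c in commits if c.get("review_id") is not None]
    return len(reviewed) / len(commits)

def review_participation(commits):
    """Proportion of reviewed commits whose review was not self-approved
    and saw at least some discussion."""
    reviewed = [c for c in commits if c.get("review_id") is not None]
    engaged = [
        c for c in reviewed
        if c["approver"] != c["author"]     # not approved by the author
        and c["discussion_comments"] > 0    # not approved without discussion
    ]
    return len(engaged) / len(reviewed)

commits = [
    {"review_id": 1, "author": "a", "approver": "b", "discussion_comments": 4},
    {"review_id": 2, "author": "a", "approver": "a", "discussion_comments": 2},
    {"review_id": None},  # code that was never put up for review
]
print(review_coverage(commits))       # 2 of 3 commits were reviewed
print(review_participation(commits))  # 1 of 2 reviews had real participation
```

A real implementation would also need a notion of “hastily reviewed” (e.g. a minimum review duration), which is omitted here for brevity.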
In order to determine the effect of code review coverage and code review participation on the number of post-release defects, the authors combine data from the version control systems and code review tools of three open source projects: Qt, VTK, and ITK.
The version control system shows which commits were included in each release, while the code review tool stores links between commits and reviews. This makes it possible to determine how well the code for a release was reviewed.
The number of post-release defects can be deduced from the number of bug-fixing commits that follow a release.
Reviews aren’t the only factor that is known to influence the number of bugs. That’s why a number of commonly used software quality metrics (e.g. component size in lines of code (LOC) and cyclomatic complexity) are included as control metrics.
The review, bug, and control metrics are used to construct a model that predicts the number of expected post-release defects based on how thoroughly the code was reviewed.
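As a rough illustration of such a model – not the authors’ actual statistical setup, which is multivariate and includes the control metrics – one could fit a simple least-squares line relating review coverage to post-release defect counts:

```python
# Toy sketch: fitting defects ≈ intercept + slope * coverage by ordinary
# least squares. The data points below are made up for illustration; the
# study fits a richer model that also controls for size, complexity, etc.

def fit_line(xs, ys):
    """Closed-form OLS for a single predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope

coverage = [0.2, 0.4, 0.6, 0.8, 1.0]  # hypothetical per-component coverage
defects = [9, 7, 5, 3, 1]             # hypothetical post-release defect counts

intercept, slope = fit_line(coverage, defects)
print(intercept, slope)  # a negative slope: more coverage, fewer defects
```

Once fitted, such a model predicts the expected number of post-release defects for a component given how thoroughly its code was reviewed.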
What discoveries were made
The results are pretty much exactly what you’d expect: having high degrees of code review coverage and participation helps lower the number of post-release defects.
For code review coverage, the authors found that with a coverage below 29% at least one bug is to be expected, although for one of the projects even a coverage below 60% already results in at least one bug. Of course, full coverage does not guarantee defect-free software: there are other factors (not within the scope of the study) that also affect the number of bugs after a new release.
The findings were much more conclusive for code review participation: components that have a high level of participation tend to have a lower number of bugs after a release. The opposite is also true: components with a low level of participation tend to have a higher number of bugs.
To minimise the number of post-release defects, all code should be
carefully reviewed (quality over quantity!),
discussed by and with reviewers, and
approved by a reviewer (i.e. someone who is not the author)
before it is included in a new release.