When testing meets code review: Why and how developers review tests (2018)

Construction worker pretends to check the work of his colleague
Do developers detest reviewing the work of their peers?

Most developers know understand the value of tests. Most developers I know also understand the value of code reviews. Combining those two is a lot harder though, as evidenced by all our JIRA tickets that seem to linger in the “Review” column every sprint. What makes it so hard to review test code?

Why it matters

It’s quite hard to write code that does exactly what it’s supposed to do. Yes, it gets slightly easier after years of practice, but you’ll still make mistakes. Tons of them.

This is why we write tests for our code: tests help us catch mistakes during development, help your fellow developersWhich includes future you! understand how your code works and ensure that future changes don’t break existing code.

Tests are not a panacea however. Test code is code too, so it’s just as likely to contain bugs. A 2015 study on bugs in test code suggested that up to half of all projects may contain buggy test code!

Code reviews are a simple and effective way to catch a lot of these bugs before they make it into production – or at least, that’s what we assume. Is test code reviewed as rigorously as production code?

How the study was conducted

The study consists of two phases.

In the first phase the authors analyse more than 300,000 reviews and commits from three different projects (Eclipse, OpenStack, and Qt) to gain insight into what and how developers review code with tests. All three projects make use of automated testing and review their code extensively using the Gerrit code review tool.

After that, 12 interviews were conducted to gain more insight into how developers review test code, and what makes reviewing such code difficult. Interviewees were sourced from a combination of open-source projects and industrial (closed-source) projects.

What discoveries were made

Test code does seem to reviewed as rigorously as production code – it’s even an absolute necessity.

Rigorous reviews

At first sight it seems that developers review production code more rigorously than test code: when production and test code are reviewed together, production code is more likely to get comments than test code. This is possibly because test code is less complex and therefore simply doesn’t need as many comments.

However, when reviews are only for one type of file (so either production code or test code), then reviews for test code are actually more likely to attract comments than those for production code.

The file type in reviews does not seem to have a large effect on the number and length of comments, and the number of reviewers that comment on code.

Why reviewers comment

The authors manually classified a sample of 600 review comments based on their intended purpose:

Strategies

All interviewees start their reviews by reading the commit message or some other documentation that states the intent of the change. This gives the reviewer a sense of the areas of the system that are likely to be changed and what type of solution the proposed patch will implement.

Once it’s clear what the committer aims to achieve, reviewers typically use two different reading strategies:

Many interviewees say that they also look at files that are not included in the proposed change by checking out the code under review in their own IDE. This allows them to see and understand the entire context in which the change happens.

Challenges

The interviews also revealed a number of other challenges that are commonly faced by reviewers:

The important bits

  1. Proposed changes that only contain test files are more likely to receive comments than changes that also include production code
  2. Developers who review test code are mostly interested in tested/untested paths and testing practices
  3. Code reviews can help you detect severe issues in test code
  4. Some reviewers review tests before production code, but most prefer a code-first approach
  5. Tests are harder to review than production code. Some find it harder to even justify reviewing test code