When testing meets code review: Why and how developers review tests
Most developers I know understand the value of tests. Most developers I know also understand the value of code reviews. Combining those two is a lot harder though, as evidenced by all our JIRA tickets that seem to linger in the “Review” column every sprint. What makes it so hard to review test code?
It’s quite hard to write code that does exactly what it’s supposed to do. Yes, it gets slightly easier after years of practice, but you’ll still make mistakes. Tons of them.
This is why we write tests for our code: tests help us catch mistakes during development, help us understand how our code works, and ensure that future changes don’t break existing code.
Tests are not a panacea, however. Test code is code too, so it’s just as likely to contain bugs. A 2015 study on bugs in test code suggested that up to half of all projects may contain buggy test code!
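To make “buggy test code” concrete, here’s a minimal, hypothetical JUnit 4 sketch (the DiscountTest class and applyDiscount method are invented for illustration) showing two classic test bugs: a forgotten assertion, which yields a test that can never fail, and swapped expected/actual arguments, which produce misleading failure messages.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class DiscountTest {

    // Hypothetical production method under test, inlined for the example.
    static double applyDiscount(double price, double rate) {
        return price - price * rate;
    }

    // Bug 1: the test exercises the code but asserts nothing, so it
    // passes no matter what applyDiscount returns.
    @Test
    public void appliesTenPercentDiscount() {
        double result = applyDiscount(100.0, 0.10);
        // The intended assertion was forgotten:
        // assertEquals(90.0, result, 0.001);
    }

    // Bug 2: expected and actual are swapped, which yields a
    // misleading message whenever the test does fail.
    @Test
    public void discountOfHundredPercentIsFree() {
        assertEquals(applyDiscount(100.0, 1.0), 0.0, 0.001);
    }
}
```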
Code reviews are a simple and effective way to catch a lot of these bugs before they make it into production – or at least, that’s what we assume. Is test code reviewed as rigorously as production code?
To answer that question, the authors of the paper conducted a study in two phases.
In the first phase the authors analysed more than 300,000 reviews and commits from three different projects (Eclipse, OpenStack, and Qt) to gain insight into what developers review and how they handle changes that include tests. All three projects make use of automated testing and review their code extensively using the Gerrit code review tool.
In the second phase, 12 interviews were conducted to gain more insight into how developers review test code, and what makes reviewing such code difficult. Interviewees were sourced from a combination of open-source and industrial (closed-source) projects.
Test code does seem to be reviewed as rigorously as production code – and that rigour is an absolute necessity.
At first sight it seems that developers review production code more rigorously than test code: when production and test code are reviewed together, production code is more likely to get comments than test code. This is possibly because test code is less complex and therefore simply doesn’t need as many comments.
However, when reviews are only for one type of file (so either production code or test code), then reviews for test code are actually more likely to attract comments than those for production code.
The type of file in a review does not seem to have a large effect on the number and length of comments, or on the number of reviewers that comment on the code.
The authors manually classified a sample of 600 review comments based on their intended purpose:
- Most comments (35%) are suggestions for code improvements. Comments on production code are mostly about generic code quality aspects like maintainability, but comments on test code are usually about testing practices, e.g. mocking usage, test cohesion, and untested code paths (the sketch after this list gives a flavour of these). Comments about unclear naming and readability are also fairly “popular” though;
- Many comments (32%) ask questions in order to improve the reviewer’s understanding of the code;
- About 9% of comments are related to defect finding. In production code such comments often address small and superficial concerns, but in test code 43% of these comments are actually about severe issues;
- A small portion of comments is placed to transfer knowledge from the reviewer to the committer, or the other way around.
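To give a flavour of what those test-focused remarks look like in practice, here’s a hypothetical JUnit 4 + Mockito sketch (OrderService and PriceRepository are invented for illustration); the inline comments mimic the kinds of review comments the study classified.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class OrderServiceTest {

    // Invented collaborators, inlined to keep the sketch self-contained.
    interface PriceRepository { double priceOf(String sku); }

    static class OrderService {
        private final PriceRepository prices;
        OrderService(PriceRepository prices) { this.prices = prices; }
        double total(String sku, int quantity) {
            if (quantity < 0) throw new IllegalArgumentException("negative quantity");
            return prices.priceOf(sku) * quantity;
        }
    }

    @Test
    public void computesTotalForMultipleUnits() {
        // Review comment on mocking usage: mocking the repository is fine,
        // but don't stub more behaviour than this test actually needs.
        PriceRepository prices = mock(PriceRepository.class);
        when(prices.priceOf("ABC")).thenReturn(2.50);

        OrderService service = new OrderService(prices);
        assertEquals(5.00, service.total("ABC", 2), 0.001);
    }

    // Review comment on an untested code path: nothing exercises the
    // quantity < 0 branch; a test asserting that IllegalArgumentException
    // is thrown for negative quantities is missing.
}
```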
All interviewees start their reviews by reading the commit message or some other documentation that states the intent of the change. This gives the reviewer a sense of the areas of the system that are likely to be changed and what type of solution the proposed patch will implement.
Once it’s clear what the committer aims to achieve, reviewers typically use two different reading strategies:
- Some developers prefer to read test files first, as this helps them understand the API; they feel it makes it easier to spot inconsistencies between the tests and the implementation in the production code (the sketch after this list shows the idea).
- Most developers start with the production code, however. Interviewees claim that this approach is easier, especially if the tests suffer from poor quality.
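As a toy illustration of the tests-first strategy (the slugify function and its test are invented), the test documents the intended API, and this sketch deliberately contains the kind of test/implementation inconsistency a tests-first reviewer would catch.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class SlugifyTest {

    // Implementation under review, inlined for the example. It is
    // deliberately inconsistent with the test below: it never lower-cases,
    // so the test fails; exactly what a tests-first reader would flag.
    static String slugify(String title) {
        return title.trim().replaceAll("\\s+", "-");
    }

    // Reading this test first tells the reviewer the intended contract:
    // surrounding whitespace is trimmed, inner whitespace becomes hyphens,
    // and the result is lower case.
    @Test
    public void trimsLowercasesAndHyphenates() {
        assertEquals("hello-world", slugify("  Hello World "));
    }
}
```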
Many interviewees say that they also look at files that are not included in the proposed change by checking out the code under review in their own IDE. This allows them to see and understand the entire context in which the change happens.
The interviews also revealed a number of other challenges that are commonly faced by reviewers:
- Test code is harder to review than production code: most (or even all) test code in a change is new code that will stick around virtually forever, and you need a lot of tests to cover all the edge and boundary cases (the parameterized sketch after this list illustrates the sheer volume). Code review tools aren’t very helpful here, as they make it hard to view tests and production code side by side;
- Developers (and their managers) don’t value tests as much as production code that delivers features. Writing tests is therefore seen as a less important activity. The same attitude is present during reviews: some reviewers only verify that test files are present and assume that the code does what it’s supposed to do;
- There are also a lot of (novice) developers who do understand the value of proper testing and reviewing, but lack the necessary skills.
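To illustrate the volume problem, here’s a minimal sketch using JUnit 4’s Parameterized runner (the clamp function is invented): even a one-line function needs a test row for every edge and boundary case.

```java
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
import static org.junit.Assert.assertEquals;

// One behaviour, many boundary cases: even a trivial clamp(value, 0, 10)
// needs a row per edge, which is why test diffs are mostly new lines.
@RunWith(Parameterized.class)
public class ClampTest {

    // Hypothetical one-line function under test, inlined for the example.
    static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    @Parameters
    public static Collection<Object[]> cases() {
        return Arrays.asList(new Object[][] {
            { -1, 0 },   // below the lower bound
            {  0, 0 },   // exactly on the lower bound
            {  5, 5 },   // somewhere in range
            { 10, 10 },  // exactly on the upper bound
            { 11, 10 },  // above the upper bound
        });
    }

    private final int input;
    private final int expected;

    public ClampTest(int input, int expected) {
        this.input = input;
        this.expected = expected;
    }

    @Test
    public void clampsIntoRange() {
        assertEquals(expected, clamp(input, 0, 10));
    }
}
```

Multiply that by every behaviour touched in a real change and it becomes clear why reviewers end up scrolling through walls of brand-new test code.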
The main takeaways:
- Proposed changes that only contain test files are more likely to receive comments than changes that also include production code;
- Developers who review test code are mostly interested in tested/untested paths and testing practices;
- Code reviews can help you detect severe issues in test code;
- Some reviewers review tests before production code, but most prefer a code-first approach;
- Tests are harder to review than production code, and some find it hard to even justify reviewing test code.