Practitioners’ views on good software testing practices (2019)
Good developers write good automated test cases for their work before handing it over to others or their future selves. Most of us have strong opinions on what makes automated test cases good, but let’s see what other people think!
Why it matters
Each time you fix a bug or add some new functionality, you (should) write tests. This gives you and other stakeholders enough confidence that the software (still) meets all requirements and helps future developers (which includes yourself!) understand what your code does.
But tests are not a silver bullet. There’ll still be bugs, and as the software grows in size you may find that it becomes quite a chore to maintain all your test cases.
People have come up with all kinds of best practices to mitigate these issues. The goal of the study is therefore to learn what developers consider to be good characteristics for test cases.
How the study was conducted
The study consists of two parts:
- Interviews with 5 top Apache contributors and 16 professional developers from Hengtian, a major Chinese software outsourcing company. These interviews yielded 29 hypotheses about testing practices, drawn from a healthy mix of open-source and closed-source developers.
- A validation survey with 261 respondents. Respondents were recruited from professionals in the authors’ personal networks and contributors to popular projects on GitHub. Each respondent was asked to rate hypotheses on a scale from 1 (strongly disagree) to 5 (strongly agree) and provide rationales for a few hypotheses.
What discoveries were made
The table below lists the 29 hypotheses and their average ratings:
|Hypothesis|Average rating|
|---|---|
|A good test case is specific or atomic, i.e. one test case should be testing one aspect of a requirement|3.93|
|Test cases in a test suite should be self-contained, i.e. independent of one another|3.95|
|Good test cases should check for normal and exceptional flow|4.47 ⭐️|
|Test cases must perform boundary value analysis, i.e. take as input values at the extreme ends of an input domain|4.24 ⭐️|
|Test cases should serve as a good reference documentation|3.93|
|Most test cases should be small in size (in terms of its lines of code)|3.85|
|Large test cases are often hard to understand and maintain|3.73|
|Large test cases may be needed to detect difficult bugs|3.59 🔥|
|A good suite contains lots of small test cases (with fewer LOC) and few large test cases|3.97|
|Increased complexity in a test case can lead to bugs in the test code itself|4.04 ⭐️|
|Code coverage is necessary but not sufficient|3.97|
|Code coverage should be used to understand what is missing in the tests and create tests based on that|3.97|
|Higher coverage does not mean that a test suite can detect more bugs|4.02 ⭐️|
|Each test case should have a small footprint, i.e. the amount of code it executes|3.92|
|A test case that is designed to maximize coverage is often long, not understandable and brittle (i.e. breaks easily)|3.50 🔥|
|Designing test cases to cover different requirements is often more important than designing test cases to cover more code|4.00 ⭐️|
|A good test case should be well-modularized|4.62 ⭐️|
|A good test case should be readable and understandable|4.58 ⭐️|
|Test cases should be simpler than the code being tested|4.20 ⭐️|
|Test code should be designed with maintainability in mind since evolution of code often requires changing of test code|4.16 ⭐️|
|Traceability links should be maintained between test code, requirements, and source code|3.97|
|A good test case should attempt to break functionality to find potential bugs|4.11 ⭐️|
|Test even the simplest things that cannot go wrong|3.89|
|During maintenance, when a bug is fixed, it is good to add a test case that covers it|4.40 ⭐️|
|Test assertions can help detect subtle errors that might otherwise go undetected|4.51 ⭐️|
|Adding common errors and possible causes as comments in test code is helpful to debug failures|3.98|
|A good test case should be designed such that its results are deterministic|4.07 ⭐️|
|Test cases in a test suite should not have side effects so running a test before or after another should not change the results|4.28 ⭐️|
|Test cases should use tags or categories, such as slow tests, fast tests etc., so as to be able to run a specific set of tests easily at a time|3.93|
14 hypotheses have a rating of 4.00 or higher (I’ve annotated these with a ⭐️) and can be safely incorporated in best-practice checklists without heated debates.
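To make two of the highest-rated hypotheses a bit more concrete (checking normal and exceptional flow, and boundary value analysis), here is a minimal sketch using Python’s built-in `unittest` module. The function `parse_percentage` and its 0 to 100 input domain are made up for illustration; they do not come from the study.

```python
import unittest


def parse_percentage(text: str) -> int:
    """Hypothetical function under test: parse a whole percentage in [0, 100]."""
    value = int(text)  # int() raises ValueError for non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


class ParsePercentageTest(unittest.TestCase):
    def test_normal_flow(self):
        # A typical, valid input.
        self.assertEqual(parse_percentage("42"), 42)

    def test_exceptional_flow_rejects_non_numeric_input(self):
        # Invalid input should fail loudly instead of returning a value.
        with self.assertRaises(ValueError):
            parse_percentage("forty-two")

    def test_boundaries_inside_the_domain_are_accepted(self):
        # Extreme ends of the assumed input domain.
        self.assertEqual(parse_percentage("0"), 0)
        self.assertEqual(parse_percentage("100"), 100)

    def test_boundaries_just_outside_the_domain_are_rejected(self):
        for text in ("-1", "101"):
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_percentage(text)
```

Saved as a module, these tests can be run with `python -m unittest`. Each test checks one aspect of the behaviour and does not depend on any other test, which also lines up with the "specific" and "self-contained" hypotheses from the table.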
While no hypotheses were rejected outright by respondents, there are two that clearly led to mixed responses (these have been annotated with a 🔥):
- People generally agree that the best test cases are small. But some bugs only occur under very specific circumstances, which may take a lot of effort (or lines of code) to recreate. Some believe that large test cases are justified in these situations, but others argue that the poor understandability of such test cases makes them less effective in the long run. Instead, better decomposition of the problem may make it possible to write a smaller test.
- Test cases that maximise coverage are likely to be long, hard to understand, and brittle. Not everyone believes this to be the case, as one can often write tests that are small but still cover a lot of code (just because you can doesn’t mean you should, though…).
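The decomposition argument could look something like the sketch below. Suppose, purely as an assumption, that a billing routine misbehaves only on the last day of February in a leap year. Reproducing that through the whole routine would need a lot of setup. If the date logic is pulled out into a small helper, short tests can target the tricky circumstance directly. The helper `is_last_day_of_month` and the billing scenario are hypothetical and not taken from the study.

```python
import datetime
import unittest


def is_last_day_of_month(day: datetime.date) -> bool:
    """Hypothetical helper extracted from a larger billing routine."""
    # The next calendar day falls in a different month only on a month's last day.
    return (day + datetime.timedelta(days=1)).month != day.month


class LastDayOfMonthTest(unittest.TestCase):
    def test_leap_day_is_last_day_of_february(self):
        self.assertTrue(is_last_day_of_month(datetime.date(2024, 2, 29)))

    def test_28_february_is_not_last_day_in_a_leap_year(self):
        self.assertFalse(is_last_day_of_month(datetime.date(2024, 2, 28)))

    def test_28_february_is_last_day_in_a_common_year(self):
        self.assertTrue(is_last_day_of_month(datetime.date(2023, 2, 28)))
```

Whether this kind of refactoring is always possible is exactly what respondents disagreed on: some bugs only show up when several components interact, and then a larger test may still be the pragmatic choice.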
Universal law is for lackeys – context is for kings
Keep in mind that best practices are not laws and do not necessarily apply in all situations.
The important bits
This is my personal interpretation of the results: