Does it really matter to test-first or to test-last? (2017)
Test-driven development is a development practice that involves short, iterative cycles in which the programmer writes tests before adding new functionality or refactoring existing code. It’s commonly believed that writing tests first leads to higher-quality code and improved productivity. Fucci et al. put that belief to the test.
Why it matters
Test-driven development (TDD) has multiple characteristics that set it apart from “traditional” programming, but the “tests first, code later” aspect tends to be the thing that most people talk about (and remember).
There’s more to it than that, however, so let’s talk definitions first.
TDD is a programming technique that involves cyclic, iterative implementation of new features.
In each cycle a programmer carries out the following tasks:
- Writing unit tests for the desired behaviour;
- Writing code to make those tests pass;
- Refactoring code strictly to improve its design, i.e. without modifying its behaviour (doing so could nullify or even reverse the benefits of refactoring).
A cycle is finished when all new and existing unit tests pass, and the programmer is content with the program’s design. Ideally, all cycles are short and roughly the same length (around 5 minutes long, and never longer than 10 minutes).
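To make the cycle concrete, here is a minimal sketch of what a single iteration might leave behind. The `fizzbuzz` function and its tests are a made-up example and are not taken from the study; Python is used purely for brevity.

```python
import unittest


def fizzbuzz(n: int) -> str:
    """Step 2: just enough code to make the tests below pass.

    Step 3 (refactoring) may reshape this function, but only as long
    as every test keeps passing, i.e. the behaviour stays the same.
    """
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


class FizzBuzzTest(unittest.TestCase):
    """Step 1: in a test-first cycle these are written before fizzbuzz()."""

    def test_multiples_of_three(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

    def test_multiples_of_five(self):
        self.assertEqual(fizzbuzz(5), "Buzz")

    def test_multiples_of_both(self):
        self.assertEqual(fizzbuzz(15), "FizzBuzz")

    def test_other_numbers(self):
        self.assertEqual(fizzbuzz(7), "7")
```

Running `python -m unittest` against a file with this content executes the four tests. In a test-last cycle the end result could look identical; only the order in which the two halves were written differs.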
TDD advocates claim that adherence to these practices will lead to improved quality and productivity.
In a nutshell, TDD has four characteristics:
- The sequence in which tests are written: before or after coding
- The granularity of cycles, i.e. the length of cycles
- The uniformity of cycle lengths
- The amount of effort spent on refactoring
How do these four characteristics affect the external quality (“Does the software do what it’s supposed to do?”) of the produced software and the developer’s productivity?
How the study was conducted
The authors held several five-day workshops about unit testing and TDD at two Nordic companies.
During the workshops, participants were asked to individually implement three tasks, of which two were greenfield (implementing a solution from scratch) and one was brownfield (extending an existing system). Some participants made use of a test-first sequence, while others used a test-last sequence.
TDD dictates that development is done iteratively using many short cycles. To help participants work on their tasks in small steps, the researchers refined each task into clearly delineated stories and sub-stories. Tasks were then “graded” using acceptance test suites for each user story in order to determine the quality of submitted solutions.
All participants used a special Eclipse IDE that collected information about the actions performed in it, such as:
- Code modification
- Test modification
- Code compilation
- Test execution
This information was used to determine how participants applied TDD.
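This summary doesn’t spell out how the four characteristics are computed from the logs, so the following is only a rough sketch under my own assumptions: cycles have already been reconstructed from the IDE events, granularity is the median cycle duration, uniformity is the median absolute deviation of those durations, and sequence and refactoring effort are simple fractions of cycles. The `Cycle` type and its labels are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cycle:
    kind: str       # hypothetical labels: "test-first", "test-last", "refactoring"
    minutes: float  # duration derived from IDE event timestamps


def summarise(cycles: list[Cycle]) -> dict[str, float]:
    """Summarise one developer's cycles (assumed definitions, see text)."""
    durations = [c.minutes for c in cycles]
    centre = median(durations)
    return {
        # Lower value = finer-grained (shorter) cycles.
        "granularity": centre,
        # Lower value = more uniform cycle lengths.
        "uniformity": median([abs(d - centre) for d in durations]),
        # Fraction of cycles in which the test came first.
        "sequence": sum(c.kind == "test-first" for c in cycles) / len(cycles),
        # Fraction of cycles spent on refactoring.
        "refactoring_effort": sum(c.kind == "refactoring" for c in cycles) / len(cycles),
    }
```

Note that with these definitions a *lower* granularity or uniformity value is the “better” one, which is worth keeping in mind when reading correlations later on.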
Combining timestamps from the IDE logs with the pass rate of the acceptance test suite allows one to calculate the productivity of each developer.
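As a rough illustration, productivity could be expressed as the number of passing acceptance tests per minute of work. That formula is my assumption for the sake of the example, not necessarily the exact definition used by the authors, and the function name is made up.

```python
from datetime import datetime


def productivity(passed_tests: int, started: datetime, finished: datetime) -> float:
    """Passing acceptance tests per minute (assumed definition)."""
    minutes = (finished - started).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("finished must be later than started")
    return passed_tests / minutes
```

For example, a developer whose solution passes 30 acceptance tests after an hour of logged work would score 0.5 under this definition.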
What discoveries were made
You’ve probably already guessed that Betteridge’s law of headlines (“Any headline that ends in a question mark can be answered by the word no.”) strikes again, but how exactly?
Granularity and uniformity are positively correlated, i.e. developers who use shorter cycles are able to keep them consistently short, while those who use larger cycles tend to have cycles of varying lengths. Both factors also appear to affect external quality: smaller cycles and cycles that have consistent lengths are associated with better external quality.
A small but statistically significant correlation exists between granularity and refactoring effort: developers who use coarser cycles spend less time on refactoring.
To better understand the relation between TDD’s four characteristic factors and the two outcome variables (quality and productivity), the authors constructed two models.
The basic idea here is that each model should predict one of the outcome variables using information about the code-test sequence, cycle granularity and uniformity, and refactoring effort.
A good model is also simple, and should not include superfluous input variables. The process of trimming these variables, known as feature selection, is described in the original article.
I’ll simply list the most noteworthy discoveries here:
- Code-test sequence is not part of either model, which suggests that – at least for external quality and developer productivity – it does not matter whether you write your tests before or after your “real” code. (This study did not look at the effects on internal quality, i.e. maintainability, which is also pretty important.)
- Cycle granularity, cycle uniformity, and refactoring effort are all negatively correlated with both quality and productivity. In line with the observations above, this means that coarser and less uniform cycles, as well as more refactoring effort, go together with lower quality and productivity.
The negative correlation between refactoring effort and the two outcome variables is likely due to floss refactoring. This is a form of refactoring that also includes other activities, like implementation of new features. These new features might not be covered by tests and are therefore more likely to introduce regression bugs.