How many unit tests should you write?
As a software tester your goal is to prove that the code works as expected. Each test you write should try to find a bug in the code in a slightly different way. Once all tests pass, you have successfully failed at finding bugs in the code!
But how many tests should you write? If you write too few tests, you might miss a critical bug that crashes the application. On the other hand, writing too many tests is a waste of time: each test costs time to write, takes time to run, and requires ongoing maintenance.
There is a rule of thumb that can be used to determine how many tests you generally need to test a unit of code: look at how many meaningfully different kinds of input can be passed to that unit.
That probably sounds a bit vague, but think about what affects the behaviour of
a function: the result of a function call is usually determined by its inputs.
For example, if we have a function `double(num: int)` that multiplies `num` by `2`,
then calling `double(16)` repeatedly will always yield the same result (`32`).
Passing other `num`s gives us other results, e.g. `double(21)` = `42`.
We could write tests that assert whether `double(16)`, `double(21)`, and so
on yield the expected results, but that wouldn’t be very useful. To understand
why, we need to look at the input space (the set of all possible inputs) for
`double(num: int)`, which consists of all valid integers. It’s clearly
infeasible to test `double()` with every value from that input space.
Fortunately, it’s also unnecessary.
The input space for `double()` may be large, but many of those inputs result in
similar behaviour. It’s possible to partition (divide) the input space into
subdomains, where each subdomain is a set of inputs that result in similar
behaviour.
The input space for `double()` can be partitioned into three subdomains:
positive integers (the result is a positive integer), negative integers (the
result is a negative integer), and zero (the result is always zero). In
theory this means that the input space can be covered in its entirety using only
three tests!
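
As a rough sketch, those three tests could look like this in Python. The function body and the representative values (`21`, `-21`) are assumptions chosen for illustration; any other value from the same subdomain should serve equally well.

```python
def double(num: int) -> int:
    """Return num multiplied by 2 (assumed implementation)."""
    return num * 2


def test_positive_input_gives_positive_result():
    assert double(21) == 42


def test_negative_input_gives_negative_result():
    assert double(-21) == -42


def test_zero_stays_zero():
    assert double(0) == 0


# Run the three tests directly; a test runner such as pytest
# would discover the test_* functions on its own.
for test in (
    test_positive_input_gives_positive_result,
    test_negative_input_gives_negative_result,
    test_zero_stays_zero,
):
    test()
```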
In practice you might need a few more, because it’s also wise to look at what happens between subdomains. Bugs often occur at boundaries between subdomains due to off-by-one mistakes. Typical examples of boundaries include:
- `0` (as the boundary between positive and negative values);
- maximum and minimum values of numeric types (in languages like Java);
- `null` values, empty strings, and empty lists; and
- first and last elements of lists.
Note that it’s not always necessary to check all boundaries. For instance, if
the `double()` function will only ever be used to double the discount on a
single fare train ticket, it’s a waste of time to write a test which asserts
that `double()` works correctly for outrageously large integers.
In a programming language like Java, we could test the `double()` function using
five different inputs that cover the entire input space and its outer boundaries:
num = MIN | num < 0 | num = 0 | num > 0 | num = MAX |
---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ |
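
The five columns above could be turned into a small table-driven test. This is only a sketch: `INT_MIN` and `INT_MAX` are assumed 32-bit bounds that mirror Java's `Integer.MIN_VALUE` and `Integer.MAX_VALUE`, and the values `-7` and `7` are arbitrary representatives of their subdomains.

```python
# Assumption: 32-bit signed bounds, mirroring Java's Integer.MIN_VALUE and
# Integer.MAX_VALUE. Python's own int type has no such limits.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def double(num: int) -> int:
    """Return num multiplied by 2 (assumed implementation)."""
    return num * 2


# One representative input per column of the table: (input, expected result).
CASES = [
    (INT_MIN, -4294967296),
    (-7, -14),
    (0, 0),
    (7, 14),
    (INT_MAX, 4294967294),
]

for num, expected in CASES:
    assert double(num) == expected, f"double({num}) should be {expected}"
```

Because Python integers grow as needed, both boundary cases pass here. In Java, the same multiplication on an `int` would silently wrap around (`2 * Integer.MAX_VALUE` evaluates to `-2`), which is exactly the kind of behaviour a boundary test is meant to surface.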
But what if we have a function that accepts multiple input arguments? Let’s look
at another function, `max(a: int, b: int)`, which returns the larger of the two
variables `a` and `b`. This function has two parameters. Each has its own input
space that we need to consider.
Since the input space of each of the two parameters can be covered entirely using five test inputs, there are 5×5=25 unique combinations that we could test:
| | a = MIN | a < 0 | a = 0 | a > 0 | a = MAX |
|---|---|---|---|---|---|
| b = MIN | ✅ | ✅ | ✅ | ✅ | ✅ |
| b < 0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| b = 0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| b > 0 | ✅ | ✅ | ✅ | ✅ | ✅ |
| b = MAX | ✅ | ✅ | ✅ | ✅ | ✅ |
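
To get a feel for what exhaustively testing that grid involves, here is a sketch that generates all 25 pairs with `itertools.product`. The function name `max_of` is hypothetical (chosen so that it does not shadow Python's built-in `max`), the 32-bit bounds are assumed, and the built-in `max` is used as the reference result.

```python
from itertools import product

# Assumption: 32-bit signed bounds, as in Java.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def max_of(a: int, b: int) -> int:
    """Hypothetical implementation under test."""
    return a if a > b else b


# One representative value per partition of a single parameter.
REPRESENTATIVES = [INT_MIN, -7, 0, 7, INT_MAX]

# Every pairing of a representative for a with a representative for b.
combinations = list(product(REPRESENTATIVES, repeat=2))

for a, b in combinations:
    # The built-in max serves as the reference result here.
    assert max_of(a, b) == max(a, b), f"max_of({a}, {b}) is wrong"
```

The number of combinations grows multiplicatively with every extra parameter, which is why the rest of this section looks for a smaller subset.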
We could try to focus on a subset of these combinations that covers each partition from each input space (at least) once. For example, we could try to write tests for the following five cases:
| | a = MIN | a < 0 | a = 0 | a > 0 | a = MAX |
|---|---|---|---|---|---|
| b = MIN | ✅ | | | | |
| b < 0 | | ✅ | | | |
| b = 0 | | | ✅ | | |
| b > 0 | | | | ✅ | |
| b = MAX | | | | | ✅ |
This subset has a much more manageable size, but also some glaring issues: in
three of the tests the two input values are equal to each other (a = b = MIN,
a = b = 0, and a = b = MAX). There are also no tests in which `a` is positive
and `b` is negative (and vice versa).
The latter observation suggests that there is a third way to partition the input space that we hadn’t considered yet: the relationship between parameters. In this case there are three partitions that we would like to cover: a < b, a = b, and a > b.
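
One way to make this third partitioning tangible is to check which relationships a given set of test inputs actually exercises. The helper names and the example pairs below are hypothetical; the sketch only illustrates the idea of measuring partition coverage.

```python
def relationship(a: int, b: int) -> str:
    """Classify a pair of inputs by how a relates to b."""
    if a < b:
        return "a < b"
    if a > b:
        return "a > b"
    return "a = b"


def covered_relationships(cases):
    """Return the set of relationship partitions hit by the given pairs."""
    return {relationship(a, b) for a, b in cases}


ALL_RELATIONSHIPS = {"a < b", "a = b", "a > b"}

# Hypothetical inputs in which a and b always happen to be equal.
equal_pairs = [(-7, -7), (0, 0), (7, 7)]

# Hypothetical inputs that mix signs between a and b.
mixed_pairs = [(7, -7), (0, 0), (-7, 7)]

# Partitions that the first set of inputs never reaches.
missing = ALL_RELATIONSHIPS - covered_relationships(equal_pairs)
```

With these inputs, `equal_pairs` only ever reaches the a = b partition, while `mixed_pairs` reaches all three with the same number of tests.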
Here’s another attempt that also takes those three partitions into account:
| | a = MIN | a < 0 | a = 0 | a > 0 | a = MAX |
|---|---|---|---|---|---|
| b = MIN | | | | | ✅ |
| b < 0 | | | | ✅ | |
| b = 0 | | | ✅ | | |
| b > 0 | | ✅ | | | |
| b = MAX | ✅ | | | | |
Finally, we can also ask ourselves whether it’s really necessary to test all
boundaries. If we only use our `max()` for small-ish numbers, we can omit the
tests where `a` or `b` is `MIN` or `MAX`. This means we only need three tests
for decent coverage!
| | a = MIN | a < 0 | a = 0 | a > 0 | a = MAX |
|---|---|---|---|---|---|
| b = MIN | | | | | |
| b < 0 | | | | ✅ | |
| b = 0 | | | ✅ | | |
| b > 0 | | ✅ | | | |
| b = MAX | | | | | |
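
A minimal sketch of such a three-test suite in Python follows. The name `max_of` is hypothetical (it avoids shadowing the built-in `max`), and the concrete values are assumptions: each test picks one negative, zero, or positive representative for `a` and `b` so that a > b, a = b, and a < b are each exercised once.

```python
def max_of(a: int, b: int) -> int:
    """Hypothetical implementation under test."""
    return a if a > b else b


def test_a_greater_than_b():
    # a > 0, b < 0
    assert max_of(7, -7) == 7


def test_a_equal_to_b():
    # a = 0, b = 0
    assert max_of(0, 0) == 0


def test_a_less_than_b():
    # a < 0, b > 0
    assert max_of(-7, 7) == 7


# Run the tests directly; pytest would pick them up automatically.
for test in (test_a_greater_than_b, test_a_equal_to_b, test_a_less_than_b):
    test()
```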