How many unit tests should you write?

Published: 25 Jan 2023
Written by: Chun Fei Lung

How many unit tests are enough? As always the answer is “It depends”, but fortunately there is a handy rule of thumb that you can use!

A man walks into a bar and orders a beer. 2 beers. 0 beers. -1 beer. 2,147,483,647 beers. Null beers.

As a software tester your goal is to show that the code does not work as expected. Each test you write should try to find a bug in the code in a slightly different way. Once all tests pass, you have successfully failed at finding bugs in the code!

But how many tests should you write? If you write too few tests you might miss a critical bug that crashes the application. On the other hand, writing too many tests is a waste of time: not only does writing tests cost time, but each test also takes time to run and requires ongoing maintenance.

There is a rule of thumb that can be used to determine how many tests you generally need to test a unit of code: look at how many meaningfully different kinds of input can be passed to that unit.

That probably sounds a bit vague, but think about what affects the behaviour of a function: the result of a function call is usually determined by its input value(s). (Side note: in the case of objects, the object itself, and particularly its state, can be seen as a form of input.) For example, if we have a function double(num: int) that multiplies num by 2, then calling double(16) repeatedly will always yield the same result (32). Passing other values for num gives us other results, e.g. double(21) = 42.
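As a minimal sketch, here is what such a function could look like in Python. The implementation is an assumption made for illustration; the only thing taken from the text is that double(num: int) multiplies num by 2.

```python
def double(num: int) -> int:
    """Return num multiplied by 2."""
    return num * 2


# Calling the function repeatedly with the same input yields the same result.
results = [double(16) for _ in range(3)]
assert results == [32, 32, 32]

# A different input gives a different result.
assert double(21) == 42
```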

We could write tests that assert that double(16), double(21), and so on yield the expected results, but that wouldn’t be very useful. To understand why, we need to look at the input space (the set of all possible inputs) for double(num: int), which consists of all valid integers. It’s clearly infeasible to test double() with every value from that input space. Fortunately, it’s also unnecessary.

Subdomains

The input space for double() may be large, but many of those inputs result in similar behaviour. It’s possible to partition (divide) the input space into subdomains, where each subdomain is a set of inputs that result in similar behaviour.

The input space for double() can be partitioned into three subdomains: positive integers (the result is a positive integer), negative integers (the result is a negative integer), and zero (the result is always zero). In theory this means that the input space can be covered in its entirety using only three tests!
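Those three tests might look roughly like this. It is only a sketch: the test names and the representative values 21 and -21 are arbitrary choices, and any other value from the same subdomain would do just as well.

```python
def double(num: int) -> int:
    """Return num multiplied by 2."""
    return num * 2


def test_positive_input():
    # Representative of the subdomain num > 0.
    assert double(21) == 42


def test_negative_input():
    # Representative of the subdomain num < 0.
    assert double(-21) == -42


def test_zero():
    # The subdomain that contains only num = 0.
    assert double(0) == 0
```

The functions follow the test_ naming convention, so a test runner such as pytest would pick them up, but they can also be called directly.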

Boundaries

In practice you might need a few more, because it’s also wise to look at what happens between subdomains. Bugs often occur at boundaries between subdomains due to off-by-one mistakes. Typical examples of boundaries include:

0 (as the boundary between positive and negative values);
maximum and minimum values of numeric types (in languages like Java);
null values, empty strings, and empty lists; and
first and last elements of lists.

Note that it’s not always necessary to check all boundaries. For instance, if the double() function will only ever be used to double the discount on a single fare train ticket, it’s a waste of time to write a test which asserts that double() works correctly for outrageously large integers.

Pick your battles

In a programming language like Java, we could test the double() function using five different inputs that cover the entire input space and its outer boundaries:

num = MIN	num < 0	num = 0	num > 0	num = MAX
✅	✅	✅	✅	✅
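Sketched in Python, the five inputs could be written as a small table of cases. Python integers have no fixed size, so INT_MIN and INT_MAX below are assumed stand-ins for the bounds of a 32-bit int as found in Java. The two outer cases are where a fixed-size int would overflow; here they only document the mathematically expected result.

```python
INT_MIN = -2**31      # assumed lower bound, mirrors a 32-bit signed int
INT_MAX = 2**31 - 1   # assumed upper bound, mirrors a 32-bit signed int


def double(num: int) -> int:
    """Return num multiplied by 2."""
    return num * 2


# One (input, expected result) pair per column of the table above.
CASES = [
    (INT_MIN, -4_294_967_296),  # num = MIN
    (-7, -14),                  # num < 0
    (0, 0),                     # num = 0
    (7, 14),                    # num > 0
    (INT_MAX, 4_294_967_294),   # num = MAX
]

for num, expected in CASES:
    assert double(num) == expected, f"double({num}) should be {expected}"
```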

But what if we have a function that accepts multiple input arguments? Let’s look at another function, max(a: int, b: int), which returns the larger of its two arguments a and b. This function has two parameters. Each has its own input space that we need to consider.

Since the input space of each of the two parameters can be covered entirely using five test inputs, there are 5×5=25 unique combinations that we could test:

	a = MIN	a < 0	a = 0	a > 0	a = MAX
b = MIN	✅	✅	✅	✅	✅
b < 0	✅	✅	✅	✅	✅
b = 0	✅	✅	✅	✅	✅
b > 0	✅	✅	✅	✅	✅
b = MAX	✅	✅	✅	✅	✅
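To get a feel for what 25 combinations means, here is a sketch that generates all of them. The function is called max_of only to avoid shadowing Python's built-in max; that name, the representative values -7 and 7, and the 32-bit bounds are all assumptions made for illustration.

```python
from itertools import product

INT_MIN = -2**31      # assumed 32-bit bounds, as in Java
INT_MAX = 2**31 - 1


def max_of(a: int, b: int) -> int:
    """Return the larger of a and b (hypothetical stand-in for max)."""
    return a if a >= b else b


# One representative per partition: MIN, < 0, 0, > 0, MAX.
REPRESENTATIVES = [INT_MIN, -7, 0, 7, INT_MAX]

# Every value of a combined with every value of b.
pairs = list(product(REPRESENTATIVES, repeat=2))
assert len(pairs) == 25

for a, b in pairs:
    result = max_of(a, b)
    # The result must be one of the inputs and no smaller than either.
    assert result in (a, b)
    assert result >= a and result >= b
```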

That’s a bit much, isn’t it? (side note: If you’re working on a safety-critical system this is about the right amount.)

We could try to focus on a subset of these combinations that covers each partition from each input space (at least) once. For example, we could try to write tests for the following five cases:

	a = MIN	a < 0	a = 0	a > 0	a = MAX
b = MIN	✅
b < 0		✅
b = 0			✅
b > 0				✅
b = MAX					✅

This subset has a much more manageable size, but also some glaring issues: in three of the tests the two input values are equal to each other (a = b = MIN, a = b = 0, and a = b = MAX). There are also no tests in which a is positive and b is negative, or vice versa.

The latter observation suggests that there is a third way to partition the input space that we hadn’t considered yet: the relationship between parameters. In this case there are three partitions that we would like to cover: a < b, a = b, and a > b.

Here’s another attempt that also takes those three partitions into account:

	a = MIN	a < 0	a = 0	a > 0	a = MAX
b = MIN					✅
b < 0				✅
b = 0			✅
b > 0		✅
b = MAX	✅
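One way to double-check a selection like this is to let the code report which relationships it covers. The sketch below is an illustration only: max_of, relation, and the concrete values are hypothetical, with 32-bit bounds assumed for MIN and MAX.

```python
INT_MIN = -2**31      # assumed 32-bit bounds, as in Java
INT_MAX = 2**31 - 1


def max_of(a: int, b: int) -> int:
    """Return the larger of a and b (hypothetical stand-in for max)."""
    return a if a >= b else b


def relation(a: int, b: int) -> str:
    """Name the partition that the pair (a, b) belongs to."""
    if a < b:
        return "a < b"
    if a > b:
        return "a > b"
    return "a = b"


# (a, b, expected result), one row per check mark in the table above.
CASES = [
    (INT_MAX, INT_MIN, INT_MAX),  # a = MAX, b = MIN
    (7, -7, 7),                   # a > 0,   b < 0
    (0, 0, 0),                    # a = 0,   b = 0
    (-7, 7, 7),                   # a < 0,   b > 0
    (INT_MIN, INT_MAX, INT_MAX),  # a = MIN, b = MAX
]

for a, b, expected in CASES:
    assert max_of(a, b) == expected

# All three relationships between a and b appear at least once.
covered = {relation(a, b) for a, b, _ in CASES}
assert covered == {"a < b", "a = b", "a > b"}
```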

Finally, we can also ask ourselves whether it’s really necessary to test all boundaries. If we only use our max() for small-ish numbers, we can omit the tests where a or b is MIN or MAX. This means we only need three tests for decent coverage!

	a = MIN	a < 0	a = 0	a > 0	a = MAX
b = MIN
b < 0				✅
b = 0			✅
b > 0		✅
b = MAX
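Written out as tests, the final selection could look something like this. Again, this is a sketch under assumptions: max_of is a hypothetical name chosen to avoid clashing with Python's built-in max, and 7 and -7 are arbitrary small-ish values.

```python
def max_of(a: int, b: int) -> int:
    """Return the larger of a and b (hypothetical stand-in for max)."""
    return a if a >= b else b


def test_a_greater_than_b():
    # a > 0 and b < 0, so a > b.
    assert max_of(7, -7) == 7


def test_a_equal_to_b():
    # a = 0 and b = 0, so a = b.
    assert max_of(0, 0) == 0


def test_a_less_than_b():
    # a < 0 and b > 0, so a < b.
    assert max_of(-7, 7) == 7
```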
