A few weeks ago I wrote about the Arrange-Act-Assert pattern, a standardised approach to formatting unit tests and functional tests. But there is more to writing tests than just formatting. Let’s talk about how we test our code.
Some tests can be very simple. For example, if we have a Python function that returns a random number, we could write a test that verifies whether that function returns the expected number.
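A minimal sketch of such a function and its test might look like this. The name get_random_number() is hypothetical, and, in the spirit of the well-known xkcd joke, its “random” number is hard-coded. That is the only reason the test can assert an exact value. The test follows the Arrange-Act-Assert layout, although there is nothing to arrange here.

```python
def get_random_number() -> int:
    # Hypothetical example: the "random" number was chosen by a fair dice
    # roll once and is now hard-coded, so the result never changes.
    return 4


def test_get_random_number():
    # Act
    result = get_random_number()

    # Assert
    assert result == 4
```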
Of course, most real-world code is a bit more complex. Functions can be called with different arguments or may themselves call other functions.
Let’s look at a slightly lengthier example: a NosClient class that finds the title of any article on the Dutch news website NOS.nl, given its article ID. It does this by fetching the page using the requests library and extracting the value of the page’s <title> element.
NosClient has one public method, find_title_for_article(). How should we go about testing this method?
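For reference, here is a minimal sketch of what such a class might look like. It is not the original implementation: the article URL scheme (https://nos.nl/artikel/<id>) and the regular expression used to pull out the <title> element are assumptions made for illustration.

```python
import re
from typing import Optional

import requests


class NosClient:
    """Looks up article titles on NOS.nl (hypothetical sketch)."""

    # Assumed URL scheme for NOS.nl articles; not taken from the original post.
    ARTICLE_URL = "https://nos.nl/artikel/{article_id}"

    def find_title_for_article(self, article_id: int) -> Optional[str]:
        """Return the contents of the page's <title> element, or None."""
        # Fetch the article page with the requests library.
        response = requests.get(
            self.ARTICLE_URL.format(article_id=article_id), timeout=10
        )
        response.raise_for_status()

        # Extract whatever sits between <title> and </title>.
        match = re.search(r"<title>(.*?)</title>", response.text, re.DOTALL)
        return match.group(1).strip() if match else None
```

Regular expressions are a fragile way to parse HTML in general, but a single regex keeps the example short and matches the re.search() call that the London-style discussion refers to.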
There are two major schools of thought when it comes to how tests should be written: the classicist (or Chicago) school and the mockist (or London) school. Both have their strengths and weaknesses, and neither is strictly better than the other.
The classicist or Chicago school is the older of the two, and the easier to understand. According to the Chicago school, tests should check whether a system under test (SUT) produces the expected results, which can be done by simply letting it do its thing and checking its output.
In the case of NosClient.find_title_for_article(), we could call the method with an article ID and assert that it returns the title of that article.
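A sketch of such a test, assuming a minimal hypothetical NosClient (repeated here so the example runs on its own) with the same assumed URL scheme. ARTICLE_ID and EXPECTED_TITLE are placeholders, not real data; in practice you would fill in a real article ID and the title it currently has. Running this test makes a real HTTP request to NOS.nl.

```python
import re
from typing import Optional

import requests


class NosClient:
    # Minimal hypothetical version of the client; the URL scheme is an assumption.
    def find_title_for_article(self, article_id: int) -> Optional[str]:
        response = requests.get(f"https://nos.nl/artikel/{article_id}", timeout=10)
        response.raise_for_status()
        match = re.search(r"<title>(.*?)</title>", response.text, re.DOTALL)
        return match.group(1).strip() if match else None


# Placeholders: substitute a real article ID and the title NOS.nl currently
# shows for it. Both values here are hypothetical.
ARTICLE_ID = 1234567
EXPECTED_TITLE = "Title of the article as published on NOS.nl"


def test_find_title_for_article():
    # Arrange
    client = NosClient()

    # Act: this performs a real HTTP request to nos.nl.
    title = client.find_title_for_article(ARTICLE_ID)

    # Assert: only the output is checked, not how it was produced.
    assert title == EXPECTED_TITLE
```

The test knows nothing about requests or regular expressions; it only compares the returned title with the expected one.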
Chicago-style tests are kind of like a teacher grading an essay. The teacher only gets to see the end result, and that’s all they care about: it doesn’t matter how the essay was written, as long as it meets the requirements.
This works because good essays can only be written by those who have done all the necessary work (developing their writing skills, doing research, and so on). In other words, if a student has written a good essay, the teacher (or “tester”) can be reasonably confident that the student (or “code”) deserves a nice grade.
However, this approach has a few problems. The first is that an essay tests everything at the same time. It’s not possible to only verify that a student did their research without also testing their writing skills. Similarly, we cannot test whether the find_title_for_article() method correctly extracts titles from web pages without making actual HTTP requests to a remote server that we do not control.
Moreover, if a student fails to produce an essay through no fault of their own (maybe their computer was violently attacked by a cat), they could still be penalised by the teacher! In the same way, our test could fail because the website from which NosClient retrieves its titles is temporarily unreachable, or because the title of this article has been updated.
To summarise, Chicago-style tests:
- verify whether code generates the expected output
- are relatively easy to write and maintain
- execute real code, which gives you a high degree of confidence that it works well
- require that all of the code that’s called directly or indirectly by the test has been implemented
- may test more code than you actually want to
The mockist or London school takes a very different approach. Rather than verifying whether actions on a SUT lead to a desired state, it aims to verify the behaviour of the SUT: how it interacts with the objects and functions it depends on.
This is how we would test find_title_for_article() using the London style.
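A sketch of what a London-style test could look like, using unittest.mock from the standard library. As before, NosClient is a minimal hypothetical version and its URL scheme is an assumption. Both requests.get() and re.search() are patched, so no network traffic happens and no real regular expression runs.

```python
import re
from typing import Optional
from unittest import mock

import requests


class NosClient:
    # Minimal hypothetical version of the client; the URL scheme is an assumption.
    def find_title_for_article(self, article_id: int) -> Optional[str]:
        response = requests.get(f"https://nos.nl/artikel/{article_id}", timeout=10)
        response.raise_for_status()
        match = re.search(r"<title>(.*?)</title>", response.text, re.DOTALL)
        return match.group(1).strip() if match else None


# The bottom decorator supplies the first mock argument.
@mock.patch("re.search")
@mock.patch("requests.get")
def test_find_title_for_article(mock_get, mock_search):
    # Arrange: replace both collaborators with mocks.
    mock_get.return_value.text = "<html>fake page</html>"
    mock_search.return_value.group.return_value = "  Some title  "

    # Act
    title = NosClient().find_title_for_article(42)

    # Assert: check how the SUT talked to its dependencies...
    mock_get.assert_called_once_with("https://nos.nl/artikel/42", timeout=10)
    mock_search.assert_called_once_with(
        r"<title>(.*?)</title>", "<html>fake page</html>", re.DOTALL
    )
    mock_search.return_value.group.assert_called_once_with(1)
    # ...and that it post-processed the mocked result correctly.
    assert title == "Some title"
```

Most of the assertions describe how NosClient uses its collaborators rather than what it returns, which is exactly what ties this kind of test to the implementation.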
As you can see, it’s a bit longer than the Chicago-style test that we saw earlier.
It is also aware of the fact that NosClient makes function calls to re.search(), and it uses mocks instead of their actual implementations.
This type of test is more akin to a driving test, where the examiner (or “test”) primarily bases their judgment on the behaviour of the student (or “code”) rather than what the vehicle does. What matters most is that the student demonstrates that they can shift gears smoothly, check their mirrors, be constantly aware of their surroundings, and so on.
It’s perfectly fine if the vehicle does some unconventional things, like “parking” several times in a row or temporarily stopping on a hill for absolutely no reason. It’s also not the student’s fault if the car is during the test, as long as they handled everything well. The test might not even require a real car if there’s a sufficiently realistic simulator!
This is also a downside of London-style tests: because they’re less realistic than Chicago-style tests, you might not be able to catch all bugs. Moreover, heavy reliance on mocks tends to produce tests that break often when the implementation of the SUT or mocked objects changes.
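To make the brittleness concrete, here is a small hypothetical illustration. extract_title_v1() and extract_title_v2() are made-up functions that behave identically, except that the second one precompiles its regular expression. A mockist check that expects a call to re.search() fails after this harmless refactoring, while a classicist check on the output keeps passing.

```python
import re
from typing import Optional
from unittest import mock

TITLE_PATTERN = r"<title>(.*?)</title>"


def extract_title_v1(html: str) -> Optional[str]:
    match = re.search(TITLE_PATTERN, html, re.DOTALL)
    return match.group(1).strip() if match else None


# Refactored version: precompile the pattern. Behaviour is identical.
_TITLE_RE = re.compile(TITLE_PATTERN, re.DOTALL)


def extract_title_v2(html: str) -> Optional[str]:
    match = _TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def mockist_check(extract_title) -> bool:
    """London-style check: was re.search() called exactly once?"""
    with mock.patch("re.search") as mock_search:
        mock_search.return_value.group.return_value = "Title"
        extract_title("<title>Title</title>")
    return mock_search.call_count == 1


def classicist_check(extract_title) -> bool:
    """Chicago-style check: does the function return the right output?"""
    return extract_title("<html><title> Title </title></html>") == "Title"


for fn in (extract_title_v1, extract_title_v2):
    print(fn.__name__, "mockist:", mockist_check(fn), "classicist:", classicist_check(fn))
# extract_title_v1 mockist: True classicist: True
# extract_title_v2 mockist: False classicist: True
```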
In short, London-style tests:
- check whether the SUT exhibits the right behaviour
- mock dependencies so that you don’t have to test more code than strictly needed
- tend to be brittle, especially when code is refactored