Chuniversiteit.nl
“Heap, Heap, Array!”

The Chicago and London schools of TDD

Tests can be written Chicago-style or London-style. Neither is strictly better than the other; which style should you use?

Figure: Chicago’s Willis Tower and London’s Elizabeth Tower side by side. One is windy and the other winds me up.

A few weeks ago I wrote about the Arrange-Act-Assert pattern, a standardised approach to formatting unit tests and functional tests. But there is more to writing tests than just formatting. Let’s talk about how we test our code.

Some tests can be very simple. For example, if we have a Python function that returns a random number, then we could write a test that verifies whether that function returns the expected number.
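A minimal sketch of such a function and its test could look like this. The name `get_random_number()` and the value it returns are assumptions made for illustration. The function deliberately returns a fixed value, because that is the only way a test can know which number to expect.

```python
def get_random_number():
    # Hypothetical example: not random at all, which is what makes
    # it possible to assert on an exact value.
    return 4


def test_get_random_number():
    # Act
    number = get_random_number()

    # Assert
    assert number == 4
```

A test runner such as pytest would pick up `test_get_random_number()` automatically, but the function can also be called directly.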

Of course, most real-world code is a bit more complex. Functions can be called with different arguments or may themselves call other functions.

Let’s look at a slightly lengthier example: a NosClient class that finds the title of an article on the Dutch news website NOS.nl for any given article ID. It does this by fetching the page using the requests library and extracting the value of the <title> element:
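What follows is a rough sketch of what such a class might look like, not necessarily the original implementation. The URL scheme in `BASE_URL`, the regular expression, and the 10-second timeout are all assumptions. Only the class name, the method name, and the use of requests come from the text.

```python
import re

import requests


class NosClient:
    # Assumed URL scheme; the real site may structure its URLs differently.
    BASE_URL = "https://nos.nl/artikel"

    def find_title_for_article(self, article_id):
        """Return the contents of the page's <title> element, or None."""
        # Fetch the article page over HTTP.
        response = requests.get(f"{self.BASE_URL}/{article_id}", timeout=10)

        # Extract whatever sits between <title> and </title>.
        match = re.search(r"<title>(.*?)</title>", response.text)
        return match.group(1) if match else None
```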

This NosClient has one public method, find_title_for_article(). How should we go about testing this method?

There are two major schools of thought when it comes to how tests should be written: the classicist (Chicago) school and the mockist (London) school. Both have their strengths and weaknesses, and neither is strictly better than the other.

The classicist (Chicago) school

The classicist or Chicago school is the older of the two, and the easier to understand. According to the Chicago school, tests should check whether a system under test (SUT) produces the expected results, which can be done by simply letting it do its thing and checking its output.

In the case of NosClient.find_title_for_article(), we could call the method with an article ID and assert that it returns the title of that article:
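A Chicago-style test for this method might be sketched as follows. The `NosClient` shown here is a hypothetical stand-in (assumed URL scheme, regular expression and timeout), and `ARTICLE_ID` and `EXPECTED_TITLE` are placeholders rather than real data, so they would have to be replaced with an existing article and its actual title.

```python
import re

import requests


class NosClient:
    # Hypothetical stand-in for the class described in the article.
    BASE_URL = "https://nos.nl/artikel"  # assumed URL scheme

    def find_title_for_article(self, article_id):
        response = requests.get(f"{self.BASE_URL}/{article_id}", timeout=10)
        match = re.search(r"<title>(.*?)</title>", response.text)
        return match.group(1) if match else None


# Placeholders: substitute a real article ID and its real title.
ARTICLE_ID = 1234567
EXPECTED_TITLE = "Example headline"


def test_find_title_for_article():
    # Arrange: a real client, with no mocks or stubs.
    client = NosClient()

    # Act: this performs an actual HTTP request to the remote server.
    title = client.find_title_for_article(ARTICLE_ID)

    # Assert: only the end result is checked.
    assert title == EXPECTED_TITLE
```

Note that the test knows nothing about requests or regular expressions. It would keep passing if the implementation switched to a different HTTP library or an HTML parser, as long as the returned title stays the same.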

Chicago-style tests are kind of like a teacher grading an essay. They only get to see the end result, and that’s all they care about, as long as it meets the requirements.

This is because good essays can only be written by those who have done all the necessary work (developing writing skills, doing research, and so on). In other words, if a student has written a good essay, the teacher (or “tester”) can be reasonably confident that the student (or “code”) deserves a nice grade.

However, this approach has a few problems. The first is that an essay tests everything at the same time. It’s not possible to only verify that a student did their research without also testing their writing skills. Similarly, we cannot test whether the find_title_for_article() method correctly extracts titles from web pages without making actual HTTP requests to a remote server that we do not control.

Moreover, if a student fails to produce an essay through no fault of their own (maybe their computer was violently attacked by a cat), they could be penalised by the teacher! For example, our test could fail because the website from which the NosClient retrieves its titles is temporarily unreachable or has updated the title of this article.

Chicago-style tests…

  1. verify whether code generates the expected output

  2. are relatively easy to write and maintain

  3. execute real code, which gives you a high degree of confidence that it works well

  4. require that all of the code that’s called directly or indirectly by the test has been implemented

  5. may test more code than you actually want to

The mockist (London) school

The mockist or London school takes a very different approach. Rather than verifying if actions on a SUT lead to a desired state, it aims to verify the behaviour of a SUT.

This is how we would test find_title_for_article() using the London style:
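A London-style test could be sketched like this, using `unittest.mock` from the standard library. As before, the `NosClient` shown here is a hypothetical stand-in with an assumed URL scheme, regular expression and timeout. The article ID and title are arbitrary, since no real page is ever fetched.

```python
import re
from unittest.mock import MagicMock, patch

import requests


class NosClient:
    # Hypothetical stand-in for the class described in the article.
    BASE_URL = "https://nos.nl/artikel"  # assumed URL scheme

    def find_title_for_article(self, article_id):
        response = requests.get(f"{self.BASE_URL}/{article_id}", timeout=10)
        match = re.search(r"<title>(.*?)</title>", response.text)
        return match.group(1) if match else None


def test_find_title_for_article():
    # Arrange: replace both collaborators with mocks.
    fake_response = MagicMock()
    fake_response.text = "<title>ignored by the mocked search</title>"
    fake_match = MagicMock()
    fake_match.group.return_value = "Some title"

    with patch("requests.get", return_value=fake_response) as mock_get, \
            patch("re.search", return_value=fake_match) as mock_search:
        # Act: no network traffic and no real regex matching happens here.
        title = NosClient().find_title_for_article(42)

    # Assert: verify how the SUT talked to its collaborators...
    mock_get.assert_called_once_with("https://nos.nl/artikel/42", timeout=10)
    mock_search.assert_called_once_with(
        r"<title>(.*?)</title>", fake_response.text
    )
    fake_match.group.assert_called_once_with(1)

    # ...and that it passed the collaborator's result back to the caller.
    assert title == "Some title"
```

Every assertion on a mock encodes an implementation detail. Changing the timeout, the URL format or the regular expression would break this test even if the method still returned correct titles.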

As you can see, it’s a bit longer than the Chicago-style test that we saw earlier. It is also aware of the fact that the NosClient makes function calls to requests.get() and re.search(), and uses mocks instead of their actual implementations.

This type of test is more akin to a driving test, where the examiner (or “test”) primarily bases their judgment on the behaviour of the student (or “code”) rather than what the vehicle does. What matters most is that the student demonstrates that they can shift gears smoothly, check their mirrors, be constantly aware of their surroundings, and so on.

It’s perfectly fine if the vehicle does some unconventional things, like “parking” several times in a row or temporarily stopping on a hill for absolutely no reason. It’s also not the student’s fault if something happens to the car during the test, as long as they handled everything well. The test might not even require a real car if there’s a sufficiently realistic simulator!

This is also a downside of London-style tests: because they’re less realistic than Chicago-style tests, you might not be able to catch all bugs. Moreover, heavy reliance on mocks tends to produce tests that break often when the implementation of the SUT or mocked objects changes.

London-style tests…

  1. check if the SUT exhibits the right behaviour

  2. mock dependencies so that you don’t have to test more code than strictly needed

  3. tend to be brittle, especially when code is refactored