The Chicago and London schools of TDD

Published: 20 Jan 2023
Written by: Chun Fei Lung

Tests can be written Chicago-style or London-style. Neither is strictly better than the other, so which style should you use?

One is windy and the other winds me up

A few weeks ago I wrote about the Arrange-Act-Assert pattern, a standardised approach to formatting unit tests and functional tests. But there is more to writing tests than just formatting. Let’s talk about how we test our code.

Some tests can be very simple. For example, if we have a Python function that returns a random number:

def get_random_number() -> int:
    return 4 # chosen by fair dice roll.
             # guaranteed to be random.

Then we could write a test that verifies whether that function returns the expected number:

def test_get_random_number() -> None:
    assert get_random_number() == 4

Of course, most real-world code is a bit more complex. Functions can be called with different arguments or may themselves call other functions.

Let’s look at a slightly lengthier example: a NosClient class that finds the title of an article on the Dutch news website NOS.nl for any given article ID. It does this by fetching the page using the requests library and extracting the value of the <title> element:

import re
from typing import Optional

import requests


class NosClient:
    __hostname: str

    def __init__(self, hostname: str = "https://nos.nl"):
        self.__hostname = hostname

    def find_title_for_article(self, id: int) -> Optional[str]:
        # Fetch the article page, e.g. https://nos.nl/l/541147
        r = requests.get("{}/l/{}".format(self.__hostname, id))
        return self.__extract_title_from_html(r.text)

    @staticmethod
    def __extract_title_from_html(html: str) -> Optional[str]:
        # Capture the text between <title> and </title>, if there is any
        result = re.search(r"<title>(.+)</title>", html)

        return result.group(1) if result else None
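To make the extraction step a little more concrete, here is a minimal sketch of what it returns for two made-up pages. The extract_title() function is a hypothetical module-level copy of the private helper above, and both HTML strings are invented for illustration:

```python
import re
from typing import Optional


def extract_title(html: str) -> Optional[str]:
    # Hypothetical copy of NosClient's private helper, using the same pattern.
    result = re.search(r"<title>(.+)</title>", html)
    return result.group(1) if result else None


# A page with a <title> element yields the text between the tags.
with_title = extract_title("<html><head><title>Example article</title></head></html>")

# A page without a <title> element yields None instead of raising an error.
without_title = extract_title("<html><body>No title here</body></html>")
```

So callers of find_title_for_article() have to be prepared for None as well as a string.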

This NosClient has one public method, find_title_for_article(). How should we go about testing this method?

There are two major schools of thought when it comes to how tests should be written: the Chicago school and the London school (side note: Reportedly named after places where they were first made popular.). Both have their strengths and weaknesses, and neither is strictly better than the other.

The classicist (Chicago) school

The classicist or Chicago school is the older of the two, and the easier to understand. According to the Chicago school, tests should check if a system under test (SUT) produces the expected results, which can be done by simply letting it do its thing and checking its output.

In the case of NosClient.find_title_for_article(), we could call the method with an article ID and assert that it returns the title of that article:

from news_api import NosClient


def test_gets_title_for_article():
    # > Arrange
    client = NosClient()
    # > Act
    result = client.find_title_for_article(id=541147)
    # > Assert
    assert result == "Area 51 bestaat!!1!!11!"

Chicago-style tests are kind of like a teacher grading an essay. They only get to see the end result and that’s all they care about. It doesn’t really matter whether the student pulled an all-nighter or wrote their essay over the course of several days (side note: Although in practice the latter is more likely to be high-quality.), as long as it meets the requirements.

This is because good essays can only be written by those who have done all the necessary work (developing writing skills, research, etc. (side note: I’m just going to pretend that essay mills and ChatGPT don’t exist.)). In other words, if a student has written a good essay, the teacher (or “tester”) can be reasonably confident that the student (or “code”) deserves a nice grade.

However, this approach has a few problems. The first is that an essay tests everything at the same time. It’s not possible to only verify that a student did their research without also testing their writing skills. Similarly, we cannot test whether the find_title_for_article() method correctly extracts titles from web pages without making actual HTTP requests to a remote server that we do not control.

Moreover, if a student fails to produce an essay through no fault of their own (maybe their computer was violently attacked by a cat), they could be penalised by the teacher! For example, our test could fail because the website from which the NosClient retrieves its titles is temporarily unreachable or has updated the title of this article.

Chicago-style tests…

verify whether code generates the expected output
are relatively easy to write and maintain
execute real code, which gives you a high degree of confidence that it works well
require that all of the code that’s called directly or indirectly by the test has been implemented
may test more code than you actually want to

The mockist (London) school

The mockist or London school takes a very different approach. Rather than verifying if actions on a SUT lead to a desired state, it aims to verify the behaviour of a SUT: which calls it makes to the code it depends on, and with which arguments.

This is how we would test find_title_for_article() using the London style:

from news_api import NosClient
import re
import requests


# The mocker fixture is provided by the pytest-mock plugin.
def test_gets_title_for_article(mocker) -> None:
    # > Arrange
    response = mocker.Mock()
    response.text = "<html><title>Test</title></html>"
    mocker.patch("requests.get", return_value=response)
    mocker.patch("re.search")
    client = NosClient()
    # > Act
    client.find_title_for_article(id=541147)
    # > Assert
    requests.get.assert_called_once_with("https://nos.nl/l/541147")
    re.search.assert_called_once_with(mocker.ANY, response.text)

As you can see, it’s a bit longer than the Chicago-style test that we saw earlier. It is also aware of the fact that the NosClient makes function calls to requests.get() and re.search(), and uses mocks instead of their actual implementations.
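The same idea can be sketched with nothing but Python’s standard library, using unittest.mock rather than a pytest fixture. This is only a rough equivalent: StdlibNosClient is a hypothetical stand-in for NosClient that fetches pages with urllib.request instead of requests, so that the example runs without third-party packages.

```python
import re
import urllib.request
from typing import Optional
from unittest import mock


class StdlibNosClient:
    """Hypothetical stand-in for NosClient, built on urllib instead of requests."""

    def __init__(self, hostname: str = "https://nos.nl"):
        self._hostname = hostname

    def find_title_for_article(self, id: int) -> Optional[str]:
        url = "{}/l/{}".format(self._hostname, id)
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8")
        result = re.search(r"<title>(.+)</title>", html)
        return result.group(1) if result else None


def test_gets_title_for_article() -> None:
    # > Arrange
    html = "<html><title>Test</title></html>"
    response = mock.MagicMock()
    # urlopen() is used as a context manager, so configure what __enter__ yields.
    response.__enter__.return_value.read.return_value = html.encode("utf-8")
    with mock.patch("urllib.request.urlopen", return_value=response) as urlopen, \
            mock.patch("re.search") as search:
        client = StdlibNosClient()
        # > Act
        client.find_title_for_article(id=541147)
    # > Assert: only the calls are checked, not the returned title.
    urlopen.assert_called_once_with("https://nos.nl/l/541147")
    search.assert_called_once_with(mock.ANY, html)
```

Note that no network traffic takes place here: both urlopen() and re.search() are replaced for the duration of the with block and restored afterwards.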

This type of test is more akin to a driving test, where the examiner (or “test”) primarily bases their judgment on the behaviour of the student (or “code”) rather than what the vehicle does. What matters most is that the student demonstrates that they can shift gears smoothly, check their mirrors, be constantly aware of their surroundings, and so on.

It’s perfectly fine if the vehicle does some unconventional things, like “parking” several times in a row or temporarily stopping on a hill for absolutely no reason. It’s also not the student’s fault if the car is rear-ended by a drunk driver (side note: Or you know, if a remote server doesn’t respond to your HTTP request…) during the test, as long as they handled everything well. The test might not even require a real car if there’s a sufficiently realistic simulator!

This is also a downside of London-style tests: because they’re less realistic than Chicago-style tests, you might not be able to catch all bugs. Moreover, heavy reliance on mocks tends to produce tests that break often when the implementation of the SUT or mocked objects changes.
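Here is a small sketch of that brittleness, using only the standard library. Both extraction functions are hypothetical: the first mirrors the regular expression used by NosClient, the second is an imagined refactoring that returns the same title for this page without calling re.search(). The check that looks at output keeps passing, while the check that looks at calls does not.

```python
import re
from typing import Callable, Optional
from unittest import mock

PAGE = "<html><title>Test</title></html>"

Extractor = Callable[[str], Optional[str]]


def extract_with_regex(html: str) -> Optional[str]:
    # Same approach as NosClient: a regular expression.
    result = re.search(r"<title>(.+)</title>", html)
    return result.group(1) if result else None


def extract_with_partition(html: str) -> Optional[str]:
    # Hypothetical refactoring: same result for PAGE, but no call to re.search().
    _, opening, rest = html.partition("<title>")
    title, closing, _ = rest.partition("</title>")
    return title if opening and closing else None


def passes_state_check(extract: Extractor) -> bool:
    # Chicago style: only the returned value matters.
    return extract(PAGE) == "Test"


def passes_interaction_check(extract: Extractor) -> bool:
    # London style: spy on re.search() and require exactly one call to it.
    with mock.patch("re.search", wraps=re.search) as spy:
        extract(PAGE)
    return spy.call_count == 1


results = {
    "regex": (passes_state_check(extract_with_regex),
              passes_interaction_check(extract_with_regex)),
    "partition": (passes_state_check(extract_with_partition),
                  passes_interaction_check(extract_with_partition)),
}
# results == {"regex": (True, True), "partition": (True, False)}
```

Nothing that a caller can observe has changed between the two versions, yet the interaction-based check fails after the refactoring.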