The Toilet Paper

# Does code structure affect comprehension? On using and naming intermediate variables

The answer is not “yes” or “no”, but somewhere in between. Who would’ve guessed?

Conventional wisdom says that lengthy chunks of code are hard to read, and that you should therefore split them into smaller pieces. Does this really make your code easier to read?

## Why it matters

Algorithms and functionality can often be expressed in different ways. For example, the distance between two points can be calculated as follows using a single expression:

`d = sqrt( (A.x-B.x)**2 + (A.y-B.y)**2 )`

or using three separate expressions:

`dx = A.x - B.xdy = A.y - B.yd = sqrt( dx**2 + dy**2 )`

Each of the lines in the second version is easier to understand than the compound expression in the first version. However, the reader now also has to mentally “connect” the lines if they want to understand what is going on.

This is a very simple example of course, but similar issues exist when decomposing large functions into several smaller functions or in .

## How the study was conducted

When you split a single compound expression into multiple smaller expressions, you inevitably also create intermediate variables, which you have to name. Good variable names serve as a form of inline documentation and thus may also make your code easier to understand. So decomposition actually affects understandability in two different ways.

The researchers studied these two ways using a controlled experiment. Participants were given 6 Python functions that implemented relatively well-known, non-trivial mathematical functions in one of three ways:

• As a single compound expression without any intermediate variables
`def foo(arr):    return sum((x - (sum(arr) / len(arr)))**2 for x in arr) / len(arr)`
• Decomposed into separate expressions, where intermediate variables are given meaningless names, like `tmp1` and `tmp2`
`def foo(arr):    tmp1 = len(arr)    tmp2 = sum(arr) / tmp1    return sum((x - tmp2)**2 for x in arr) / tmp1`
• Again, decomposed into separate expressions, but now intermediate variables are given meaningful names
`def foo(arr):    n = len(arr)    mean = sum(arr) / n    return sum((x - mean)**2 for x in arr) / n`

As you can see, all functions are named `foo()`. That is because participants were asked to read the code and come up with a name. If the name accurately describes the algorithm, one can assume that they understand what the code does. The researchers manually verified the correctness of the answers.

## What discoveries were made

The first and third version above are extremes, while the second version is somewhere in between. One would therefore expect that the ratio of correct answers for the second version is between those for versions 1 and 3.

They’re not. For some functions version 2 was the hardest to identify correctly, for one it was very similar to version 3, while for others the results were similar to those for version 1. In other words, there is no such thing as a “best” way to write such code – it all depends on the function.

The results also clearly show that version 3 was better understood for 5 of the 6 Python functions. This strongly suggests that intermediate variables are not necessarily “better” on their own, but can be very useful when they are given meaningful names.

## Summary

1. Intermediate variables do not make code easier to understand on their own, but can be helpful if they have meaningful names