  The Toilet Paper

# Does code structure affect comprehension? On using and naming intermediate variables

The answer is not “yes” or “no”, but somewhere in between. Who would’ve guessed? There are two easy solutions in computer science: caching and naming things.

Conventional wisdom says that lengthy chunks of code are hard to read, and that you should therefore split them into smaller pieces. Does this really make your code easier to read?

Algorithms and functionality can often be expressed in different ways. For example, the distance between two points can be calculated as follows using a single expression:

`d = sqrt( (A.x-B.x)**2 + (A.y-B.y)**2 )`

or using three separate expressions:

`dx = A.x - B.x`
`dy = A.y - B.y`
`d = sqrt( dx**2 + dy**2 )`

Each of the lines in the second version is easier to understand than the compound expression in the first version. However, the reader now also has to mentally “connect” the lines if they want to understand what is going on.
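The two forms can be run side by side to confirm that they are interchangeable. This is only a minimal sketch: the `Point` type and the function names `distance_compound` and `distance_decomposed` are hypothetical and introduced purely for illustration, and `math.sqrt` is assumed for the `sqrt` in the snippets above.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical point type, introduced only for this illustration.
    x: float
    y: float


def distance_compound(A: Point, B: Point) -> float:
    # Single compound expression, no intermediate variables.
    return math.sqrt((A.x - B.x)**2 + (A.y - B.y)**2)


def distance_decomposed(A: Point, B: Point) -> float:
    # The same calculation split into three smaller expressions.
    dx = A.x - B.x
    dy = A.y - B.y
    return math.sqrt(dx**2 + dy**2)


A = Point(0.0, 0.0)
B = Point(3.0, 4.0)
print(distance_compound(A, B), distance_decomposed(A, B))  # prints: 5.0 5.0
```

Both functions perform exactly the same arithmetic, so the only thing that differs is how much the reader has to hold in their head at once.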

This is a very simple example, of course, but similar issues arise when decomposing large functions into several smaller functions.

## How the study was conducted

When you split a single compound expression into multiple smaller expressions, you inevitably also create intermediate variables, which you have to name. Good variable names serve as a form of inline documentation and thus may also make your code easier to understand. So decomposition actually affects understandability in two different ways.

The researchers studied these two effects in a controlled experiment. Participants were given six Python functions that implemented relatively well-known, non-trivial mathematical functions in one of three ways:

• As a single compound expression without any intermediate variables
`def foo(arr):`
`    return sum((x - (sum(arr) / len(arr)))**2 for x in arr) / len(arr)`
• Decomposed into separate expressions, where intermediate variables are given meaningless names, like `tmp1` and `tmp2`
`def foo(arr):`
`    tmp1 = len(arr)`
`    tmp2 = sum(arr) / tmp1`
`    return sum((x - tmp2)**2 for x in arr) / tmp1`
• Again, decomposed into separate expressions, but now intermediate variables are given meaningful names
`def foo(arr):`
`    n = len(arr)`
`    mean = sum(arr) / n`
`    return sum((x - mean)**2 for x in arr) / n`

As you can see, all functions are named `foo()`. That is because participants were asked to read the code and come up with a name. If the name accurately describes the algorithm, one can assume that they understand what the code does. The researchers manually verified the correctness of the answers. 
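To check that the three example variants differ only in structure and naming, not in behaviour, they can be run against the same input. In the sketch below the functions get distinct, hypothetical names (`foo_compound`, `foo_meaningless`, `foo_meaningful`) so that they can live in one file; in the study each was simply called `foo`. The comparison with the standard library's `statistics.pvariance` (population variance) is an addition for illustration and was not part of the experiment.

```python
import math
import statistics


def foo_compound(arr):
    # Variant 1: single compound expression.
    return sum((x - (sum(arr) / len(arr)))**2 for x in arr) / len(arr)


def foo_meaningless(arr):
    # Variant 2: decomposed, with meaningless variable names.
    tmp1 = len(arr)
    tmp2 = sum(arr) / tmp1
    return sum((x - tmp2)**2 for x in arr) / tmp1


def foo_meaningful(arr):
    # Variant 3: decomposed, with meaningful variable names.
    n = len(arr)
    mean = sum(arr) / n
    return sum((x - mean)**2 for x in arr) / n


# Sample data chosen for this illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
results = [f(data) for f in (foo_compound, foo_meaningless, foo_meaningful)]
print(results)  # prints: [4.0, 4.0, 4.0]

# All three agree with the standard library's population variance.
assert all(math.isclose(r, statistics.pvariance(data)) for r in results)
```

Because the three variants return the same result for the same input, any difference in how well participants understood them comes down to how the code is laid out and named.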