Chuniversiteit.nl
The Toilet Paper

Is there a time and place for single-letter variable names?

We are often told that variable names should be meaningful and avoid abbreviations, yet some variables are simply named “k”. What gives?

A heart, sandwiched between two Dutch chocolate letters I and U, forming the phrase “I love u”
It’s okay to be single if you’re a variable

Giving variables meaningful names is a widely accepted best practice that is recommended in pretty much every style guide or book on programming that you can find. Well-named variables make code easier to understand and can sometimes even serve as a form of documentation, making comments unnecessary.

At the same time, almost everyone also names their loop variable i rather than something more meaningful, like indexOfLoopOverAllRecords. Clearly, there are some situations in which programmers believe that this best practice can be “violated”.
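To make the contrast concrete, here is a minimal Python sketch of the same loop written twice. The function names and the list of records are made up for illustration; the only thing that differs between the two versions is the name of the loop variable.

```python
def total_short(records):
    """Sum the records using the conventional loop index i."""
    total = 0
    for i in range(len(records)):
        total += records[i]
    return total


def total_verbose(records):
    """Same loop, with a 'meaningful' but unwieldy index name."""
    total = 0
    for index_of_loop_over_all_records in range(len(records)):
        total += records[index_of_loop_over_all_records]
    return total
```

Both functions behave identically. Whether the second one is any easier to read is exactly the kind of question the study tries to answer.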

Is that belief justified? Let’s look at what the science says!

The study we’re looking at today consists of three parts, which examine the acceptability of single-letter variable names from different viewpoints.

Single-letter names in practice

Different programming languages have different coding conventions and idioms. This also translates into differences in variable naming.

The authors of the paper mined the 200 most popular GitHub repositories for five different programming languages (C, Java, JavaScript, PHP, and Perl) and compared the characteristics of their variable names.

They found that short variable names are common in many languages. Single-letter variable names are about as common as names of other short lengths.

As expected, i is the most commonly used name for single-letter variables in most languages. j is also fairly common for the same reason. The frequency of other letters appears to be language-dependent. For instance, v(alue) is the most common letter in Perl, while p(ointer), c(har) and n (counter?) are common in C.

Most single-letter variables use lowercase letters. Uppercase letters are primarily used in Perl, where they even outnumber all lowercase letters, except for i, j and v.
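The kind of counting described in this section can be approximated in a few lines. The sketch below is not the authors’ tooling: it assumes Python source code (which was not one of the five languages studied) and uses the standard-library ast module to collect every name that is assigned to. The helper names and the SAMPLE snippet are hypothetical.

```python
import ast
from collections import Counter


def assigned_names(source):
    """Return every variable name that is written to in the given Python source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being written (assignment, loop target, ...).
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.append(node.id)
    return names


def name_statistics(source):
    """Count assignments by name length, and count which single letters are used."""
    names = assigned_names(source)
    by_length = Counter(len(name) for name in names)
    single_letters = Counter(name for name in names if len(name) == 1)
    return by_length, single_letters


# Hypothetical input; a real analysis would read files from a repository instead.
SAMPLE = """
total = 0
for i in range(10):
    for j in range(i):
        total += i * j
n = total
"""

by_length, single_letters = name_statistics(SAMPLE)
```

Note that this counts every assignment rather than every distinct variable, so a name that is assigned twice is counted twice. A real analysis would have to decide which of the two it wants to measure.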

Effect on understandability of code

Just because single-letter variable names are common in practice does not mean that they are a good idea. The second part therefore consists of three experimental procedures, of which two are controlled experiments and one is an opinion survey.

“Can we fix it?”

The first experiment attempts to measure the negative effect of single-letter variable names on the maintainability of code. It involves a coding task, in which experimental subjects are asked to fix a defective piece of code, whose functionality is explained beforehand. Some subjects are given a version of the code with meaningful names for everything, while others work with one of two versions in which some variables have been given a meaningless single-letter name.
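To give an idea of what such versions might look like, here is a hypothetical pair of functions in Python. This is not the code that was used in the experiment; it only illustrates the difference between a version with meaningful names and one in which the variables have been reduced to single letters.

```python
def count_long_words(words, minimum_length):
    """Version with meaningful names."""
    long_word_count = 0
    for word in words:
        if len(word) >= minimum_length:
            long_word_count += 1
    return long_word_count


def count_long_words_terse(w, m):
    """Same logic, but with single-letter names for every variable."""
    c = 0
    for x in w:
        if len(x) >= m:
            c += 1
    return c
```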

The results suggest that there is little difference between the three versions in the correctness of solutions or the time spent on them. However, due to the small sample size these results should be interpreted as failing to show a difference, and not as finding that there is no difference. Moreover, the differences that were observed seem to be largely attributable to other variables, like the age and sex of a subject.

“Why is it so hard?”

The second experiment is even more focussed on the possible adverse effects of single-letter names. It uses a much more “computer sciencey” algorithmic coding problem that is harder to understand. Experimental subjects were given either a version with meaningful names or a version with single-letter names, and asked to 1) explain what the code does and 2) extend it.

Sadly it turned out that this coding problem was too hard to understand for almost everyone. Few people made it through part 1, regardless of what convention was used for variable names. Again, individual differences between subjects had a more significant impact on results than variable names: if you do not have the required background and skills, good variable names will not save you.

“What do you want?”

In the opinion survey, subjects were shown four “which function do you prefer?” questions. Each question included two or three versions of a function, with varying degrees of single-letter variable naming.

The results show that an overwhelming majority of respondents prefers functions with long, meaningful names, even though the experiments suggest that single-letter variable names do not have a significant effect on the understandability of code.

Expectations for single-letter names

The letter i is commonly used for loop indices and therefore has a clear meaning to programmers. But what about other letters? A survey was used to ask respondents about their associations with all 26 letters of the English alphabet.

As expected, many letters are clearly associated with types that start with that letter, e.g. s for string, c for char, and o for object. This is not always the case: j, k, and n are strongly associated with integers, while d, e, f, r and t are associated with floating-point numbers. x, y and z are associated with both types of numbers.

The number of distinct associated meanings differs between letters. Some letters like s, t (time), i, j and k (loop indices) have a single, clear meaning. Programmers are less sure what to expect when they see letters like a, h and m, for which they have many different associations or no associations at all.
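One way to see why these expectations matter is to check code against them. The sketch below is a hypothetical illustration in Python, not something from the study: it encodes the type associations mentioned above and flags single-letter variables whose values have a different type. The letters c and o are left out because Python has no separate char type and every value is an object.

```python
# Type associations taken from the survey results described above.
EXPECTED_TYPES = {
    "s": str,
    "j": int, "k": int, "n": int,
    "d": float, "e": float, "f": float, "r": float, "t": float,
    "x": (int, float), "y": (int, float), "z": (int, float),
}


def surprising_names(variables):
    """Return the names whose value does not have the type a reader would likely expect."""
    surprises = []
    for name, value in variables.items():
        expected = EXPECTED_TYPES.get(name)
        # Letters without a recorded association are skipped rather than flagged.
        if expected is not None and not isinstance(value, expected):
            surprises.append(name)
    return sorted(surprises)


# Hypothetical variables: f holds a file name and k holds a fraction.
flagged = surprising_names({"s": "hello", "n": 3, "f": "file.txt", "k": 2.5, "a": None})
```

Here flagged contains f and k: a reader who sees f is likely to expect a floating-point number rather than a file name, and k is expected to be an integer.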

Summary

  1. Single-letter (and other short) variable names are common in real-life software projects

  2. Programmers prefer longer variable names over single-letter names, even though the experiments failed to show that single-letter names affect program comprehension

  3. Certain letters of the alphabet create expectations about their type and usage, which suggests that letters should be chosen carefully