Is there a time and place for single-letter variable names?
Giving variables meaningful names is a widely accepted best practice that is recommended in pretty much every style guide or book on programming that you can find. Well-named variables make code easier to understand and can sometimes even serve as a form of documentation, making comments unnecessary.
At the same time, almost everyone also names their loop variable i rather than something more meaningful, like indexOfLoopOverAllRecords. Clearly, there are some situations in which programmers believe that this best practice can be “violated”.
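To make the contrast concrete, here is a minimal sketch in Python. The language is chosen only for illustration, and the function names and the snake_case spelling of the long name are assumptions, not something taken from the study. Both functions do exactly the same thing and differ only in the name of the loop index.

```python
def sum_with_short_index(records):
    """Sum the records using the conventional single-letter index."""
    total = 0
    for i in range(len(records)):
        total += records[i]
    return total


def sum_with_long_index(records):
    """Same loop, but with a 'meaningful' (hypothetical) index name."""
    total = 0
    for index_of_loop_over_all_records in range(len(records)):
        total += records[index_of_loop_over_all_records]
    return total
```

Most readers will probably find the first version easier to scan, which is the intuition the rest of this post puts to the test.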
Is that belief justified? Let’s look at what the science says!
The study we’re looking at today consists of three parts, which look at the acceptability of single-letter variable names from different viewpoints.
Different programming languages have different coding conventions and idioms. This also translates into differences in variable naming.
The authors of the paper mined the 200 most popular GitHub repositories for five different programming languages (C, Java, JavaScript, PHP, and Perl) and compared the characteristics of their variable names.
They found that short variable names are common in many languages. Single-letter variable names are about as common as names of other short lengths.
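A rough sketch of how such a measurement could work is shown below. It is not the authors' tooling: the study covered C, Java, JavaScript, PHP, and Perl, whereas this example runs Python's standard ast module on Python source purely for illustration. The helper name name_length_histogram and the sample snippet are hypothetical.

```python
import ast
from collections import Counter


def name_length_histogram(source: str) -> Counter:
    """Count how often names of each length are assigned to in `source`.

    Every assignment occurrence is counted, so a variable that is
    assigned twice contributes twice.
    """
    tree = ast.parse(source)
    lengths = Counter()
    for node in ast.walk(tree):
        # ast.Store marks names that are being written to (assignment
        # targets, loop variables), as opposed to names being read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            lengths[len(node.id)] += 1
    return lengths


# Hypothetical sample input, not taken from any mined repository.
sample = """
total = 0
for i in range(10):
    total += i
n = total
"""

histogram = name_length_histogram(sample)
```

Running something like this over many files and comparing the histograms per language would give a very simplified version of the comparison described above.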
As expected, i is the most commonly used name for single-letter variables in most languages. j is also fairly common for the same reason. The frequency of other letters appears to be language-dependent. For instance, v(alue) is the most common letter in Perl, while p(ointer), c(har) and n (counter?) are common in C.
Most single-letter variables use lowercase letters. Uppercase letters are primarily used in Perl, where they even outnumber all lowercase letters, except for i, j and v.
Just because single-letter variable names are common in practice does not mean that they are a good idea. The second part therefore consists of three experimental procedures, of which two are controlled experiments and one is an opinion survey.
The first experiment attempts to measure the negative effect of single-letter variable names on the maintainability of code. It involves a coding task, in which experimental subjects are asked to fix a defective piece of code, whose functionality is explained beforehand. Some subjects are given a version of the code with meaningful names for everything, while others work with one of two versions in which some variables have been given a meaningless single-letter name.
The results suggest that there is little difference in the correctness and time spent on solutions between the three versions. However, due to the small sample size, these results should be interpreted as failing to show a difference, and not as finding that there is no difference. Moreover, differences seem to be largely attributable to other variables, like the age and sex of a subject.
The second experiment is even more focussed on the possible adverse effects of single-letter names. It uses a much more “computer sciencey” algorithmic coding problem that is harder to understand. Experimental subjects were given either a version with meaningful names or a version with single-letter names, and were asked to 1) explain what the code does and 2) extend it.
Sadly it turned out that this coding problem was too hard to understand for
almost everyone. Few people made it through part 1, regardless of what convention
was used for variable names. Again, individual differences between subjects had
a more significant impact on results than variable names: if you do not have
the required background and skills, good variable names will not save you.
In the opinion survey, subjects were shown four “which function do you prefer?” questions. Each question included 2 or 3 versions of a function, with varying degrees of single-letter variable naming.
The results show that an overwhelming majority of respondents prefers functions with long, meaningful names, even though the experiments suggest that single-letter variable names do not have a significant effect on the understandability of code.
The letter i is commonly used for loop indices and therefore has a clear meaning to programmers. But what about other letters? A survey was used to ask respondents about their associations with all 26 letters of the English alphabet.
As expected, many letters are clearly associated with types that start with that letter, e.g. s for string, c for char, and o for object. This is not always the case: j, k, and n are strongly associated with integers, while d, e, f, r and t are associated with floating-point numbers. x, y and z are associated with both types of numbers.
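The following sketch illustrates how these associations can help or hinder a reader. It is a made-up example, not material from the survey; both function names are hypothetical. The first function uses letters in line with the reported associations (s for a string, c for a character, n for an integer count, i for an index). The second has identical behaviour but deliberately assigns the letters against expectations.

```python
def count_char(s: str, c: str) -> int:
    """Count occurrences of character c in string s (conventional letters)."""
    n = 0
    for i in range(len(s)):
        if s[i] == c:
            n += 1
    return n


def count_char_swapped(n: str, i: str) -> int:
    """Same behaviour, but the letters contradict their usual associations:
    n is the string, i is the character, c is the count, s is the index."""
    c = 0
    for s in range(len(n)):
        if n[s] == i:
            c += 1
    return c
```

Both versions consist only of single-letter names, yet the second is likely to be noticeably harder to follow, because each letter sets up an expectation that the code then breaks.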
The number of distinct associated meanings differs between letters. Some letters like s, t (time), i, j and k (loop indices) have a single, clear meaning. Programmers are less sure what to expect when they see letters like a, h and m, for which they have many different associations or no associations at all.
- Single-letter (and other short) variable names are common in real-life software projects
- Programmers prefer longer variable names over single-letter names, even though the experiments did not show that single-letter names hurt program comprehension
- Certain letters of the alphabet create expectations about their type and usage, which suggests that letters should be chosen carefully