Chuniversiteit.nl
The Toilet Paper

A large-scale empirical study on linguistic antipatterns affecting APIs

Does it matter if your method names contain tiny linguistic mistakes? Possibly.


Last week’s summary showed that it’s hard to quantify the understandability of code. This week we look at a much simpler problem: the consequences of badly named methods in an API. Aghajani, Nagy, Bavota, and Lanza found that badly named methods are associated with bugs, but it’s not yet clear why.

Why it matters

Most software is developed using third-party libraries. Such libraries generally come with API documentation, but this isn’t always enough: parts of it may be outdated, and some parts of the API may not be documented at all. When that happens, a developer has to rely entirely on the API’s method names.

We all know that naming things is hard though, so it’s very much possible that a library’s developer chooses a method name that ends up confusing its users.

How many bugs and questions arise due to badly named methods?

How the study was conducted

Another group of authors, Arnaoudova, Di Penta, and Antoniol (2016), previously compiled a handy list of linguistic antipatterns and published a tool that automatically detects the presence of such antipatterns in Java code. Twelve of these antipatterns were related to methods:

| Name | Meaning |
|---|---|
| “Get” – more than accessor | The method name starts with “get”, but the method is not merely an accessor, i.e. it also performs some other, undocumented action |
| “Is” returns more than boolean | The method name starts with “is”, but the method doesn’t return a boolean |
| “Set” method returns | The method name starts with “set”, but the method has a return value |
| Expecting but not getting single instance | The method name implies that a single object will be returned, but the method actually returns a collection of objects |
| Not implemented condition | The method documentation describes conditional behaviour that the method doesn’t implement |
| Validation method does not confirm | A validation method does not return a value that indicates whether validation was successful |
| “Get” method does not return | The method name starts with “get”, but the method returns void |
| Not answered question | The method name starts with “is”, but the method returns void |
| Transform method does not return | The method name implies that it transforms input, but it returns void and it’s not clear where the results are stored |
| Expecting but not getting a collection | The method name implies that it returns a collection, but the actual return type is a single object or void |
| Method name and return type are opposite | The intent expressed by the method name contradicts the return type |
| Method signature and comment are opposite | The method signature contradicts the description in its comment |
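Four of these antipatterns can be spotted from a method’s signature alone, by comparing the name prefix with the return type. The Java sketch below does just that with reflection, using a hypothetical `BadlyNamedCart` class that exhibits each of the four. It is not the detection tool by Arnaoudova et al., which also examines method bodies and comments; the class, the `SignatureChecker` helper and its camel-case prefix rule are assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class whose methods deliberately exhibit four of the antipatterns.
class BadlyNamedCart {
    private final List<String> items = new ArrayList<>();
    private String owner = "";

    // "Is" returns more than boolean: returns a count instead of true/false.
    public int isEmpty() {
        return items.size();
    }

    // "Set" method returns: also hands back the previous owner.
    public String setOwner(String newOwner) {
        String previous = owner;
        owner = newOwner;
        return previous;
    }

    // "Get" method does not return: results are copied into a parameter instead.
    public void getItems(List<String> target) {
        target.addAll(items);
    }

    // Not answered question: an "is" method that returns nothing.
    public void isValid() {
        if (owner.isEmpty()) {
            throw new IllegalStateException("no owner");
        }
    }
}

class SignatureChecker {
    // True if the name starts with the prefix followed by an upper-case letter,
    // so that e.g. "issue" or "settle" are not mistaken for "is"/"set" methods.
    private static boolean hasPrefix(String name, String prefix) {
        return name.length() > prefix.length()
                && name.startsWith(prefix)
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    // Reports signature-level antipatterns only; bodies and comments are ignored.
    static List<String> check(Class<?> type) {
        List<String> findings = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            String name = m.getName();
            Class<?> ret = m.getReturnType();
            boolean returnsVoid = ret == void.class;
            boolean returnsBoolean = ret == boolean.class || ret == Boolean.class;
            if (hasPrefix(name, "is") && returnsVoid) {
                findings.add(name + ": Not answered question");
            } else if (hasPrefix(name, "is") && !returnsBoolean) {
                findings.add(name + ": \"Is\" returns more than boolean");
            } else if (hasPrefix(name, "set") && !returnsVoid) {
                findings.add(name + ": \"Set\" method returns");
            } else if (hasPrefix(name, "get") && returnsVoid) {
                findings.add(name + ": \"Get\" method does not return");
            }
        }
        findings.sort(null); // getDeclaredMethods() has no guaranteed order
        return findings;
    }
}
```

Running `SignatureChecker.check(BadlyNamedCart.class)` reports one finding for each of the four methods. The remaining antipatterns need more than a prefix check, such as deciding whether a name is singular or plural, what a method body does, or what its comment promises.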

This article’s authors gathered data from 75 Java library projects on GitHub. They used the tool by Arnaoudova et al. to detect linguistic antipatterns in each library’s methods, and then used the Eclipse JDT parser to identify all client projects on GitHub that invoke at least one of the libraries’ affected methods.
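A parser matters here because a plain text search cannot tell which class a call belongs to: `isEmpty()` on a hypothetical `com.example.Cart` looks exactly like `isEmpty()` on a `java.util.List`. With Eclipse JDT, the resolved target of a call is available through `MethodInvocation.resolveMethodBinding()`. The sketch below leaves out the JDT part and assumes each call site has already been resolved into a `CallSite` record (a hypothetical type); it then keeps only the calls whose declaring type and method name match a known affected method. Overloads are not distinguished, which a real implementation would handle by also comparing parameter types.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// A client call site after binding resolution: the declaring type of the
// invoked method is known, not just the text that appears in the source.
record CallSite(String file, int line, String declaringType, String methodName) {
    String qualifiedMethod() {
        return declaringType + "#" + methodName;
    }
}

class AffectedCallFinder {
    // Keeps the call sites whose resolved target is one of the affected API
    // methods, given as "fully.qualified.Type#methodName" strings.
    static List<CallSite> affectedCalls(List<CallSite> calls, Set<String> affectedMethods) {
        return calls.stream()
                .filter(call -> affectedMethods.contains(call.qualifiedMethod()))
                .collect(Collectors.toList());
    }
}
```

For example, given calls to both `com.example.Cart#isEmpty` and `java.util.List#isEmpty` and an affected set containing only the former, `affectedCalls` returns just the `Cart` call site. Under this sketch, a client project counts as a user of an affected method as soon as the list is non-empty.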

The Git commit history of client projects can be used to determine whether linguistic antipatterns in API methods often cause bugs. The authors looked for bug fixes whose bugs may have been introduced by the commit in which an affected method was first invoked, and compared these with “normal” commits and bug fixes.
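One way to picture this comparison is as two rates: the share of commits introducing a first call to an affected method that were later identified as bug-inducing, and the same share for all other commits. The sketch assumes the bug-inducing commits are already known (for instance by tracing the lines changed in each bug fix back to the commits that last touched them, in the spirit of the SZZ algorithm) and reads “likelihood” as a ratio of these two rates. The `Commit` record and `BugRateComparison` class are hypothetical, and the authors’ actual statistical procedure, including their significance test, may differ.

```java
import java.util.List;
import java.util.Set;

// Minimal view of a client commit: does it add the first call to an affected API method?
record Commit(String id, boolean introducesAffectedCall) {}

class BugRateComparison {
    // Fraction of commits in one group (affected or not) that were later
    // identified as bug-inducing.
    static double bugInducingRate(List<Commit> commits, Set<String> bugInducingIds,
                                  boolean affectedGroup) {
        long total = 0;
        long buggy = 0;
        for (Commit c : commits) {
            if (c.introducesAffectedCall() != affectedGroup) {
                continue; // commit belongs to the other group
            }
            total++;
            if (bugInducingIds.contains(c.id())) {
                buggy++;
            }
        }
        return total == 0 ? 0.0 : (double) buggy / total;
    }

    // Relative difference between the two rates: 0.29 would mean "29% more likely".
    static double relativeIncrease(List<Commit> commits, Set<String> bugInducingIds) {
        double affected = bugInducingRate(commits, bugInducingIds, true);
        double other = bugInducingRate(commits, bugInducingIds, false);
        return other == 0 ? Double.NaN : affected / other - 1.0;
    }
}
```

With made-up input of two affected commits, one of them bug-inducing (rate 0.5), and four other commits, one of them bug-inducing (rate 0.25), `relativeIncrease` returns 1.0, i.e. 100% more likely.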

Finally, if antipatterns cause confusion among developers, one would expect more questions about affected methods on Stack Overflow. The authors searched for questions that explicitly mention affected methods, and compared those with questions about methods that aren’t affected by antipatterns.
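What counts as “explicitly mentioning” a method is a design decision in its own right. The sketch below uses a hypothetical heuristic, not the study’s: a question mentions `Type.method` if its text contains a call such as `method(` and also names the type, so that a common name like `isEmpty` is not matched on every collection question. Counting matches per method for affected and unaffected methods then yields the two groups to compare.

```java
import java.util.List;
import java.util.regex.Pattern;

class QuestionMatcher {
    // Hypothetical heuristic: the text must contain a call to the method
    // ("name(" with optional whitespace) and mention the simple type name.
    static boolean mentions(String text, String simpleTypeName, String methodName) {
        Pattern call = Pattern.compile("\\b" + Pattern.quote(methodName) + "\\s*\\(");
        Pattern type = Pattern.compile("\\b" + Pattern.quote(simpleTypeName) + "\\b");
        return call.matcher(text).find() && type.matcher(text).find();
    }

    // Number of questions that mention the given method.
    static long countMentions(List<String> questions, String simpleTypeName, String methodName) {
        return questions.stream()
                .filter(q -> mentions(q, simpleTypeName, methodName))
                .count();
    }
}
```

Requiring both the type and the call keeps false positives down at the cost of missing questions that only describe a method in prose.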

What discoveries were made


Initial quantitative analysis showed that the likelihood of introducing a bug is a whopping 29% higher if a commit introduces a call to an affected method. This result is statistically significant.

However, when the authors subsequently performed a qualitative analysis, they learned that none of the affected methods actually caused the bugs! It appears that a follow-up study is needed to figure out what’s really going on here.

The other discovery is also somewhat surprising (but a bit more straightforward, so less puzzling): affected methods do not appear to trigger significantly more questions on Stack Overflow than methods that are not affected.

Summary

  1. While methods with linguistic antipatterns don’t appear to cause bugs, commits that start calling them are, strangely, more likely to introduce bugs

  2. Methods with linguistic antipatterns don’t really trigger more questions than methods with “better” names

References

  1. Arnaoudova, V., Di Penta, M., & Antoniol, G. (2016). Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 21(1), 104–158.