Chuniversiteit.nl
The Toilet Paper

A large-scale empirical study on linguistic antipatterns affecting APIs

Does it matter if your method names contain tiny linguistic mistakes? Possibly.

Image: an impossible teapot on a table. Caption: As long as it looks like a teapot, it’s not badly designed

Last week’s summary showed that it’s hard to quantify understandability of code. This week we look at a much simpler problem: the consequences of having bad method names in an API. Aghajani, Nagy, Bavota, and Lanza found that calls to badly named methods coincide with more bugs, but it’s not clear yet why.

Why it matters

Most software is developed using third-party libraries. Such libraries generally include API documentation, but this isn’t always enough: parts of it might be outdated, and some parts of the API might not be documented at all. When that happens, a developer has to rely completely on the API’s method names.

We all know that naming things is hard though, so it’s very much possible that a library’s developer chooses a method name that ends up confusing its users.

How many bugs and questions arise due to badly named methods?

How the study was conducted

Another group of authors, Arnaoudova, Di Penta, and Antoniol (2016), previously compiled a handy list of linguistic antipatterns and published a tool that automatically detects the presence of such antipatterns in Java code. Twelve of these antipatterns were related to methods:

  1. “Get” – more than accessor: The method name starts with “get”, but it’s not merely an accessor, i.e. it also performs some other undocumented action

  2. “Is” returns more than boolean: The method name starts with “is”, but it doesn’t return a boolean

  3. “Set” method returns: The method name starts with “set”, but it has a return value

  4. Expecting but not getting single instance: The method name implies that a single object will be returned, but the method actually returns a collection of objects

  5. Not implemented condition: The method documentation suggests that the method has behaviour that’s not implemented

  6. Validation method does not confirm: A validation method does not return a value that indicates whether validation was successful

  7. “Get” method does not return: The method name starts with “get”, but the method returns void

  8. Not answered question: The method name starts with “is”, but the method returns void

  9. Transform method does not return: The method name implies that it transforms input, but it returns void and it’s not clear where results are stored

  10. Expecting but not getting a collection: The method name implies that it returns a collection, but the actual return type is a single object or void

  11. Method name and return type are opposite: The method name’s intent contradicts the return type

  12. Method signature and comment are opposite: The method name contradicts the description in the comment
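To make a few of these antipatterns concrete, here is a small sketch of what affected methods might look like. The `Playlist` class and all of its members are invented for illustration; they do not come from the study or from any real library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical library class; every name here is made up for illustration.
class Playlist {
    private final List<String> songs = new ArrayList<>();
    private String title = "";
    private int accessCount = 0;

    void add(String song) {
        songs.add(song);
    }

    // "Get" - more than accessor: besides returning the songs,
    // it silently updates a counter.
    List<String> getSongs() {
        accessCount++;
        return songs;
    }

    int accessCount() {
        return accessCount;
    }

    // "Is" returns more than boolean: the name promises a yes/no answer,
    // but the caller gets a number back.
    int isEmpty() {
        return songs.size();
    }

    // "Set" method returns: a setter that has a return value.
    boolean setTitle(String newTitle) {
        boolean changed = !title.equals(newTitle);
        title = newTitle;
        return changed;
    }

    // Expecting but not getting single instance: the singular name
    // suggests one song, but a whole list comes back.
    List<String> getSong() {
        return songs;
    }
}
```

A caller who only sees the names would reasonably write `if (playlist.isEmpty())`, which does not even compile here, or assume that `getSongs()` has no side effects.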

This article’s authors gathered data from 75 Java library projects on GitHub. They used the tool by Arnaoudova et al. to detect linguistic antipatterns in each library’s methods, and then used the Eclipse JDT Parser to identify all client projects on GitHub that invoke at least one of the affected methods.
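Some of the antipatterns can be approximated from nothing more than a method’s name prefix and return type. The sketch below is not the tool by Arnaoudova et al.; it is a simplified, reflection-based heuristic that only covers four of the twelve antipatterns, and the names `NameHeuristics` and `SampleApi` are assumptions made up for this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical API with four affected methods and one well-named method.
class SampleApi {
    void getName() {}
    int isReady() { return 0; }
    void isOpen() {}
    boolean setValue(int value) { return true; }
    boolean isDone() { return true; }
}

class NameHeuristics {
    // True when the name is the prefix followed by an upper-case letter,
    // so "isOpen" matches "is" but "issue" does not.
    static boolean hasPrefix(String name, String prefix) {
        return name.startsWith(prefix)
                && name.length() > prefix.length()
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    // Reports signature-level antipatterns among the methods declared by a class.
    static List<String> check(Class<?> type) {
        List<String> findings = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            String name = method.getName();
            Class<?> returnType = method.getReturnType();
            boolean returnsVoid = returnType == void.class;
            boolean returnsBoolean = returnType == boolean.class || returnType == Boolean.class;

            if (hasPrefix(name, "get") && returnsVoid) {
                findings.add(name + ": \"Get\" method does not return");
            }
            if (hasPrefix(name, "is") && returnsVoid) {
                findings.add(name + ": Not answered question");
            } else if (hasPrefix(name, "is") && !returnsBoolean) {
                findings.add(name + ": \"Is\" returns more than boolean");
            }
            if (hasPrefix(name, "set") && !returnsVoid) {
                findings.add(name + ": \"Set\" method returns");
            }
        }
        // getDeclaredMethods() has no guaranteed order, so sort for stable output.
        Collections.sort(findings);
        return findings;
    }

    public static void main(String[] args) {
        for (String finding : check(SampleApi.class)) {
            System.out.println(finding);
        }
    }
}
```

The remaining antipatterns need more than a signature: they depend on what the name means, what the method body does, or what the comment says, which is why a dedicated tool is used in the study.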

The Git commit history of client projects can be used to determine whether linguistic antipatterns in API methods often cause bugs. The authors looked for bug-fix commits whose bugs might have been introduced by commits in which an affected method was first invoked, and compared these with “normal” commits and bug fixes.
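The comparison described above boils down to asking whether commits that first invoke an affected method are later linked to bug fixes more often than other commits. The sketch below assumes that every client commit has already been labelled. The `Commit` class, its fields, and the simple ratio of rates are assumptions for illustration only; the statistical analysis in the paper may well differ.

```java
import java.util.List;

class CommitComparison {
    // Hypothetical, already-labelled client commit.
    static final class Commit {
        final boolean introducesAffectedCall; // first call to a method with an antipattern
        final boolean linkedToLaterBugFix;    // a later bug fix was traced back to this commit

        Commit(boolean introducesAffectedCall, boolean linkedToLaterBugFix) {
            this.introducesAffectedCall = introducesAffectedCall;
            this.linkedToLaterBugFix = linkedToLaterBugFix;
        }
    }

    // Share of commits in one group (affected or not) that were linked to a bug fix.
    static double bugRate(List<Commit> commits, boolean affected) {
        int total = 0;
        int buggy = 0;
        for (Commit commit : commits) {
            if (commit.introducesAffectedCall != affected) {
                continue;
            }
            total++;
            if (commit.linkedToLaterBugFix) {
                buggy++;
            }
        }
        return total == 0 ? 0.0 : (double) buggy / total;
    }

    // Ratio of the two rates; a value above 1.0 means that commits introducing
    // affected calls are linked to bugs more often. The result is not meaningful
    // when the "normal" group has no bug-linked commits.
    static double rateRatio(List<Commit> commits) {
        return bugRate(commits, true) / bugRate(commits, false);
    }
}
```

Note that a ratio like this only captures co-occurrence. It cannot tell whether the affected method itself caused the bug, which is why a manual inspection of the commits is still needed.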

Finally, if antipatterns cause confusion among developers, one would expect more questions about affected methods on Stack Overflow. The authors searched for questions that explicitly mention affected methods, and compared those with questions about methods that aren’t affected by antipatterns.

What discoveries were made

Initial quantitative analysis showed that the likelihood of introducing a bug is a whopping 29% higher if a commit introduces a call to an affected method. This result is statistically significant.

However, when the authors subsequently performed a qualitative analysis, they learned that none of the affected methods actually caused the bugs! It appears that a follow-up study is needed to figure out what’s really going on here.

The other discovery is also somewhat surprising (but a bit more straightforward, so less puzzling): affected methods do not appear to trigger significantly more questions on Stack Overflow than methods that are not affected.

Summary

  1. While methods with linguistic antipatterns don’t appear to cause bugs, they strangely do appear to coincide more often with bugs

  2. Methods with linguistic antipatterns don’t really trigger more questions than methods with “better” names

References

  1. Arnaoudova, V., Di Penta, M., & Antoniol, G. (2016). Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 21(1), 104–158.