A large-scale empirical study on linguistic antipatterns affecting APIs

Published: 21 Oct 2018
Written by: Chun Fei Lung

Does it matter if your method names contain tiny linguistic mistakes? Possibly.

As long as it looks like a teapot, it’s not badly designed

Last week’s summary showed that it’s hard to quantify understandability of code. This week we look at a much simpler problem: the consequences of having bad method names in an API. Aghajani, Nagy, Bavota, and Lanza found that badly named methods have some impact on bugs, but it’s not clear yet why.

About the article

Title	A large-scale empirical study on linguistic antipatterns affecting APIs
Year	2018
Author(s)	Emad Aghajani, Csaba Nagy, Gabriele Bavota, and Michele Lanza (all Università della Svizzera italiana)
Venue	Proceedings of the 34th International Conference on Software Maintenance and Evolution

Why it matters

Most software is developed using third-party libraries. Such libraries generally include API documentation, but this isn’t always enough: parts of it might be outdated, or some methods might not be documented at all. When that happens, a developer has to rely completely on the API’s method names.

We all know that naming things is hard though, so it’s very much possible that a library’s developer chooses a method name that ends up confusing its users.

How many bugs and questions arise due to badly named methods?

How the study was conducted

Another group of authors, Arnaoudova, Di Penta, and Antoniol (2016), previously compiled a handy list of linguistic antipatterns and published a tool that automatically detects the presence of such antipatterns in Java code. Twelve of these antipatterns were related to methods:

Name	Meaning
“Get” – more than accessor	The method name starts with “get”, but it’s not merely an accessor, i.e. it also performs some other undocumented action
“Is” returns more than boolean	The method name starts with “is”, but it doesn’t return a boolean
“Set” method returns	The method name starts with “set”, but it has a return value
Expecting but not getting single instance	The method name implies that a single object will be returned, but the method actually returns a collection of objects
Not implemented condition	The method documentation suggests that the method has behaviour that’s not implemented
Validation method does not confirm	A validation method does not return a value that indicates whether validation was successful
“Get” method does not return	The method name starts with “get”, but the method returns `void`
Not answered question	The method name starts with “is”, but the method returns `void`
Transform method does not return	The method name implies that it transforms input, but it returns `void` and it’s not clear where results are stored
Expecting but not getting a collection	The method name implies that it returns a collection, but the actual return type is a single object or `void`
Method name and return type are opposite	The method name’s intent contradicts the return type
Method signature and comment are opposite	The method name contradicts the description in the comment
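
To make a few of these concrete, here is a minimal Java sketch. The class `UserRegistry` and all of its methods are hypothetical, invented for illustration only; they are not taken from the paper or from any of the studied libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Small driver that shows how misleading the hypothetical names below can be.
class AntipatternDemo {
    public static void main(String[] args) {
        UserRegistry registry = new UserRegistry();
        registry.add("ada");
        // Reads like a yes/no question, but yields an index (or -1 when absent).
        System.out.println(registry.isRegistered("ada"));
        // Looks like a harmless accessor, but also changes the lookup counter.
        registry.getSize();
        System.out.println(registry.lookups());
    }
}

// Hypothetical class, written purely to illustrate four of the antipatterns.
class UserRegistry {
    private final List<String> names = new ArrayList<>();
    private int lookups = 0;

    // "Get" - more than accessor: also mutates state the name does not hint at.
    int getSize() {
        lookups++;
        return names.size();
    }

    // "Is" returns more than boolean: the caller expects true/false, gets an int.
    int isRegistered(String name) {
        return names.indexOf(name);
    }

    // "Set" method returns: a setter that hands back a success flag.
    boolean setName(int index, String name) {
        if (index < 0 || index >= names.size()) {
            return false;
        }
        names.set(index, name);
        return true;
    }

    // Expecting but not getting single instance: singular name, list result.
    List<String> getUser() {
        return new ArrayList<>(names);
    }

    void add(String name) {
        names.add(name);
    }

    int lookups() {
        return lookups;
    }
}
```

None of these methods is necessarily buggy on its own. The concern is that a caller who only sees the name might make a wrong assumption, for instance by treating the result of `isRegistered` as a truth value.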

The study’s authors gathered data from 75 Java library projects on GitHub. They used the tool by Arnaoudova et al. to detect linguistic antipatterns in each library’s methods, and then used the Eclipse JDT parser to identify all client projects on GitHub that invoke at least one of the affected methods.
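
To give a rough idea of what automated detection can look like, the sketch below checks four of the simpler antipatterns using only the method name and return type. It is a simplified stand-in built on `java.lang.reflect` and is not the tool by Arnaoudova et al. The names `NaiveNameChecker` and `Sample` are hypothetical, and treating the `Boolean` wrapper type as an acceptable result for “is” methods is an assumption made here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Naive checker (hypothetical): inspects only names and return types.
class NaiveNameChecker {

    // Returns one finding per declared method that matches a simple rule.
    static List<String> check(Class<?> type) {
        List<String> findings = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            String name = m.getName();
            Class<?> ret = m.getReturnType();
            if (hasPrefix(name, "get") && ret == void.class) {
                findings.add(name + ": \"Get\" method does not return");
            } else if (hasPrefix(name, "is") && ret == void.class) {
                findings.add(name + ": Not answered question");
            } else if (hasPrefix(name, "is")
                    && ret != boolean.class && ret != Boolean.class) {
                // Assumption: the Boolean wrapper counts as a boolean result.
                findings.add(name + ": \"Is\" returns more than boolean");
            } else if (hasPrefix(name, "set") && ret != void.class) {
                findings.add(name + ": \"Set\" method returns");
            }
        }
        return findings;
    }

    // True for "getName" with prefix "get", false for "getter" or "get".
    static boolean hasPrefix(String name, String prefix) {
        return name.length() > prefix.length()
                && name.startsWith(prefix)
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    public static void main(String[] args) {
        // The order of findings is not guaranteed by getDeclaredMethods().
        check(Sample.class).forEach(System.out::println);
    }
}

// Hypothetical input class: three methods should be flagged, two should not.
class Sample {
    void getReport() { }
    int isReady() { return 0; }
    Sample setMode(String mode) { return this; }
    boolean isEmpty() { return true; }
    String getName() { return "sample"; }
}
```

A check this shallow cannot see the antipatterns that depend on comments or on what a method body actually does, such as “Not implemented condition” or “Get” – more than accessor. It also flags fluent setters that return `this`, which some developers would not consider a problem.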

The Git commit history of the client projects can be used to determine whether linguistic antipatterns in API methods often cause bugs. The authors looked for bug-fix commits that might have been caused by commits in which an affected method was first invoked, and compared these with “normal” commits and bug fixes.

Finally, if antipatterns cause confusion among developers, one would expect more questions about affected methods on Stack Overflow. The authors searched an official 2017 Stack Overflow dump (side note: It’s freely available, so you can get your own copy from the Internet Archive) for questions that explicitly mention affected methods, and compared those with questions about methods that aren’t affected by antipatterns.

What discoveries were made

Only about 2% of all library methods are affected by linguistic antipatterns, which fortunately isn’t very much (side note: You can find a lot more of these numbers in the original article).

Initial quantitative analysis showed that the likelihood of introducing a bug is a whopping 29% higher if a commit introduces a call to an affected method. This result is statistically significant.

However, when the authors subsequently performed a qualitative analysis, they learned that none of the affected methods actually caused the bugs! It appears that a follow-up study is needed to figure out what’s really going on here.

The other discovery is also somewhat surprising (but a bit more straightforward, so less puzzling): affected methods do not appear to trigger significantly more questions on Stack Overflow than methods that are not affected.

Summary

While methods with linguistic antipatterns don’t appear to cause bugs, they strangely do appear to coincide more often with bugs
Methods with linguistic antipatterns don’t really trigger more questions than methods with “better” names