A large-scale empirical study on linguistic antipatterns affecting APIs (2018)
Last week’s summary showed that it’s hard to quantify understandability of code. This week we look at a much simpler problem: the consequences of having bad method names in an API. Aghajani, Nagy, Bavota, and Lanza found that badly named methods have some impact on bugs, but it’s not clear yet why.
Why it matters
Most software is developed using third-party libraries. Such libraries generally include API documentation, but this isn’t always enough: parts might be outdated or even have no documentation at all. When that happens, a developer has to rely completely on the API’s method names.
We all know that naming things is hard, though, so a library’s developer may well choose a method name that ends up confusing its users.
How many bugs and questions arise due to badly named methods?
How the study was conducted
Another group of authors, Arnaoudova, Di Penta, and Antoniol (2016), previously compiled a handy list of linguistic antipatterns and published a tool that automatically detects the presence of such antipatterns in Java code. Twelve of these antipatterns were related to methods:
|“Get” – more than accessor||The method name starts with “get”, but it’s not merely an accessor, i.e. it also performs some other undocumented action|
|“Is” returns more than boolean||The method’s name starts with “is”, but it doesn’t return a boolean|
|“Set” method returns||The method’s name starts with “set”, but it has a return value|
|Expecting but not getting single instance||The method name implies that a single object will be returned, but the method actually returns a collection of objects|
|Not implemented condition||The method documentation suggests that the method has behaviour that’s not implemented|
|Validation method does not confirm||A validation method does not return a value that indicates whether validation was successful|
|“Get” method does not return||The method name starts with “get”, but the method returns nothing|
|Not answered question||The method name starts with “is”, but the method returns nothing|
|Transform method does not return||The method name implies that it transforms input, but it returns nothing|
|Expecting but not getting a collection||The method name implies that it returns a collection, but the actual return type is a single object or nothing|
|Method name and return type are opposite||The method name’s intent contradicts the return type|
|Method signature and comment are opposite||The method name contradicts the description in the comment|
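To make a few of the table’s entries concrete, here’s a minimal sketch in Java. The `UserDirectory` class and its methods are hypothetical and were invented purely for illustration. The `check` method is a deliberately naive reflection-based check that looks only at names and return types; it is an assumption about how such a check could look, not the detection tool by Arnaoudova et al.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AntipatternSketch {

    // Hypothetical library class with deliberately misleading method names.
    static class UserDirectory {
        private final List<String> users = new ArrayList<>();

        // "Is" returns more than boolean: reads like a yes/no question, returns an int.
        public int isActive(String user) {
            return users.contains(user) ? 1 : 0;
        }

        // "Set" method returns: a setter that also hands back a value.
        public boolean setOwner(String user) {
            return users.add(user);
        }

        // "Get" method does not return: the result ends up in the argument instead.
        public void getUsers(List<String> target) {
            target.addAll(users);
        }

        // No antipattern: the name and the return type agree.
        public boolean isEmpty() {
            return users.isEmpty();
        }
    }

    // True if name is the prefix followed by an uppercase letter,
    // e.g. "isActive" matches "is", but "issue" does not.
    static boolean hasPrefix(String name, String prefix) {
        return name.length() > prefix.length()
                && name.startsWith(prefix)
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    // Naive check (an assumption, not the original tool): names and return types only.
    static List<String> check(Class<?> api) {
        List<String> findings = new ArrayList<>();
        for (Method m : api.getDeclaredMethods()) {
            if (m.isSynthetic() || !Modifier.isPublic(m.getModifiers())) {
                continue;
            }
            String name = m.getName();
            Class<?> type = m.getReturnType();
            boolean isVoid = type == void.class;
            boolean isBoolean = type == boolean.class || type == Boolean.class;
            if (hasPrefix(name, "is") && !isVoid && !isBoolean) {
                findings.add(name + ": \"Is\" returns more than boolean");
            } else if (hasPrefix(name, "set") && !isVoid) {
                findings.add(name + ": \"Set\" method returns");
            } else if (hasPrefix(name, "get") && isVoid) {
                findings.add(name + ": \"Get\" method does not return");
            }
        }
        Collections.sort(findings); // reflection does not guarantee any method order
        return findings;
    }

    public static void main(String[] args) {
        for (String finding : check(UserDirectory.class)) {
            System.out.println(finding);
        }
    }
}
```

Running `main` should flag `getUsers`, `isActive`, and `setOwner`, and leave `isEmpty` alone. A check this simple would also flag fluent setters that return `this`, which hints at why real detection needs more context than names and return types.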
This article’s authors gathered data from 75 Java library projects on GitHub. They used the tool by Arnaoudova et al. to detect linguistic antipatterns in each of the libraries’ methods, and used the Eclipse JDT Parser to identify all client projects on GitHub that invoke at least one of the libraries’ affected methods.
The Git commit history of client projects can be used to determine whether linguistic antipatterns in API methods often cause bugs. The authors looked for bug-fix commits that might have been caused by commits in which an affected method was first invoked, and compared these with “normal” commits and bug fixes.
Finally, if antipatterns cause confusion among developers, one would expect more questions about affected methods on Stack Overflow. The authors searched an official 2017 Stack Overflow dump (it’s freely available, so you can get your own copy from the Internet Archive) for questions that explicitly mention affected methods, and compared those with questions about methods that aren’t affected by antipatterns.
What discoveries were made
Only about 2% of all library methods are affected by linguistic antipatterns, which fortunately isn’t very much. (You can find a lot more of these numbers in the original article.)
Initial quantitative analysis showed that the likelihood of introducing a bug is a whopping 29% higher if a commit introduces a call to an affected method. This result is statistically significant.
However, when the authors subsequently performed a qualitative analysis, they learned that none of the affected methods actually caused the bugs! It appears that a follow-up study is needed to figure out what’s really going on here.
The other discovery is also somewhat surprising (but a bit more straightforward, so less puzzling): affected methods do not appear to trigger significantly more questions on Stack Overflow than methods that are not affected.
The important bits
- Arnaoudova, V., Di Penta, M., & Antoniol, G. (2016). Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 21(1), 104–158.