How do design decisions affect the distribution of software metrics? (2018)
Many organisations use static analysis tools to calculate metrics, which they then use to grade the quality of their software projects. This is not necessarily a good approach, however: this study by Dósea, Sant’Anna, and da Silva shows that metrics mean little without proper context.
Why it matters
Most developers will agree that very long methods (or functions, but I’ll keep calling them methods, because we’re dealing with Java) are hard to maintain and should be refactored.
When is a function or method too long? Many static analysis tools (well-known examples include SonarQube and Scrutinizer) often use a single pre-defined threshold. This is far from ideal, because the design role of a class may affect what’s considered to be acceptable. For instance, a threshold like 20 lines of code may be too low for a View, but too high for a Controller.
You can of course set different thresholds for different types of classes like models, views, and controllers using the approach used by Aniche et al. (2018). But that might not solve everything.
Firstly, there are many classes that don’t fit into some well-known reference architecture like MVC. One might use some default threshold value as a fallback, but then you’re back to square one.
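To make the idea of role-specific thresholds with a fallback concrete, here is a minimal sketch. The class name, the role names, and all numbers are my own illustrative assumptions; none of them come from the study or from any particular tool.

```java
import java.util.Map;

// Sketch only: role names and numbers are illustrative, not taken from the study.
final class MethodLengthThresholds {
    // Hypothetical per-role limits on lines of code per method.
    private static final Map<String, Integer> BY_ROLE = Map.of(
            "View", 40,
            "Controller", 15,
            "Model", 20);

    // Fallback for classes that fit no known design role.
    static final int DEFAULT_THRESHOLD = 20;

    static int thresholdFor(String designRole) {
        if (designRole == null) {
            return DEFAULT_THRESHOLD;
        }
        return BY_ROLE.getOrDefault(designRole, DEFAULT_THRESHOLD);
    }

    static boolean isTooLong(String designRole, int linesOfCode) {
        return linesOfCode > thresholdFor(designRole);
    }

    public static void main(String[] args) {
        System.out.println(isTooLong("View", 30));       // false: 30 <= 40
        System.out.println(isTooLong("Controller", 30)); // true: 30 > 15
        System.out.println(isTooLong("AsyncTask", 30));  // true: falls back to 20
    }
}
```

The last call shows the problem: any role that is missing from the table is silently judged against the generic default.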
Secondly, two classes that are implemented in the same language and share the same design role might still look very different if they use different libraries.
Finally, some coding styles may require more lines than others. Compare the two snippets below: both query a database and retrieve a list of Airport objects, but the first is considerably shorter:
How large are these differences in practice, and can (and should) we take them into account when interpreting metrics about our code’s quality?
How the study was conducted
The authors designed and implemented a tool, DesignRoleMiner, that can infer the design role of a class, based on a number of characteristics:
- The presence of certain keywords in a class name, like
- Class annotations that contain a keyword, like
- Inheritance from a base class that contains a keyword, like as
- Implementation of interfaces – some of these are “known” roles (e.g. Repository), whilst others are mapped to new roles;
- Classes that mostly just consist of attributes, getters, and setters – these are likely to be entities.
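The heuristics above could be sketched roughly as follows. This is not how DesignRoleMiner is implemented: I use reflection on loaded classes as a stand-in for source analysis, and the keyword list, the 80% accessor cut-off, the `"Undefined"` label, and the sample classes are all assumptions made up for illustration.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.util.List;

// Sketch only: reflection-based stand-in with a made-up keyword list.
final class DesignRoleGuesser {
    // Hypothetical keywords; the tool's real list is not given here.
    private static final List<String> KEYWORDS =
            List.of("Controller", "Repository", "Service", "Exception");

    static String guessRole(Class<?> type) {
        // 1. Keyword in the class name.
        String role = matchKeyword(type.getSimpleName());
        if (role != null) return role;

        // 2. Keyword in an annotation name (only runtime-retained annotations are visible).
        for (Annotation annotation : type.getAnnotations()) {
            role = matchKeyword(annotation.annotationType().getSimpleName());
            if (role != null) return role;
        }

        // 3. Keyword in the name of the base class.
        Class<?> parent = type.getSuperclass();
        if (parent != null) {
            role = matchKeyword(parent.getSimpleName());
            if (role != null) return role;
        }

        // 4. Keyword in the name of an implemented interface.
        for (Class<?> implemented : type.getInterfaces()) {
            role = matchKeyword(implemented.getSimpleName());
            if (role != null) return role;
        }

        // 5. Mostly fields plus getters and setters: probably an entity.
        return looksLikeEntity(type) ? "Entity" : "Undefined";
    }

    private static String matchKeyword(String name) {
        for (String keyword : KEYWORDS) {
            if (name.contains(keyword)) return keyword;
        }
        return null;
    }

    private static boolean looksLikeEntity(Class<?> type) {
        if (type.getDeclaredFields().length == 0) return false;
        int total = 0;
        int accessors = 0;
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic()) continue; // skip compiler-generated methods
            total++;
            String name = method.getName();
            if (name.startsWith("get") || name.startsWith("set") || name.startsWith("is")) {
                accessors++;
            }
        }
        // Assumed cut-off: at least 80% of the methods are accessors.
        return total > 0 && accessors * 5 >= total * 4;
    }
}

// Hypothetical sample classes for trying out the guesser.
class BaseRepository {}

class Airports extends BaseRepository {}

class Airport {
    private String code;
    String getCode() { return code; }
    void setCode(String code) { this.code = code; }
}
```

With these sample classes, `DesignRoleGuesser.guessRole(Airports.class)` yields `"Repository"` via its base class, and `Airport.class` is classified as `"Entity"` because it only has a field with a getter and a setter.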
For the actual study, fifteen representative Java projects (i.e. popular, actively maintained, from the Web, Android, and desktop domains, and excluding frameworks and libraries) were selected from GitHub.
The authors then used the tool to assign a design role to each of the projects’ classes and computed some metrics for each of their methods:
- McCabe’s cyclomatic complexity;
- The number of method parameters;
- Lines of executable statements;
- Efferent coupling, i.e. the number of classes that are accessed from within the method.
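To give a feel for the first metric: for a single method, McCabe's cyclomatic complexity can be computed as the number of decision points plus one. The sketch below approximates this by counting branching keywords and operators in the method's source text. It is a crude stand-in that I made up for illustration, not the way the study's tooling works: a real tool would parse the code, whereas this version is fooled by comments, string literals, and generic wildcards. Tools also differ on whether `&&` and `||` count as decision points; here I assume they do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a text-based approximation, not a parser-based implementation.
final class ComplexityEstimator {
    // Branching keywords and short-circuit/ternary operators counted as decision points.
    private static final Pattern DECISION_POINT =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    // Cyclomatic complexity of a single method: decision points + 1.
    static int estimate(String methodBody) {
        Matcher matcher = DECISION_POINT.matcher(methodBody);
        int decisionPoints = 0;
        while (matcher.find()) {
            decisionPoints++;
        }
        return decisionPoints + 1;
    }
}
```

For example, a body containing one `if` with an `&&` condition and one `for` loop has three decision points, so the estimate is 4. A body that only returns a value gets the minimum score of 1.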
What discoveries were made
Within the same project, the distribution of metric values differs significantly between at least two design roles.
For example, design roles like Entity and Exception typically don’t do much other than encapsulating data and therefore generally have few lines of code. Methods in other roles – like AsyncTask in one of the studied projects – may be more complex and thus require more lines of code and input parameters.
The authors then analysed design roles that appeared in at least two projects. In many cases a significant difference can be found between classes with the same design role from different projects.
A manual analysis of the differences showed which design decisions contributed to them:
Some libraries are harder to use than others. This is not necessarily due to deficiencies in the library. For instance, some applications may simply require more advanced features.
Some libraries accommodate multiple coding styles. Hibernate, for example, allows users to express queries using a Criteria mechanism (this is what you saw in the code examples above) or HQL, a SQL-like query language. The latter is much more succinct and does not require as much efferent coupling, although it’s also a lot less type-safe.
There is no general consensus on where to place code for things like exception handling, logging, and debugging: they can appear in different design roles, which obviously affects the distribution of metrics.
Therefore one should also take other design decisions into account when building benchmarks for code quality analyses.
Finally, the authors compared metric distributions between different releases of the same project. While differences do exist, they do not occur very frequently. This suggests that it may be a good idea to include data from previous releases when building benchmarks for analyses.
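One way to act on this is to derive a separate threshold per design role from the metric values observed in your own project, possibly pooled over several releases. The sketch below does that with a nearest-rank percentile. Using a percentile at all, as well as the class and method names, is my own assumption; the summary above does not say how the benchmarks should be computed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: percentile-based thresholds are an assumption, not the study's method.
final class RoleBenchmarks {
    // Nearest-rank percentile: the smallest observed value such that at least
    // `percent` percent of all values are less than or equal to it.
    static int percentile(List<Integer> values, int percent) {
        if (values.isEmpty() || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need values and a percent in 1..100");
        }
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Integer form of ceil(percent / 100 * n), avoiding floating-point rounding.
        int rank = (percent * sorted.size() + 99) / 100;
        return sorted.get(rank - 1);
    }

    // Computes one threshold per design role from the observed metric values.
    // Values from earlier releases can simply be added to the same lists.
    static Map<String, Integer> thresholdsByRole(
            Map<String, List<Integer>> valuesByRole, int percent) {
        Map<String, Integer> thresholds = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : valuesByRole.entrySet()) {
            thresholds.put(entry.getKey(), percentile(entry.getValue(), percent));
        }
        return thresholds;
    }
}
```

Given lines-of-code samples of `[1, 2, 1, 3, 2, 1, 2, 1, 1, 2]` for Entity methods and `[12, 30, 18, 25, 40]` for AsyncTask methods, the 80th percentile gives a threshold of 2 for Entity and 30 for AsyncTask, so each role is judged against its own typical range.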