How do design decisions affect the distribution of software metrics?

Published: 31 Mar 2019
Written by: Chun Fei Lung

The authors of this week’s paper built a tool that can infer the design role of classes and let it analyse some Java projects.

Let’s just hope your project’s code is in good shape

Many organisations use static analysis tools to calculate metrics, which they then use to grade the quality of their software projects. This is not necessarily a good approach, however: this study by Dósea, Sant’Anna, and da Silva shows that metrics don’t mean much without proper context.

About the article

Title	How do design decisions affect the distribution of software metrics
Year	2018
Author(s)	Marcos Dósea (Federal University of Sergipe and Federal University of Bahia); Cláudio Sant’Anna (Federal University of Bahia); Bruno C. da Silva (California Polytechnic State University)
Venue	Proceedings of the 26th Conference on Program Comprehension

Why it matters

Most developers will agree that very long methods (side note: Or functions, but I’ll keep calling them methods, because we’re dealing with Java) are hard to maintain and should be refactored.

When is a function or method too long? Many static analysis tools (side note: Well-known examples include SonarQube and Scrutinizer) use a single predefined threshold. This is far from ideal, because the design role of a class may affect what’s considered acceptable. For instance, a threshold like 20 lines of code may be too low for a View, but too high for a Controller.

You can of course set different thresholds for different types of classes like models, views, and controllers using the approach used by Aniche et al. (2018). But that might not solve everything.

Firstly, there are many classes that don’t fit into some well-known reference architecture like Model-View-Controller (MVC). One might use some default threshold value as a fallback, but then you’re back to square one.
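To make the idea of role-specific thresholds with a default fallback concrete, here is a minimal sketch. The class name, the role names, and all threshold values are assumptions made up for illustration; they come neither from the paper nor from any particular tool.

```java
import java.util.Map;

// Hypothetical sketch: role names and limits are illustrative assumptions.
class RoleThresholds {
    // Fallback for classes that match no known design role.
    static final int DEFAULT_MAX_METHOD_LINES = 20;

    // Assumed per-role limits on method length, in lines of code.
    static final Map<String, Integer> MAX_METHOD_LINES = Map.of(
        "Controller", 10,
        "Model", 20,
        "View", 40
    );

    static int maxMethodLines(String designRole) {
        return MAX_METHOD_LINES.getOrDefault(designRole, DEFAULT_MAX_METHOD_LINES);
    }

    static boolean isTooLong(String designRole, int methodLines) {
        return methodLines > maxMethodLines(designRole);
    }

    public static void main(String[] args) {
        // A 30-line method is judged differently depending on the role.
        System.out.println(isTooLong("View", 30));
        System.out.println(isTooLong("Controller", 30));
        // An unknown role silently falls back to the default threshold.
        System.out.println(isTooLong("AsyncTask", 30));
    }
}
```

The last call shows the problem: any role that is missing from the map is judged by the same one-size-fits-all value again.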

Secondly, two classes that are implemented in the same language and share the same design role might still look very different if they rely on different libraries.

Finally, some coding styles may require more lines than others. Compare the two snippets below: both query a database and retrieve a list of Airport objects, but the first is considerably shorter:

public List<Airport> listByCountry(Country country) {
    return getSession().createCriteria(Airport.class).add(Restrictions.eq("airport.country", country)).list();
}

public List<Airport> listByCountry(Country country) {
    final Criteria criteria = this.getSession()
                                  .createCriteria(Airport.class);
    
    criteria.createAlias("country", "c");
    criteria.add(Restrictions.eq("c.id", country.getId()));

    return criteria.list();
}

How large are these differences in practice, and can (and should) we take them into account when interpreting metrics about our code’s quality?

How the study was conducted

The authors designed and implemented a tool, DesignRoleMiner, that can infer the design role of a class, based on a number of characteristics:

The presence of certain keywords in a class name, like DTO;
Class annotations that contain a keyword, like @Service;
Inheritance from a base class that contains a keyword, such as AbstractController;
Implementation of interfaces – some of these are “known” roles (e.g. Repository), whilst others are mapped to new roles;
Classes that mostly just consist of attributes, getters, and setters – these are likely to be entities.
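The heuristics above could be chained roughly as follows. This is a simplified sketch and not the actual DesignRoleMiner implementation: it uses reflection on compiled classes, the keyword list is an assumption, and it does not map unknown interfaces to new roles. All class and method names are made up for the example.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical sketch of keyword-based design role inference.
class DesignRoleSketch {
    // Assumed keyword list; a real tool would need a much longer one.
    static final List<String> KEYWORDS =
        List.of("Controller", "Service", "Repository", "DTO");

    static String inferRole(Class<?> type) {
        // 1. Keyword in the class name.
        String role = keywordIn(type.getSimpleName());
        if (role != null) return role;
        // 2. Keyword in a class annotation (only runtime-retained ones are visible here).
        for (Annotation a : type.getAnnotations()) {
            role = keywordIn(a.annotationType().getSimpleName());
            if (role != null) return role;
        }
        // 3. Keyword in the name of the base class.
        Class<?> parent = type.getSuperclass();
        if (parent != null) {
            role = keywordIn(parent.getSimpleName());
            if (role != null) return role;
        }
        // 4. Keyword in the name of an implemented interface.
        for (Class<?> i : type.getInterfaces()) {
            role = keywordIn(i.getSimpleName());
            if (role != null) return role;
        }
        // 5. Classes that declare only getters and setters are treated as entities.
        Method[] methods = type.getDeclaredMethods();
        if (methods.length > 0 && allAccessors(methods)) return "Entity";
        return "Undefined";
    }

    static String keywordIn(String name) {
        for (String keyword : KEYWORDS) {
            if (name.contains(keyword)) return keyword;
        }
        return null;
    }

    static boolean allAccessors(Method[] methods) {
        for (Method m : methods) {
            if (m.isSynthetic()) continue; // skip compiler-generated methods
            String n = m.getName();
            if (!(n.startsWith("get") || n.startsWith("set") || n.startsWith("is"))) {
                return false;
            }
        }
        return true;
    }

    // Sample classes used only to exercise the heuristics.
    static abstract class AbstractController {}
    static class Airports extends AbstractController {}
    static class Country {
        private String name;
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }
    static class Helper {
        void run() {}
    }

    public static void main(String[] args) {
        System.out.println(inferRole(Airports.class));
        System.out.println(inferRole(Country.class));
        System.out.println(inferRole(Helper.class));
    }
}
```

Note that the order of the checks matters: a class called UserDTO that happens to extend AbstractController would be labelled DTO here, which may or may not be what you want.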

For the actual study, fifteen representative Java projects (side note: i.e. popular, actively maintained, from different domains (Web, Android, and desktop), and excluding frameworks and libraries) were selected from GitHub.

The authors then used the tool to assign a design role to each of the projects’ classes and computed some metrics for each of their methods:

McCabe’s cyclomatic complexity;
The number of method parameters;
Lines of executable statements;
Efferent coupling (side note: The number of classes that are accessed from within the method).
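To give a feel for the first metric in the list, the sketch below estimates McCabe’s cyclomatic complexity by counting branching constructs in a method’s source text. This is a deliberately naive approximation and not how the authors computed it: it also counts keywords inside comments and string literals, and the `?` of generic wildcards. Real tools work on the syntax tree. The class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based approximation of cyclomatic complexity.
class ComplexitySketch {
    // Constructs that each add one extra path through a method.
    static final Pattern DECISION_POINT =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    static int cyclomaticComplexity(String methodSource) {
        Matcher matcher = DECISION_POINT.matcher(methodSource);
        int complexity = 1; // a method without branches has exactly one path
        while (matcher.find()) {
            complexity++;
        }
        return complexity;
    }

    public static void main(String[] args) {
        String source = "int clamp(int x) { if (x < 0 || x > 9) return 0; return x; }";
        // One "if" and one "||" on top of the base path.
        System.out.println(cyclomaticComplexity(source));
    }
}
```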

What discoveries were made

Within a single project, the distribution of metric values differs significantly between at least two design roles.

For example, design roles like Entity and Exception typically don’t do much other than encapsulating data and therefore generally have few lines of code. Methods in other roles – like AsyncTask in one of the studied projects – may be more complex and thus require more lines of code and input parameters.

The authors then analysed design roles that appeared in at least two projects. In many cases a significant difference can be found between classes with the same design role from different projects.

A manual analysis showed which design decisions contributed to these differences:

Some libraries are harder to use than others. This is not necessarily due to deficiencies in the library. For instance, some applications may simply require more advanced features.
Some libraries accommodate multiple coding styles. Hibernate, for example, allows users to express queries using a Criteria mechanism (side note: This is what you saw in the code examples above) or HQL, a SQL-like query language. The latter is much more succinct and does not require as much efferent coupling, although it’s also a lot less type-safe.
There is no general consensus on where to place code for things like exception handling, logging, and debugging: they can appear in different design roles, which obviously affects the distribution of metrics.

Therefore, one should also take other design decisions into account when building benchmarks for code quality analyses.

Finally, the authors compared metric distribution between different releases of the same project. While differences do exist, they do not occur very frequently. This suggests that it may be a good idea to include data from previous releases when building benchmarks for analyses.
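One possible way to use older releases in a benchmark is to pool the metric values of a design role across releases and derive a threshold from the pooled distribution. The sketch below does this with a nearest-rank percentile. Both the choice of a percentile and the names used are assumptions for illustration; the paper does not prescribe this procedure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: derive a threshold for one design role from several releases.
class BenchmarkSketch {
    // Nearest-rank percentile; expects a non-empty list and 0 < p <= 100.
    static int percentile(List<Integer> values, double p) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Each inner list holds the method lengths of one design role in one release.
    static int threshold(List<List<Integer>> releases, double p) {
        List<Integer> pooled = new ArrayList<>();
        for (List<Integer> release : releases) {
            pooled.addAll(release);
        }
        return percentile(pooled, p);
    }

    public static void main(String[] args) {
        List<List<Integer>> controllerMethodLines = List.of(
            List.of(2, 3, 5, 8),          // older release
            List.of(1, 4, 6, 7, 9, 10)    // current release
        );
        System.out.println(threshold(controllerMethodLines, 90));
    }
}
```

Pooling only makes sense as long as the distributions really are stable between releases, so it is worth checking that assumption for your own project first.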

Summary

The design role of a class within an application’s architecture can be classified with reasonable accuracy using a few heuristics.
Different design roles have different distributions of metric values. Benchmarks should therefore take design roles into account.
Other design decisions may also affect what metric values are “normal” and should therefore also be considered.
The distribution of metrics does not change frequently between releases, so it’s useful to include older releases in benchmarks.