How do design decisions affect the distribution of software metrics? (2018)

Many organisations use static analysis tools to calculate metrics, which they then use to grade the quality of their software projects. This is not necessarily a good approach, however: a study by Dósea, Sant’Anna, and da Silva shows that metrics don’t mean much without proper context.

Why it matters

Most developers will agree that very long methods (or functions, but I’ll keep calling them methods, because we’re dealing with Java) are hard to maintain and should be refactored.

When is a function or method too long? Many static analysis tools (well-known examples include SonarQube and Scrutinizer) use a single pre-defined threshold. This is far from ideal, because the design role of a class may affect what’s considered to be acceptable. For instance, a threshold like 20 lines of code may be too low for a View, but too high for a Controller.

You can of course set different thresholds for different types of classes like models, views, and controllers using the approach used by Aniche et al. (2018). But that might not solve everything.

Firstly, there are many classes that don’t fit into some well-known reference architecture like MVC. One might use some default threshold value as a fallback, but then you’re back to square one.
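To make the fallback problem concrete, here is a minimal sketch of per-role thresholds with a default. The class name, method names, and all numeric limits are hypothetical and purely illustrative; they come neither from the study nor from any particular tool.

```java
import java.util.Map;

// Hypothetical sketch: every name and number here is an illustrative assumption.
final class RoleThresholds {
    // Fallback limit for classes without a recognised design role.
    static final int DEFAULT_MAX_LINES = 20;

    // Maximum method length (in lines of code) per design role.
    static final Map<String, Integer> MAX_LINES_BY_ROLE = Map.of(
            "View", 40,
            "Controller", 15);

    // Returns the role-specific limit, or the default for unknown roles.
    // The role must not be null (Map.of maps reject null keys).
    static int maxLinesFor(String designRole) {
        return MAX_LINES_BY_ROLE.getOrDefault(designRole, DEFAULT_MAX_LINES);
    }

    static boolean isTooLong(String designRole, int methodLines) {
        return methodLines > maxLinesFor(designRole);
    }
}
```

An 18-line method would be flagged in a Controller but not in a View, while any class with an unrecognised role is still judged against the single default of 20 lines.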

Secondly, two classes that are implemented in the same language and share the same design role might still look very different if they rely on different libraries.

Finally, some coding styles may require more lines than others. Compare the two snippets below: both query a database and retrieve a list of Airport objects, but the first is considerably shorter:

public List<Airport> listByCountry(Country country) {
    return getSession().createCriteria(Airport.class).add(Restrictions.eq("airport.country", country)).list();
}

public List<Airport> listByCountry(Country country) {
    final Criteria criteria = this.getSession()
                                  .createCriteria(Airport.class);
    
    criteria.createAlias("country", "c");
    criteria.add(Restrictions.eq("c.id", country.getId()));

    return criteria.list();
}

How large are these differences in practice and can (and should) we take these into account when interpreting metrics about our code’s quality?

How the study was conducted

The authors designed and implemented a tool, DesignRoleMiner, that can infer the design role of a class, based on a number of characteristics:

  1. The presence of certain keywords in a class name, like DTO;
  2. Class annotations that contain a keyword, like @Service;
  3. Inheritance from a base class that contains a keyword, such as AbstractController;
  4. Implementation of interfaces – some of these are “known” roles (e.g. Repository), whilst others are mapped to new roles;
  5. Classes that mostly just consist of attributes, getters, and setters – these are likely to be entities.
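The five heuristics above can be approximated with plain Java reflection. The sketch below is not DesignRoleMiner's implementation: the keyword list, the order in which the checks run, the "more than half of the methods are accessors" rule, and the "Undefined" fallback are all assumptions made for illustration. It also simplifies heuristic 4 by only matching interface names against keywords.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;

// Hypothetical sketch; not the tool used in the study.
final class DesignRoleSketch {
    // Assumed keyword list; a real tool would need a far larger one.
    static final List<String> KEYWORDS =
            List.of("Controller", "Service", "Repository", "DTO", "Exception");

    static String inferRole(Class<?> type) {
        // 1. Keyword in the class name.
        String role = keywordIn(type.getSimpleName());
        if (role != null) return role;

        // 2. Keyword in an annotation name (reflection only sees
        //    annotations that are retained at runtime).
        for (Annotation a : type.getAnnotations()) {
            role = keywordIn(a.annotationType().getSimpleName());
            if (role != null) return role;
        }

        // 3. Keyword in the name of the direct superclass.
        Class<?> parent = type.getSuperclass();
        if (parent != null) {
            role = keywordIn(parent.getSimpleName());
            if (role != null) return role;
        }

        // 4. Keyword in the name of a directly implemented interface.
        for (Class<?> i : type.getInterfaces()) {
            role = keywordIn(i.getSimpleName());
            if (role != null) return role;
        }

        // 5. Classes made up mostly of getters and setters.
        return isMostlyAccessors(type) ? "Entity" : "Undefined";
    }

    // True if more than half of the instance methods look like accessors.
    static boolean isMostlyAccessors(Class<?> type) {
        int accessors = 0;
        int total = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic() || Modifier.isStatic(m.getModifiers())) continue;
            total++;
            String n = m.getName();
            boolean getter = (n.startsWith("get") || n.startsWith("is"))
                    && m.getParameterCount() == 0;
            boolean setter = n.startsWith("set") && m.getParameterCount() == 1;
            if (getter || setter) accessors++;
        }
        return total > 0 && accessors * 2 > total;
    }

    // Returns the first keyword contained in the name, or null if none matches.
    static String keywordIn(String name) {
        for (String k : KEYWORDS) {
            if (name.contains(k)) return k;
        }
        return null;
    }

    // Tiny sample classes, used only to exercise the sketch.
    static class AbstractController {}
    static class Checkout extends AbstractController {}
    static class AirportDTO {}
    static class Airport {
        private String code;
        public String getCode() { return code; }
        public void setCode(String code) { this.code = code; }
    }
    static class Scheduler {
        void run() {}
    }
}
```

With the sample classes, Checkout is classified through its base class, AirportDTO through its name, Airport through its accessors, and Scheduler falls through to "Undefined".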

For the actual study, fifteen representative Java projects (i.e. popular, actively maintained, from different domains such as Web, Android, and desktop, and not frameworks or libraries) were selected from GitHub.

The authors then used the tool to assign a design role to each of the projects’ classes and computed a number of metrics for each of their methods.

What discoveries were made

Within the same project, the distributions of metric values differ significantly between at least two design roles.

For example, design roles like Entity and Exception typically do little more than encapsulate data and therefore generally have few lines of code. Methods in other roles, like AsyncTask in one of the studied projects, may be more complex and thus require more lines of code and input parameters.
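One way to see what such differences mean for a benchmark is to derive a separate limit per design role from the observed method lengths. The sketch below uses a nearest-rank percentile for that. This is an assumption chosen for illustration, not necessarily the statistical procedure the authors used, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-role thresholds derived from a percentile.
final class RoleDistributions {
    // Nearest-rank percentile of a non-empty list, for 1 <= percent <= 100.
    static int percentile(List<Integer> values, int percent) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Ceiling of percent/100 * n, computed with integer arithmetic.
        int rank = (percent * sorted.size() + 99) / 100;
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Maps each design role to the given percentile of its method lengths.
    static Map<String, Integer> thresholds(Map<String, List<Integer>> locByRole,
                                           int percent) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : locByRole.entrySet()) {
            result.put(e.getKey(), percentile(e.getValue(), percent));
        }
        return result;
    }
}
```

Feeding it made-up method lengths such as 1 to 9 lines for Entity and 12 to 40 lines for AsyncTask produces very different 90th-percentile limits for the two roles, which is exactly what a single project-wide threshold would hide.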

The authors then analysed design roles that appeared in at least two projects. In many cases a significant difference can be found between classes with the same design role from different projects.

A manual analysis traced these differences back to specific design decisions other than the design role itself.

One should therefore also take such design decisions into account when building benchmarks for code quality analyses.

Finally, the authors compared metric distributions between different releases of the same project. While differences do exist, they are infrequent. This suggests that it may be a good idea to include data from previous releases when building benchmarks for analyses.

The important bits

  1. The design role of a class within an application’s architecture can be classified with reasonable accuracy using a few heuristics.
  2. Different design roles have different distributions of metric values. Benchmarks should therefore take design roles into account.
  3. Other design decisions may also affect what metric values are “normal” and should therefore also be considered.
  4. The distribution of metrics does not change frequently between releases, so it’s useful to include older releases in benchmarks.