Chuniversiteit.nl
The Toilet Paper

How do design decisions affect the distribution of software metrics?

The authors of this week’s paper built a tool that can infer the design role of classes and used it to analyse several Java projects.

Lady Justice weighs two object designs against each other
Let’s just hope your project’s code is in good shape

Many organisations use static analysis tools to calculate metrics, which they then use to grade the quality of their software projects. This is not necessarily a good approach, however: a study by Dósea, Sant’Anna, and da Silva shows that metrics don’t mean much without proper context.

Why it matters


Most developers will agree that very long methods are hard to maintain and should be refactored.

When is a function or method too long? A common answer is a single, predefined threshold. This is far from ideal, because the design role of a class may affect what is considered acceptable. For instance, a threshold like 20 lines of code may be too low for a View, but too high for a Controller.

You can of course set different thresholds for different types of classes, like models, views, and controllers, using the approach proposed by Aniche et al. (2018). But that does not solve everything.

First, many classes don’t fit into a well-known reference architecture like Model-View-Controller (MVC). You could use a default threshold as a fallback for those classes, but then you’re back to square one.
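Per-role thresholds with a fallback can be sketched as a simple lookup table. This is a hypothetical Java sketch: the `RoleThresholds` class, the role names, and every number in it are made up for illustration and are not taken from Aniche et al. or from this study. Note how a class with an unrecognised role, such as an `AsyncTask`, silently receives the default.

```java
import java.util.Map;

// Hypothetical per-role thresholds for method length (lines of code).
// The roles and numbers are illustrative, not values from any study.
class RoleThresholds {
    private static final Map<String, Integer> MAX_METHOD_LOC = Map.of(
            "Controller", 15,
            "View", 30,
            "Model", 20);

    // Fallback for classes that match no known role.
    private static final int DEFAULT_MAX_METHOD_LOC = 20;

    static int thresholdFor(String role) {
        return MAX_METHOD_LOC.getOrDefault(role, DEFAULT_MAX_METHOD_LOC);
    }

    // A method is flagged when it exceeds the threshold of its class's role.
    static boolean isTooLong(String role, int methodLoc) {
        return methodLoc > thresholdFor(role);
    }

    public static void main(String[] args) {
        System.out.println(isTooLong("View", 25));       // false
        System.out.println(isTooLong("Controller", 25)); // true
        System.out.println(isTooLong("AsyncTask", 25));  // true: falls back to the default
    }
}
```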

Second, two classes that are implemented in the same language and share the same design role may still look very different if they use different libraries.

Finally, some coding styles require more lines than others. Two snippets that both query a database and retrieve a list of Airport objects can differ considerably in length, depending on how the query is written.
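As a rough, hypothetical illustration of how style alone changes the numbers, the Java sketch below retrieves `Airport` objects in two ways. An in-memory list stands in for the database, and the `Airport` and `AirportQueries` classes are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain class; an in-memory list stands in for the database.
class Airport {
    final String code;
    final String country;

    Airport(String code, String country) {
        this.code = code;
        this.country = country;
    }
}

class AirportQueries {
    // Style 1: explicit loop with a mutable accumulator (more lines, more branches).
    static List<Airport> byCountryLoop(List<Airport> table, String country) {
        List<Airport> result = new ArrayList<>();
        for (Airport airport : table) {
            if (airport.country.equals(country)) {
                result.add(airport);
            }
        }
        return result;
    }

    // Style 2: declarative stream pipeline (fewer lines, same result).
    static List<Airport> byCountryStream(List<Airport> table, String country) {
        return table.stream()
                .filter(airport -> airport.country.equals(country))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Airport> table = List.of(
                new Airport("AMS", "NL"), new Airport("RTM", "NL"), new Airport("LHR", "UK"));
        System.out.println(byCountryLoop(table, "NL").size());   // 2
        System.out.println(byCountryStream(table, "NL").size()); // 2
    }
}
```

Both methods return the same airports, but the loop version has more executable statements and a higher cyclomatic complexity because of its `for` and `if`, so a per-method metric would rate it worse despite identical behaviour.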

How large are these differences in practice, and can (and should) we take them into account when interpreting metrics about our code’s quality?

How the study was conducted


The authors designed and implemented a tool, DesignRoleMiner, that infers the design role of a class from the following characteristics:

  1. The presence of certain keywords in a class name, like DTO;

  2. Class annotations that contain a keyword, like @Service;

  3. Inheritance from a base class that contains a keyword, such as AbstractController;

  4. Implementation of interfaces – some of these are “known” roles (e.g. Repository), whilst others are mapped to new roles;

  5. Classes that mostly just consist of attributes, getters, and setters – these are likely to be entities.
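The five heuristics above can be sketched as an ordered sequence of checks. This is not DesignRoleMiner’s actual implementation: the `ClassInfo` and `RoleHeuristics` classes, the keyword list, the order in which the checks are tried, and the “more accessors than other methods” rule for entities are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical summary of a class; a real tool would extract these facts
// from source code or bytecode.
class ClassInfo {
    final String name;
    final List<String> annotations;
    final String superclass;          // null when the class extends Object
    final List<String> interfaces;
    final int fields;
    final int accessorMethods;        // getters and setters
    final int otherMethods;

    ClassInfo(String name, List<String> annotations, String superclass,
              List<String> interfaces, int fields, int accessorMethods, int otherMethods) {
        this.name = name;
        this.annotations = annotations;
        this.superclass = superclass;
        this.interfaces = interfaces;
        this.fields = fields;
        this.accessorMethods = accessorMethods;
        this.otherMethods = otherMethods;
    }
}

class RoleHeuristics {
    // Hypothetical keyword vocabulary; the real tool's list may differ.
    static final List<String> KEYWORDS =
            List.of("Controller", "Service", "Repository", "Activity", "DTO");

    // Returns the first keyword contained in the text (case-insensitive), or null.
    static String keywordIn(String text) {
        if (text == null) {
            return null;
        }
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return keyword;
            }
        }
        return null;
    }

    static String inferRole(ClassInfo c) {
        // 1. Keyword in the class name, e.g. AirportDTO
        String role = keywordIn(c.name);
        if (role != null) {
            return role;
        }
        // 2. Annotation containing a keyword, e.g. @Service
        for (String annotation : c.annotations) {
            role = keywordIn(annotation);
            if (role != null) {
                return role;
            }
        }
        // 3. Base class containing a keyword, e.g. AbstractController
        role = keywordIn(c.superclass);
        if (role != null) {
            return role;
        }
        // 4. Interfaces: a known keyword wins; otherwise the first interface becomes a new role
        for (String iface : c.interfaces) {
            role = keywordIn(iface);
            if (role != null) {
                return role;
            }
        }
        if (!c.interfaces.isEmpty()) {
            return c.interfaces.get(0);
        }
        // 5. Mostly fields plus getters/setters: probably an entity
        if (c.fields > 0 && c.accessorMethods > c.otherMethods) {
            return "Entity";
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        ClassInfo airport = new ClassInfo("Airport", List.of(), null, List.of(), 2, 4, 0);
        System.out.println(inferRole(airport)); // Entity
    }
}
```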

For the actual study, fifteen Java projects were selected from GitHub.

The authors then used the tool to assign a design role to each class in the projects, and computed the following metrics for each method:

  • McCabe’s cyclomatic complexity;

  • The number of method parameters;

  • Lines of executable statements;

  • Efferent coupling.
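McCabe’s cyclomatic complexity of a single method can be computed as the number of decision points plus one. The hypothetical `CyclomaticSketch` class below approximates it by counting tokens in a method body. This is a simplification: it does not parse Java, so tokens inside strings or comments (and generic wildcards such as `List<?>`) are miscounted, and tools disagree on exactly which tokens count as decision points. Real tools work on a syntax tree.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Token-based approximation of McCabe's cyclomatic complexity for one method:
// 1 + the number of decision points. It does not parse Java, so tokens inside
// strings, comments, or generic wildcards (List<?>) are miscounted.
class CyclomaticSketch {
    // Decision points counted here: branching keywords, short-circuit operators
    // and the ternary operator. Which tokens count varies between tools.
    private static final Pattern DECISION_POINT =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    static int complexity(String methodBody) {
        Matcher matcher = DECISION_POINT.matcher(methodBody);
        int decisions = 0;
        while (matcher.find()) {
            decisions++;
        }
        return 1 + decisions;
    }

    public static void main(String[] args) {
        String body = "if (a && b) { x(); } else if (c) { y(); }";
        System.out.println(complexity(body)); // 4: two ifs and one && plus 1
    }
}
```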

What discoveries were made


Within a single project, one can find significant differences in the distribution of metric values between (at least two) different design roles.

For example, design roles like Entity and Exception typically don’t do much other than encapsulating data and therefore generally have few lines of code. Methods in other roles – like AsyncTask in one of the studied projects – may be more complex and thus require more lines of code and input parameters.
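A simple first look at such differences is to group method metrics by design role and compare a summary statistic per role. The Java sketch below computes the median method length per role; the `RoleDistributions` class and the numbers in `main` are invented for illustration, and a proper analysis would compare the full distributions with a statistical test rather than medians alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: compares the median method length of each design role.
// The roles and numbers in main() are made up, not data from the study.
class RoleDistributions {
    // Hypothetical record of one measured method.
    static final class MethodMetric {
        final String role;
        final int loc;

        MethodMetric(String role, int loc) {
            this.role = role;
            this.loc = loc;
        }
    }

    // Median of a non-empty list of values.
    static double median(List<Integer> values) {
        int[] sorted = values.stream().mapToInt(Integer::intValue).sorted().toArray();
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Groups method lengths by role (sorted by role name) and summarises each group.
    static Map<String, Double> medianLocByRole(List<MethodMetric> methods) {
        Map<String, List<Integer>> byRole = new TreeMap<>();
        for (MethodMetric m : methods) {
            byRole.computeIfAbsent(m.role, r -> new ArrayList<>()).add(m.loc);
        }
        Map<String, Double> medians = new TreeMap<>();
        byRole.forEach((role, locs) -> medians.put(role, median(locs)));
        return medians;
    }

    public static void main(String[] args) {
        List<MethodMetric> methods = List.of(
                new MethodMetric("Entity", 1), new MethodMetric("Entity", 1),
                new MethodMetric("Entity", 3), new MethodMetric("AsyncTask", 12),
                new MethodMetric("AsyncTask", 25));
        System.out.println(medianLocByRole(methods)); // {AsyncTask=18.5, Entity=1.0}
    }
}
```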

The authors then analysed design roles that appear in at least two projects. In many cases, the metric distributions of classes with the same design role differ significantly between projects.

A manual analysis of these differences revealed which design decisions contributed to them:

  • Some libraries are harder to use than others. This is not necessarily due to deficiencies in the library. For instance, some applications may simply require more advanced features.

  • Some libraries accommodate multiple coding styles. Hibernate, for example, allows users to build queries with its Criteria API or to write them in HQL, a SQL-like query language. The latter is much more succinct and does not require as much efferent coupling, although it’s also a lot less type-safe.

  • There is no general consensus on where to place code for things like exception handling, logging, and debugging: they can appear in different design roles, which obviously affects the distribution of metrics.

Therefore, benchmarks for code quality analyses should also take such other design decisions into account.

Finally, the authors compared metric distributions between different releases of the same project. While differences do exist, they are infrequent. This suggests that it may be a good idea to include data from previous releases when building benchmarks.
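One way to act on this is to pool measurements from several releases into a single sample per design role before deriving thresholds. In the hypothetical sketch below, the `ReleaseBenchmark` class and its data are made up, and the 90th-percentile cut-off and the nearest-rank percentile method are assumptions, not the procedure used in the study.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical benchmark that pools method metrics from several releases
// into one sample per design role, then derives a percentile threshold.
class ReleaseBenchmark {
    // One measured method: the release it came from, its role, and a metric value.
    static final class Observation {
        final String release;
        final String role;
        final int value;

        Observation(String release, String role, int value) {
            this.release = release;
            this.role = role;
            this.value = value;
        }
    }

    // Nearest-rank percentile (p in 1..100) of a non-empty list.
    static int percentile(List<Integer> values, int p) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Releases are deliberately not separated: every release feeds the same pool.
    static Map<String, Integer> thresholds(List<Observation> observations, int p) {
        Map<String, List<Integer>> pooled = new TreeMap<>();
        for (Observation o : observations) {
            pooled.computeIfAbsent(o.role, r -> new ArrayList<>()).add(o.value);
        }
        Map<String, Integer> result = new TreeMap<>();
        pooled.forEach((role, values) -> result.put(role, percentile(values, p)));
        return result;
    }

    public static void main(String[] args) {
        List<Observation> observations = List.of(
                new Observation("1.0", "Entity", 2), new Observation("1.0", "Entity", 3),
                new Observation("2.0", "Entity", 3), new Observation("2.0", "Entity", 5),
                new Observation("1.0", "Controller", 8), new Observation("2.0", "Controller", 14));
        System.out.println(thresholds(observations, 90)); // {Controller=14, Entity=5}
    }
}
```

Pooling releases gives each role a larger sample, which makes a percentile-based threshold less sensitive to a single release; whether that is appropriate depends on how stable the project’s distributions actually are.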

Summary

  1. The design role of a class within an application’s architecture can be classified with reasonable accuracy using a few heuristics

  2. Different design roles have different distributions of metric values. Benchmarks should therefore take design roles into account

  3. Other design decisions may also affect what metric values are “normal” and should therefore also be considered

  4. The distribution of metrics does not change frequently between releases, so it’s useful to include older releases in benchmarks