An empirical validation of cognitive complexity as a measure of source code understandability
Developers spend most of their time in software development on understanding existing code, which is why it is so important to make code easier to understand.
One way to do this is by automatically assessing the understandability of code using metrics, like McCabe’s cyclomatic complexity or the Halstead complexity measures. Developers can then use these metrics to see which parts of the code need to be improved. Unfortunately, none of the existing metrics has been shown to properly measure the understandability of code.
The cognitive complexity metric was introduced in 2017 by SonarSource and is explicitly designed to measure the understandability of code. It aims to achieve this goal by calculating a score that (see the sketch after this list):
- ignores structures that allow multiple statements to be readably shorthanded into one;
- increments for each break in the linear flow of the code;
- increments when flow-breaking structures are nested.
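To make these rules concrete, here is a minimal sketch of how a score could be tallied for a small function. The per-line increments follow the three rules above as described in SonarSource’s published specification, but a real analyzer may count slightly differently.

```python
def sum_of_primes(limit):
    """Sum all primes up to and including limit (illustrative only)."""
    total = 0
    for i in range(2, limit + 1):   # +1: break in the linear flow
        is_prime = True
        for j in range(2, i):       # +2: flow break (+1), nested one level (+1)
            if i % j == 0:          # +3: flow break (+1), nested two levels (+2)
                is_prime = False    # (an unlabeled break here would add nothing)
        if is_prime:                # +2: flow break (+1), nested one level (+1)
            total += i
    return total

# Cognitive complexity: 1 + 2 + 3 + 2 = 8. Cyclomatic complexity counts the
# same four branch points flatly (score 5); cognitive complexity penalizes
# the nested ones more heavily.
```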
The metric has since been accepted by developers, in the sense that they agree with its results and resolve the issues it identifies. However, it had not been evaluated empirically. This could be problematic, because using metrics without proper validation could lead developers to make wrong decisions.
The study starts with a systematic literature review of existing studies that measure code understandability from a developer’s perspective. Some of those studies come with datasets of code snippets for which the understandability has been measured using a variety of methods.
The authors reuse these datasets to conduct a meta-analysis, in which they look for the presence (or absence) of correlations between the cognitive complexity metric and the measurements of the actual understandability of code snippets.
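To give a feel for what the meta-analysis involves, the sketch below computes a rank correlation for one dataset and pools correlations across datasets using a Fisher z-transform. This is a common pooling approach, not necessarily the paper’s exact procedure, and all numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def dataset_correlation(complexity_scores, understandability):
    """Rank correlation between cognitive complexity and a measured
    understandability variable for one dataset of snippets."""
    rho, _p_value = spearmanr(complexity_scores, understandability)
    return rho

def pooled_correlation(rhos, sample_sizes):
    """Pool per-dataset correlations: Fisher z-transform each one,
    average weighted by n - 3, then transform back."""
    z = np.arctanh(rhos)
    weights = np.asarray(sample_sizes) - 3
    return np.tanh(np.average(z, weights=weights))

# Illustrative (made-up) per-dataset correlations and sample sizes:
print(f"pooled rho = {pooled_correlation([0.54, 0.41, 0.22], [40, 25, 120]):.2f}")
```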
Understandability is measured in a variety of ways.
In experiments, participants are often given a piece of code to read and asked to answer some questions about it. It is safe to assume that the harder the code is to understand, the more time a participant needs before they feel comfortable answering questions about it. The cognitive complexity metric is positively correlated with the time participants needed, which is a good first sign that the metric accurately measures understandability.
In such experiments, one would also expect the ratio of correctly answered questions to be higher for code that is easy to understand. Here the authors found mixed results: correlations between the cognitive complexity metric and correctness measurements range from -0.52 to 0.57. A possible explanation is a speed-accuracy trade-off: participants may spend more time on code that is hard to understand, which makes it more likely that they still answer the questions about it correctly.
In some studies, participants are given pieces of code and asked to rate how difficult they believe each piece is to understand. This measures perceived understandability. The meta-analysis shows a positive correlation: code that participants rate as harder to understand tends to have higher cognitive complexity scores.
In other types of studies, researchers monitor participants’ physiological data while they study pieces of code. Here only a single study was available, which measured the amount of concentration required to read code. The authors found a mean correlation of 0.00, which is not exactly what one could call “strong”.
Finally, the authors calculated the correlation between the cognitive complexity metric and composite variables for code understandability, which combine measures of time and correctness. The analysis shows a medium positive correlation of 0.40.
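The source studies define such composites in different ways. One plausible (hypothetical) composite, sketched below, combines correctness and time into a single efficiency score per participant and snippet:

```python
def comprehension_efficiency(correct_answers, total_questions, seconds):
    """Hypothetical composite understandability score: the fraction of
    questions answered correctly per minute of comprehension time."""
    correctness = correct_answers / total_questions
    minutes = seconds / 60
    return correctness / minutes

# A participant who answered 3 of 4 questions correctly in 150 seconds:
print(comprehension_efficiency(3, 4, 150))  # 0.3
```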
The overall conclusion is that the cognitive complexity metric correlates reasonably well with source code understandability, although we still don’t know how best to interpret the resulting score: when is code too complex?
The cognitive complexity metric…
- can be used to estimate how much time developers might need to correctly understand a piece of code;
- can also give an indication of how difficult developers perceive a piece of code to be.