Automatically assessing code understandability: How far are we?

Published: 14 Oct 2018
Written by: Chun Fei Lung

Forget LOC and cyclomatic complexity; can we just calculate the understandability of code directly?

“It looks pretty, but does it mean anything?”

Programmers spend much of their time reading code, so it’s important that it’s easy to understand. It would be nice if we could automatically calculate the understandability of code – unfortunately, Scalabrino et al. discovered that existing metrics aren’t good at predicting code understandability.

About the article

Title	Automatically assessing code understandability: How far are we?
Year	2017
Author(s)	Simone Scalabrino (University of Molise); Gabriele Bavota (Università della Svizzera italiana); Christopher Vendome (The College of William and Mary); Mario Linares-Vásquez (Universidad de los Andes); Denys Poshyvanyk (The College of William and Mary); Rocco Oliveto (University of Molise)
Venue	Proceedings of the 32nd International Conference on Automated Software Engineering

Why it matters

We know that code should be easy to understand, but we’re not entirely sure what makes it easy (or hard, for that matter).

It’s often assumed that factors like cyclomatic complexity and readability affect understandability in some way. While this seems likely, there isn’t much strong empirical evidence for these assumptions; for instance, most studies focus on perceived readability rather than actual understanding. (Side note: a code snippet might be short, simple, and well-documented, but that won’t be of much help if you aren’t at all familiar with the language, the libraries, or the domain.)

How the study was conducted

First, 50 representative methods were selected from 10 popular Java projects hosted on GitHub. The authors then selected 121 different metrics that might affect understandability and calculated them for those 50 methods. (Side note: the metrics can be grouped into three categories: code structure and style, availability of documentation, and experience and background of the developer reading the code.)
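
To give a feel for what “calculating a metric on a method” can look like, here is a minimal Python sketch that computes three simple code-structure metrics from a method’s source text. The function name `simple_metrics`, the metric names, and the keyword-counting approximation of cyclomatic complexity are my own assumptions for illustration; they are not the definitions used in the paper.

```python
import re

# Hypothetical, simplified branch detector: counts a few Java branching
# keywords and operators. A real tool would work on a parsed syntax tree.
BRANCH_TOKENS = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def simple_metrics(source: str) -> dict:
    """Compute a few illustrative metrics for one method's source code."""
    # Ignore blank lines when counting lines of code.
    lines = [line for line in source.splitlines() if line.strip()]
    return {
        "lines_of_code": len(lines),
        "max_line_length": max((len(line) for line in lines), default=0),
        # Rough approximation: one path plus one per branching token.
        "approx_cyclomatic": 1 + len(BRANCH_TOKENS.findall(source)),
    }


java_method = (
    "int f(int x) {\n"
    "    if (x > 0 && x < 10) {\n"
    "        return 1;\n"
    "    }\n"
    "    return 0;\n"
    "}\n"
)
print(simple_metrics(java_method))
```

Because the sketch counts tokens with a regular expression, it would also count keywords inside comments or string literals, which is one reason real metric tools parse the code instead.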

Then, the authors conducted a survey with 46 participants to determine how understandable the methods really were.

Each participant was asked to read 8 of the methods and, for each one, to indicate whether they thought they understood what the method did; if so, they had to answer several questions to verify that they really did understand the method correctly. The survey kept track of the amount of time a participant needed to answer each question.

This survey design makes it possible to determine:

whether a participant thinks they understand a piece of code;
how much time a participant needs before they think they understand a piece of code;
how well a participant actually understands a piece of code (based on the percentage of correctly answered questions);
how much time a participant needs to correctly understand a piece of code (again, based on the verification questions).
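
The four measures above could be derived from a single survey response roughly as follows. This is a sketch under my own assumptions: the `Response` record, its field names, and especially the way `timed_understanding` discounts correctness by time are hypothetical, and the paper’s exact formulas may differ.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One participant's answers for one method (hypothetical record)."""
    claimed_understanding: bool  # participant said they understood the method
    seconds_to_claim: float      # time spent before making that claim
    correct_answers: int         # verification questions answered correctly
    total_questions: int         # verification questions asked


def actual_understanding(r: Response) -> float:
    """Fraction of verification questions answered correctly (0.0 to 1.0)."""
    if not r.claimed_understanding or r.total_questions == 0:
        return 0.0
    return r.correct_answers / r.total_questions


def timed_understanding(r: Response, slowest_seconds: float) -> float:
    """Actual understanding, discounted by how long the participant took.

    Assumption: time is normalised against the slowest observed response,
    so a correct but very slow reader scores close to zero.
    """
    if slowest_seconds <= 0:
        return 0.0
    speed = 1.0 - min(r.seconds_to_claim / slowest_seconds, 1.0)
    return actual_understanding(r) * speed


r = Response(claimed_understanding=True, seconds_to_claim=30.0,
             correct_answers=2, total_questions=3)
print(r.claimed_understanding, r.seconds_to_claim,
      actual_understanding(r), timed_understanding(r, slowest_seconds=120.0))
```

The first two measures are read directly from the response; only the last two need any computation.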

What discoveries were made

The results are disappointing, but valuable nonetheless. Many metrics show no correlation with understandability at all, and the ones that do correlate only weakly.
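
To make “correlation” concrete: one common way to check whether a metric tracks understandability is a rank correlation such as Kendall’s tau, which ranges from -1 (perfectly opposite ordering) through 0 (no relationship) to 1 (identical ordering). The sketch below is a plain-Python version of the simplest variant (tau-a, which does not correct for ties). That the study used this particular statistic is an assumption on my part, and the numbers in the example are made up.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both values move in the same direction
        elif product < 0:
            discordant += 1   # values move in opposite directions
        # product == 0 means a tie, which tau-a counts as neither
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs


# Made-up values: a metric per method, and an understandability score.
metric_values = [1, 2, 3, 4]
understandability = [2, 1, 4, 3]
print(kendall_tau(metric_values, understandability))
```

A value close to zero for a metric would mean that knowing the metric tells you little about how well developers understand the method.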

The authors discuss a few of the metrics. I’ll simply list those findings here. In a nutshell:

developers perceive code with long lines as less pleasant;
developers who know the code’s programming language well perceive it as more understandable;
developers perceive methods as more understandable if internal APIs are well documented;
developers with more experience take longer to understand the code at first;
developers are less likely to actually understand the code if the method’s complexity is high;
developers need more time to correctly understand high-complexity code;
developers all need the same amount of time to correctly understand code, regardless of programming experience.