Automatically assessing code understandability: How far are we? (2017)

Programmers spend much of their time reading code, so it’s important that code is easy to understand. It would be nice if we could automatically calculate the understandability of code; unfortunately, Scalabrino et al. found that existing metrics are poor at predicting it.

Why it matters

We know that code should be easy to understand, but we’re not entirely sure what makes it easy (or hard for that matter).

It’s often assumed that factors like cyclomatic complexity and readability affect understandability in some way. While this seems likely, there isn’t a lot of strong empirical evidence for these assumptions; for instance, many studies mostly focus on perceived readability, rather than actual understanding. (A code snippet might be short, simple, and well-documented, but that won’t be of much help if you aren’t at all familiar with the language, the libraries, or the domain.)
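
To make that point concrete, consider the classic “fast inverse square root” trick (my own example, not one of the methods from the study). By most structural metrics it looks harmless: a few lines, a single path through the code, a comment on every step. Yet without knowing how IEEE 754 floats are laid out in memory, and why Newton’s method converges here, it’s nearly impossible to understand:

    /** Returns an approximation of 1 / sqrt(x). */
    static float invSqrt(float x) {
        float half = 0.5f * x;
        int i = Float.floatToIntBits(x);   // reinterpret the float's bits as an int
        i = 0x5f3759df - (i >> 1);         // magic constant gives a good first guess
        x = Float.intBitsToFloat(i);       // reinterpret the bits back to a float
        return x * (1.5f - half * x * x);  // one Newton-Raphson refinement step
    }

Every readability box is ticked, but the understanding has to come from domain knowledge that such metrics can’t see.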

How the study was conducted

First, 50 representative methods were selected from 10 popular Java projects hosted on GitHub. 121 different metrics that might affect understandability were selected and calculated on those 50 methods. (The metrics can be grouped into three categories: code structure and style, availability of documentation, and experience and background of the developer reading the code.)
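
To give a flavour of what the structural metrics measure, here is a deliberately naive sketch of one of them, cyclomatic complexity, approximated by counting branching tokens in a method’s source. (The study used proper tooling for its 121 metrics; this regex-based version is only an illustration.)

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class NaiveMetrics {
        // Tokens that each add one decision point to a method.
        private static final Pattern BRANCH =
                Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

        /** McCabe's cyclomatic complexity, roughly: 1 + number of decision points. */
        static int cyclomaticComplexity(String methodSource) {
            Matcher m = BRANCH.matcher(methodSource);
            int decisions = 0;
            while (m.find()) {
                decisions++;
            }
            return 1 + decisions;
        }
    }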

Then, the authors conducted a survey with 46 participants to determine how understandable the methods really were.

Each participant was asked to read 8 of the methods and to say whether they thought they understood what each method did; if so, they had to answer several questions to verify that they really did understand it correctly. The survey kept track of how much time a participant needed to answer each question.

This survey design makes it possible to determine:

  1. whether a participant thinks they understand a piece of code;
  2. how much time a participant needs before they think they understand a piece of code;
  3. how well a participant actually understands a piece of code (based on the percentage of correctly answered questions);
  4. how much time a participant needs to correctly understand a piece of code (again, based on the verification questions).
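
In other words, each participant–method pair yields four measurements. A minimal sketch of how they might be derived from the raw responses (the field names and exact timing definitions here are my assumptions, not the paper’s data model):

    /** One participant's interaction with one method. */
    record Response(boolean claimsToUnderstand, // 1. perceived understandability
                    long secondsToDecide,       // 2. time until that claim was made
                    int correctAnswers,
                    int totalQuestions,
                    long secondsToAnswer) {

        /** 3. Actual understanding: fraction of questions answered correctly. */
        double actualUnderstanding() {
            return totalQuestions == 0 ? 0.0 : (double) correctAnswers / totalQuestions;
        }

        /** 4. Time needed to understand correctly: deciding plus answering. */
        long secondsToCorrectUnderstanding() {
            return secondsToDecide + secondsToAnswer;
        }
    }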

What discoveries were made

The results are disappointing, but valuable nonetheless. Many of the metrics show no correlation with understandability at all, and the correlations that do exist are weak.
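
Concretely, “correlation” here means something like a rank correlation, computed across the snippets, between a metric’s values and one of the four proxies above. A bare-bones Kendall tau-a sketch (it simply skips tied pairs; a real analysis should use a library implementation that handles ties properly):

    class Correlation {
        /** Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs. */
        static double kendallTauA(double[] metric, double[] proxy) {
            int n = metric.length;
            int concordant = 0, discordant = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double d = (metric[i] - metric[j]) * (proxy[i] - proxy[j]);
                    if (d > 0) concordant++;       // ordered the same way in both
                    else if (d < 0) discordant++;  // ordered in opposite ways
                    // d == 0 is a tie; tau-a counts it in neither bucket
                }
            }
            return (concordant - discordant) / (n * (n - 1) / 2.0);
        }
    }

A value near zero, which is what the authors found for most of the 121 metrics, means the metric tells you essentially nothing about how understandable a snippet will turn out to be.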

The authors discuss a few of the metrics individually; do keep in mind that those findings are all based on weak correlations.

The important bits

  1. It’s still not feasible to automatically determine code understandability using static code analysis.