Old habits die hard: Why refactoring for understandability does not give immediate benefits

Published: 25 Apr 2021
Written by: Chun Fei Lung

Refactoring code improves code maintainability, reusability, and other “ilities”, but does not speed up development – at least not at first.

Is it an improvement? I guess wheel never know with these uphill battles

Whenever developers take shortcuts while building a software system, the system accumulates technical debt.

This debt makes it harder to understand and change the system, so development of a system with a lot of technical debt will eventually grind to a halt.

About the article

Title	Old habits die hard: Why refactoring for understandability does not give immediate benefits
Year	2015
Author(s)	Erik Ammerlaan (Exact International Development), Wim Veninga (Exact International Development), Andy Zaidman (Delft University of Technology)
Venue	International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Why it matters

Refactoring is the process of improving the structure of code without changing the functionality of the system. Many in the software development community (side note: including prominent authors like Robert C. Martin) argue that well-structured code is easier to understand, and thus easier to modify and less prone to bugs.

Unfortunately, there is little empirical evidence that refactoring actually improves developer productivity. This study tries to shed some light on the matter.

How the study was conducted

A comparative experiment was conducted at Exact, a company that produces business software and whose development teams are distributed over multiple continents.

The study consisted of five experiments and included 30 participants (all developers) from 11 teams in two countries (Malaysia and the Netherlands).

In each experiment, a developer was asked to perform a small coding task on components from a codebase with 2.7 MLOC (side note: millions of lines of code): they either had to fix a small bug or make a small change in functionality. Participants in the experimental group were given a refactored version of the code, while those in the control group were given the original code.

The experiment includes three types of refactorings:

small: Rename field or variable, and Extract function refactorings;
medium: Extract class and Adapter pattern refactorings, accompanied by one or more unit tests (side note: The authors made sure that none of these tests demonstrate the presence of bugs.);
large: refactorings to divide responsibilities, also accompanied by unit tests.
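
To make the "small" category concrete, here is a minimal sketch of a Rename variable and an Extract function refactoring. The code is hypothetical Python written for this summary; it is not taken from the study or from Exact's codebase, and all names in it (such as `subtotal` and `vat_rate`) are invented for illustration.

```python
# Hypothetical example; not code from the study.

# Before: one function, terse names, inline logic.
def total_before(items, r):
    t = 0.0
    for price, quantity in items:
        t += price * quantity
    return t + t * r


# After: "Rename variable" (r -> vat_rate, t -> net) and
# "Extract function" (the summing loop becomes its own helper).
def subtotal(items):
    """Sum price * quantity over all (price, quantity) pairs."""
    return sum(price * quantity for price, quantity in items)


def total_after(items, vat_rate):
    """Return the subtotal plus VAT; behaviour matches total_before."""
    net = subtotal(items)
    return net + net * vat_rate


items = [(10.0, 2), (5.0, 1)]
# A refactoring must not change behaviour: both versions agree.
assert total_before(items, 0.5) == total_after(items, 0.5)
```

The closing assertion captures the defining property of a refactoring: the structure changes, the observable behaviour does not.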

What discoveries were made

Results were mixed.

Results

In the first (small) experiment some helper methods were extracted from the code. Surprisingly, developers who saw the refactored version needed more time to make the requested change, not less.

The second (small) experiment had a similar setup, but was (apparently) easier to complete. This means that the productivity measurements for this experiment are less noisy. In this case, about 75% of the participants in the experimental group finished before 25% of the developers with the original code.

The third (small) experiment again used similar refactorings and also resulted in lower finishing times for those who saw the original code without refactorings. It’s possible that the flow of method arguments and return values between multiple smaller methods was harder to understand than a linear flow in a large method.
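
That speculation can be illustrated with a small sketch. In the hypothetical Python below (invented for this summary, not taken from the study), the linear version keeps every intermediate value in view, whereas the extracted version asks the reader to follow arguments and return values across three helpers.

```python
# Hypothetical example; not code from the study.

def shipping_cost_linear(weight_kg, distance_km, express):
    # Every intermediate value is visible in one top-to-bottom read.
    base = 2.0 + 0.5 * weight_kg
    cost = base + 0.01 * distance_km
    if express:
        cost = cost * 2
    return round(cost, 2)


def base_cost(weight_kg):
    return 2.0 + 0.5 * weight_kg


def add_distance(base, distance_km):
    return base + 0.01 * distance_km


def apply_express(cost, express):
    return cost * 2 if express else cost


def shipping_cost_extracted(weight_kg, distance_km, express):
    # Same behaviour, but the reader must trace how each return value
    # becomes the next helper's argument.
    with_distance = add_distance(base_cost(weight_kg), distance_km)
    return round(apply_express(with_distance, express), 2)


# Both versions compute the same result.
assert shipping_cost_linear(4, 100, True) == shipping_cost_extracted(4, 100, True)
```

Neither version is wrong. The point is only that someone used to the first style has to jump between four definitions to check what the second one does.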

In the fourth (medium) experiment participants were asked to fix a bug. It appears that those in the experimental group had slightly lower finishing times than those in the control group. Another notable finding is that developers who were quite experienced in unit testing performed better than other participants.

In the fifth (large) experiment, developers who saw the original code once again did much better than developers who had to work with the refactored code, presumably because it takes more time to understand the relations between classes that emerge from a large refactoring. However, the quality of solutions also differed: whereas most developers in the control group fixed the bug using a “quick fix”, those in the experimental group managed to fix the root cause.

Discussion

The experimental results show that most of the time the original, unrefactored code was “better” for productivity. However, when the original and refactored code were shown to participants side-by-side, most preferred the refactored code.

The authors argue that this discrepancy can be explained by the habits of developers, who are used to reading long, procedural methods and thus simply need more time to get used to dealing with multiple classes and methods (side note: This is a strange conclusion, because in the third and fifth experiments the refactored code really was harder to understand. I’m not saying that the authors started with a conclusion and worked their way backwards, but…).

However, even if refactorings lead to a (possibly temporary) decrease in understandability, the possible increases in maintainability and testability could still make the refactoring worthwhile.

Summary

Refactoring bad code…

does not necessarily make it more understandable, and
might lower productivity in the short term, but
is probably still worth the effort.