The Toilet Paper

Old habits die hard: Why refactoring for understandability does not give immediate benefits

Refactoring code improves code maintainability, reusability, and other “ilities”, but does not speed up development – at least not at first.

Cavemen try to ride up a hill in a car made of stone
Is it an improvement? I guess wheel never know with these uphill battles

Whenever shortcuts are taken during the development of a software system, that system accumulates technical debt.

This debt makes the system harder to understand and change, so development on a system with a lot of technical debt will eventually grind to a halt.

Why it matters


Refactoring is a process where the structure of code is improved without changing the functionality of the system. Proponents argue that well-structured code is easier to understand, and thus easier to modify and less prone to bugs.

Unfortunately, there is little empirical evidence that refactoring actually has beneficial effects on developer productivity. This study tries to shed some light on the matter.

How the study was conducted


A comparative experiment was conducted at Exact, a software company that produces business software with development teams that are distributed over multiple continents.

The study consisted of five different experiments and included 30 participants (all developers) from 11 different teams and two different countries (Malaysia and the Netherlands).

In each experiment, a developer was asked to perform a small coding task on components from a codebase with 2.7 : they either had to fix a small bug or make a small change in functionality. Participants in the experimental group were given a refactored version of the code, while those in the control group were given the original code.

The experiment includes three types of refactorings:

  • small Rename field or variable, and Extract function refactorings;

  • medium Extract class and Adapter pattern refactorings, accompanied by unit tests;

  • large refactorings to divide responsibilities, also accompanied by unit tests.
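
To make the medium category a bit more concrete, here is a minimal sketch of what an Adapter pattern refactoring with an accompanying unit test might look like. It is not taken from the study: the class names, the tax rates, and the interface are all made up for illustration.

```python
class LegacyTaxService:
    """Hypothetical stand-in for existing code with an awkward interface."""

    def calc(self, amount_in_cents: int, country_code: str) -> int:
        # Illustrative rates only; not data from the study.
        rates_in_percent = {"NL": 21, "MY": 6}
        return amount_in_cents * rates_in_percent[country_code] // 100


class TaxCalculator:
    """Adapter: offers the interface that calling code would prefer to use."""

    def __init__(self, legacy: LegacyTaxService, country_code: str) -> None:
        self._legacy = legacy
        self._country_code = country_code

    def tax_for(self, amount: float) -> float:
        # Translate to the legacy interface (cents, country code) and back.
        cents = round(amount * 100)
        return self._legacy.calc(cents, self._country_code) / 100


def test_adapter_matches_legacy_behaviour() -> None:
    # The unit test pins down that the refactoring did not change behaviour.
    legacy = LegacyTaxService()
    adapter = TaxCalculator(legacy, "NL")
    assert adapter.tax_for(100.0) == legacy.calc(10_000, "NL") / 100


test_adapter_matches_legacy_behaviour()
```

The legacy class is left untouched; only the way callers reach it changes, which is what keeps the functionality of the system the same.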

What discoveries were made


Results were mixed.



In the first (small) experiment some helper methods were extracted from the code. Surprisingly, developers who saw the refactored version needed more time to make the requested change, not less.

The second (small) experiment had a similar setup, but was (apparently) easier to complete. This means that the productivity measurements for this experiment are less noisy. In this case, about 75% of the participants in the experimental group finished before 25% of the developers with the original code.

The third (small) experiment again used similar refactorings and also resulted in lower finishing times for those who saw the original code without refactorings. It’s possible that the flow of method arguments and return values between multiple smaller methods was harder to understand than the linear flow in one large method.
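
The difference between the two styles can be sketched as follows. This is a hypothetical example, not code from the study: the function names and the pricing logic are invented purely to show how values travel through extracted helpers.

```python
def total_linear(prices, discount_rate, tax_rate):
    """Original style: one longer function, read from top to bottom."""
    subtotal = 0.0
    for price in prices:
        subtotal += price
    discounted = subtotal * (1 - discount_rate)
    return round(discounted * (1 + tax_rate), 2)


# Refactored style: the same steps after Extract function.
def sum_prices(prices):
    return sum(prices)


def apply_discount(subtotal, discount_rate):
    return subtotal * (1 - discount_rate)


def add_tax(amount, tax_rate):
    return amount * (1 + tax_rate)


def total_refactored(prices, discount_rate, tax_rate):
    # Each intermediate value is passed into a helper and handed back,
    # so a reader has to visit three other definitions to follow it.
    subtotal = sum_prices(prices)
    discounted = apply_discount(subtotal, discount_rate)
    return round(add_tax(discounted, tax_rate), 2)


# Both versions compute the same result; only the structure differs.
assert total_linear([10.0, 20.0], 0.5, 0.25) == total_refactored([10.0, 20.0], 0.5, 0.25)
```

In the refactored version each helper is trivial on its own, but someone new to the code has to trace arguments and return values across four functions instead of reading a single block.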

In the fourth (medium) experiment participants were asked to fix a bug. It appears that those in the experimental group had slightly lower finishing times than those in the control group. Another notable finding is that developers who were quite experienced in unit testing performed better than other participants.

In the fifth (large) experiment, developers who saw the original code once again did much better than developers who had to work with the refactored code, presumably because it takes more time to understand the relations between classes that emerge from a large refactoring. However, the quality of solutions also differed: whereas most developers in the control group fixed the bug using a “quick fix”, those in the experimental group managed to fix the root cause.



The experimental results show that most of the time the original, unrefactored code was “better” for productivity. However, when the original and refactored code were shown to participants side-by-side, most preferred the refactored code.

The authors argue that this discrepancy can be explained by the habits of developers, who are used to reading long, procedural methods and thus simply need more time to adjust to refactored code.

However, even if refactorings lead to a (possibly temporary) decrease in understandability, the possible increases in maintainability and testability could still make the refactoring worthwhile.



Refactoring bad code…

  1. does not necessarily make it more understandable, and

  2. might lower productivity in the short term, but

  3. is probably still worth the effort