Developers occasionally get questions about code that they have written. Such questions are not always easy to answer, especially if that code was written a long time ago. Krüger, Wiemann, Fenske, Saake, and Leich used an online survey to study how developers lose familiarity with “their” source code over time.
Why it matters
Developers are generally better at resolving bugs, adding new features, and estimating costs for code that they’ve worked on before.
But that doesn’t last forever: as developers move on to other parts of a codebase or even completely different projects, they’ll slowly forget the details of their previous work, and become less effective again.
This phenomenon is not completely understood yet.
Ebbinghaus’ forgetting curve is often used in psychology to describe how humans slowly forget information over time. The authors wondered whether this forgetting curve can also be applied to familiarity with source code.
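The forgetting curve is commonly written as R = e^(-t/s), where R is the fraction of information retained, t is the time elapsed, and s is a "memory strength" parameter. The following is a minimal sketch assuming that exponential form; the function names and the example value of s are hypothetical and are not taken from the study.

```python
import math


def retention(days_elapsed: float, memory_strength: float) -> float:
    """Retained fraction R = exp(-t / s), a value between 0 and 1.

    Assumes the common exponential form of the forgetting curve.
    """
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    if memory_strength <= 0:
        raise ValueError("memory_strength must be positive")
    return math.exp(-days_elapsed / memory_strength)


def days_until(retention_level: float, memory_strength: float) -> float:
    """Invert the curve: t = -s * ln(R), for 0 < R <= 1."""
    if not 0 < retention_level <= 1:
        raise ValueError("retention_level must be in (0, 1]")
    if memory_strength <= 0:
        raise ValueError("memory_strength must be positive")
    return -memory_strength * math.log(retention_level)
```

With a hypothetical memory strength of 10 days, `days_until(0.5, 10)` returns roughly 6.9: under this model, half of the details would be gone after about a week. A larger s flattens the curve, which is one way to think about developers who retain knowledge for longer.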
How the study was conducted
An online survey was held among 60 developers who had worked on publicly accessible projects on GitHub.
Each developer was asked to choose a single file they had last worked on in , but refrain from checking its contents. The survey’s questions focussed on the chosen file. Some of the more important questions include:
After how many days do you only remember the structure and purpose of a file, but have forgotten the details?
How well do you track changes other developers make on your files?
How many lines of code does the file contain?
No pilot study was conducted.
What discoveries were made
The authors looked at three factors that might influence how well developers retain knowledge about source code:
Repetition: Developers who made more commits (on distinct days) reported being considerably more familiar with the file’s content than those who made fewer commits;
Ratio of own code: Developers who were the last to edit a larger proportion of a file’s lines of code claimed to be more familiar with its contents;
Tracking: Whether developers keep track of edits that others make to “their” code, and how that affects familiarity.
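The first two factors are simple measurements over a file's history. As a rough sketch (not the authors' tooling), they could be computed as follows, assuming you already have a list of commit timestamps for one developer and a list naming the last editor of each line, for example extracted from `git log` and `git blame` output. Both function names and input shapes are hypothetical.

```python
from datetime import datetime


def distinct_commit_days(commit_times: list[datetime]) -> int:
    """Count the distinct calendar days on which commits were made.

    Several commits on the same day count as a single repetition.
    """
    return len({t.date() for t in commit_times})


def own_code_ratio(last_editors: list[str], developer: str) -> float:
    """Fraction of a file's lines that were last edited by `developer`.

    `last_editors` holds one author name per line of the file.
    An empty file yields 0.0.
    """
    if not last_editors:
        return 0.0
    own_lines = sum(1 for author in last_editors if author == developer)
    return own_lines / len(last_editors)
```

For instance, two commits on 1 January and one on 3 January count as two repetitions, and a developer who last touched three out of four lines has a ratio of 0.75.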
The forgetting curve appears to describe knowledge retention fairly well for the 27 developers that only made a single change to their file, but underestimates retention for the 33 developers that did make multiple contributions.
Contributing repeatedly to a file has a strong positive effect on familiarity with its contents.
Being the last developer to modify a large ratio of the code in a file also has a positive effect on familiarity.
In theory, developers’ loss of familiarity can be modelled using a forgetting curve. In practice, it probably can’t.