Do you remember this source code? (2018)
Developers occasionally get questions about code that they have written. Such questions are not always easy to answer, especially if that code was written a long time ago. Krüger, Wiemann, Fenske, Saake, and Leich used an online survey to study how developers lose familiarity with “their” source code over time.
Why it matters
Developers are generally better at resolving bugs, adding new features, and estimating costs for code that they’ve worked on before.
But that doesn’t last forever: as developers move on to other parts of a codebase or even completely different projects, they’ll slowly forget the details of their previous work, and become less effective again.
This phenomenon is not completely understood yet.
Ebbinghaus’ forgetting curve is often used in psychology to describe how humans slowly forget information over time. The authors wondered whether this forgetting curve can also be applied to familiarity with source code.
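For readers unfamiliar with it: the forgetting curve is commonly written in exponential form as R = e^(-t/s), where R is the fraction of information retained, t is the time elapsed, and s is the individual's memory strength. The sketch below assumes that form. The function name and the memory strength of 100 days are hypothetical, chosen only for illustration, and are not values from the study.

```python
import math


def retention(days_elapsed: float, memory_strength: float) -> float:
    """Fraction retained under the exponential forgetting curve R = exp(-t / s)."""
    if memory_strength <= 0:
        raise ValueError("memory_strength must be positive")
    return math.exp(-days_elapsed / memory_strength)


# Hypothetical memory strength of 100 days (an assumption, not a study result).
for days in (0, 30, 365):
    print(f"after {days:>3} days: {retention(days, 100.0):.3f}")
```

In this form, a larger memory strength flattens the curve, so the same amount of time costs less familiarity.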
How the study was conducted
An online survey was held among 60 developers that had worked on publicly accessible projects on GitHub.
Each developer was asked to choose a single file they had last worked on in 2016, but refrain from checking its contents. (The study was originally published in 2018, so for the respondents this would have been one or two years ago.) The survey’s questions focussed on the chosen file. Some of the more important questions include:
- How well do you know the content of the file? (On a Likert scale from 1 to 9; the scale actually ran from 0 to 10, but the authors disallowed those two answers because they think they wouldn’t be realistic.)
- After how many days do you only remember the structure and purpose of a file, but have forgotten the details?
- How well do you track changes other developers make on your files?
- How many lines of code does the file contain?
- When was the last date you edited the file?

The last two questions are primarily used to exclude responses with a very high error rate.
No pilot study was conducted.
What discoveries were made
The authors looked at three factors that might influence how well developers retain knowledge about source code:
- Repetition: Developers who made more commits (on distinct days) reported being a lot more familiar with the file’s content than developers who didn’t make as many commits.
- Ratio of own code: Developers who were the last to edit a larger proportion of a file’s lines of code claimed to be more familiar with its contents.
- Tracking: Whether developers keep track of edits that others make to “their” code doesn’t appear to have a strong effect on familiarity. (The authors suspect that the results may be skewed because different respondents might have interpreted “tracking” differently, which seems like a plausible explanation.)
The forgetting curve appears to describe knowledge retention fairly well for the 27 developers that only made a single change to their file, but underestimates retention for the 33 developers that did make multiple contributions.