Codebases often contain code clones: code fragments that are very similar or even completely identical to each other. Until now, only larger clones have been studied thoroughly – not much is known about micro-clones, which span only 1–4 lines of code. Mondal, Roy, and Schneider show that these micro-clones are quite widespread.
Why it matters
The characteristics and impact of code clones on software development and maintenance have been studied extensively by researchers.
While some have found that code cloning has positive effects, there’s also plenty of strong evidence that code cloning makes programs more prone to bugs due to unintentional inconsistencies.
Tools that keep track of clones within a codebase can help mitigate these issues. Most tools – and researchers for that matter – only look at larger clones, as it’s generally thought that smaller code clones don’t really matter that much.
Those smaller clones are called “micro-clones” and may be as small as a single line of code.
The authors argue that micro-clones can also have a strong negative effect on software quality and should therefore also be covered by tracking tools.
How the study was conducted
The authors mine commits from six open-source Java and C application repositories for micro-clones.
Intuitively, code clones can be recognised by looking for all lines that are changed in the same way within a single commit: the same line might have been added, updated, or removed in multiple places.
All these lines might be micro-clones, but it’s also possible that they’re simply part of regular code clones, i.e. clones that are at least 5 lines of code.
Therefore, the NiCad clone detector is run on the same commits to detect regular code clones. Any change that is not included in the set of regular code clones is likely a micro-clone.
This yields sufficient information for statistical and qualitative analyses of micro-clones.
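The detection idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LineChange` record, the whitespace-only normalisation, and the `(path, start, end)` ranges standing in for the regular clones reported by a detector such as NiCad are all assumptions made for the example. It also assumes that a group only counts as a micro-clone candidate if at least two of its changes remain after filtering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineChange:
    """One changed line in a commit (hypothetical record, for illustration)."""
    path: str   # file in which the change occurs
    line: int   # line number of the change
    kind: str   # "add", "update", or "delete"
    old: str    # text before the change ("" for additions)
    new: str    # text after the change ("" for deletions)


def normalize(text):
    """Collapse whitespace so that spacing differences are ignored."""
    return " ".join(text.split())


def consistent_groups(changes):
    """Group changes that were made in the same way in several places."""
    groups = defaultdict(list)
    for change in changes:
        key = (change.kind, normalize(change.old), normalize(change.new))
        groups[key].append(change)
    # A change that occurs only once is not a consistent change.
    return [group for group in groups.values() if len(group) > 1]


def micro_clone_candidates(changes, regular_clone_ranges):
    """Keep consistent changes that fall outside all regular clones.

    regular_clone_ranges is an iterable of (path, start, end) tuples,
    assumed to come from a separate regular-clone detector.
    """
    def in_regular_clone(change):
        return any(
            path == change.path and start <= change.line <= end
            for path, start, end in regular_clone_ranges
        )

    candidates = []
    for group in consistent_groups(changes):
        remaining = [c for c in group if not in_regular_clone(c)]
        if len(remaining) > 1:
            candidates.append(remaining)
    return candidates


if __name__ == "__main__":
    commit = [
        LineChange("a.c", 10, "update", "x = foo(1);", "x = foo(2);"),
        LineChange("a.c", 80, "update", "x  = foo(1);", "x = foo(2);"),
        LineChange("b.c", 5, "update", "x = foo(1);", "x = foo(2);"),
        LineChange("b.c", 40, "add", "", "log();"),
    ]
    for group in micro_clone_candidates(commit, [("b.c", 1, 20)]):
        print([(c.path, c.line) for c in group])
```

In this toy commit the same update is applied in three places, but the one in `b.c` lies inside a regular clone and is therefore filtered out, leaving the two changes in `a.c` as a micro-clone candidate. The lone addition is not a consistent change and is ignored.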
What discoveries were made
It turns out that micro-clones are very common.
The majority of consistent changes (about 80%) that were made throughout the history of the six projects occur in micro-clones.
Manual analysis of 300 micro-clones suggests that most of these changes are non-trivial: the changes aren’t merely changes in spacing or variable naming, but actually affect what the program does or shows.
Most changes that are consistently made in micro-clones are updates (80%). Additions (12%) and deletions (8%) are comparatively rare.
The distribution of micro-clone sizes tends to vary a bit among the six applications, but single-line micro-clones appear to be the most common in many of the studied repositories.
Finally, the authors note that micro-clone pairs usually reside in the same file, although this does not necessarily have to be the case.
Micro-clones, which are code clones that measure fewer than 5 lines of code, may account for about 80% of all consistent changes
Most changes to micro-clones consist of modifications to existing lines. Only a few changes are additions or deletions.
Many micro-clones are single-line clones
Clone trackers should keep track of micro-clones: failure to update them consistently may result in bugs or unexpected behaviour