Micro-clones in evolving software

Published: 13 Jan 2019
Written by: Chun Fei Lung

Duplicate code. Duplicate code everywhere.

Is this a gnome desktop?

Codebases often contain code clones: code fragments that are very similar or even completely identical to each other. Until now, only larger clones have been studied thoroughly – not much is known about micro-clones, which are only 1–4 lines of code. Mondal, Roy, and Schneider show that these micro-clones are quite widespread.

About the article

Title	Micro-clones in evolving software
Year	2018
Author(s)	Manishankar Mondal (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan), Kevin A. Schneider (University of Saskatchewan)
Venue	25th International Conference on Software Analysis, Evolution and Reengineering

Why it matters

The characteristics and impact of code clones on software development and maintenance have been studied extensively by researchers.

While some have found that code cloning has positive effects, there’s also plenty of strong evidence that code cloning makes programs more prone to bugs due to unintentional inconsistencies.

Tools that keep track of clones within a codebase can help mitigate these issues. Most tools – and researchers for that matter – only look at larger clones, as it’s generally thought that smaller code clones don’t really matter that much.

Those smaller clones are called “micro-clones” and may be as small as a single line of code (side note: typical examples of cloned one-liners are invocations or declarations with hard-coded values, e.g. CSS declarations like color: #c90016).

The authors argue that micro-clones can also have a strong negative effect on software quality and should therefore also be covered by tracking tools.

How the study was conducted

The authors mine commits from six open-source Java and C application repositories for micro-clones.

Intuitively, code clones can be recognised by looking for all lines that look the same and are changed in the same way within a single commit: the same line might have been added, updated, or removed in multiple places (side note: Strictly speaking, “look the same” only applies to Type 1 clones. Code fragments that have different types or identifiers but are syntactically the same are called Type 2 clones, while Type 3 clones consist of fragments that are almost syntactically identical. This study mostly focusses on Type 1 and 2 clones, as Type 3 clones are close to impossible to detect in micro-clones.).
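The idea of looking for identical lines that are changed in the same way within one commit can be illustrated with a small sketch. This is not the authors' tooling: the `Change` record, the `normalise` helper, and the sample commit are all hypothetical, and only whitespace differences are ignored (roughly a Type 1 comparison).

```python
from collections import defaultdict
from typing import NamedTuple


class Change(NamedTuple):
    """One changed line in a commit (hypothetical record, not from the paper)."""
    path: str  # file the line lives in
    line: int  # line number in that file
    kind: str  # "add", "update", or "delete"
    text: str  # content of the changed line


def normalise(text: str) -> str:
    """Collapse whitespace so that layout differences are ignored."""
    return " ".join(text.split())


def consistent_changes(changes: list[Change]) -> list[list[Change]]:
    """Group changes in one commit that share the same kind and normalised text."""
    groups = defaultdict(list)
    for change in changes:
        text = normalise(change.text)
        if text:  # skip blank lines
            groups[(change.kind, text)].append(change)
    # A group is only interesting if the same change occurs in more than one place.
    return [group for group in groups.values() if len(group) > 1]


# Made-up commit: the same declaration is updated in two places.
commit = [
    Change("a.css", 10, "update", "color: #c90016;"),
    Change("a.css", 42, "update", "  color:   #c90016;"),
    Change("b.css", 7, "add", "margin: 0;"),
]
candidates = consistent_changes(commit)
```

Here `candidates` contains a single group with the two updated `color` declarations. Detecting Type 2 clones would additionally require replacing identifiers and literals with placeholders before comparing, which this sketch does not attempt.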

All these lines might be micro-clones, but it’s also possible that they’re simply part of regular code clones, i.e. clones that are at least 5 lines of code.

Therefore, the NiCad clone detector is run on the same commits to detect regular code clones. Any change that is not included in the set of regular code clones is likely a micro-clone.
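The filtering step might look roughly like the sketch below. It assumes the regular clones have already been reduced to line ranges per file. The `Region` type and the sample data are hypothetical and do not reflect NiCad's actual output format.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A regular clone fragment as a line range in a file (hypothetical shape)."""
    path: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def in_regular_clone(path: str, line: int, regions: list[Region]) -> bool:
    """True if the given line falls inside any regular clone fragment."""
    return any(r.path == path and r.start <= line <= r.end for r in regions)


def likely_micro_clones(changed, regions):
    """Keep only the changed (path, line) pairs that lie outside all regular clones."""
    return [(p, n) for p, n in changed if not in_regular_clone(p, n, regions)]


# Made-up data: one regular clone fragment and three consistently changed lines.
regions = [Region("Foo.java", 100, 120)]
changed = [("Foo.java", 105), ("Foo.java", 300), ("Bar.java", 105)]
remaining = likely_micro_clones(changed, regions)
```

Only the change at line 105 of `Foo.java` is discarded, because it lies inside a regular clone. The other two remain as likely micro-clone changes.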

This yields sufficient information for statistical and qualitative analyses of micro-clones.

What discoveries were made

It turns out that micro-clones are very common.

The majority of consistent changes (about 80%) that were made throughout the history of the six projects occur in micro-clones. Only 16% occur in regular code clones (side note: The remaining 4% is “uncategorised” and consists of changes to or around lines that contain only a single character, like { and }. These don’t actually matter.).

Manual analysis of 300 micro-clones suggests that most of these changes are non-trivial: the changes aren’t merely changes in spacing or variable naming, but actually affect what the program does or shows.

Most changes that are consistently made in micro-clones are updates (80%). Additions (12%) and deletions (8%) are comparatively rare.

The distribution of micro-clone sizes tends to vary a bit among the six applications, but single-line micro-clones appear to be the most common in many of the studied repositories.

Finally, the authors note that micro-clone pairs usually reside in the same file, although this does not necessarily have to be the case.

Summary

Micro-clones, which are code clones that measure fewer than 5 lines of code, may account for about 80% of all consistent changes.
Most changes to micro-clones consist of modifications to existing lines. Only few changes are additions or deletions.
Many micro-clones are single-line clones.
Clone trackers should keep track of micro-clones: failure to update them consistently may result in bugs or unexpected behaviour.
