Negative results for software effort estimation

Published: 13 Oct 2019
Written by: Chun Fei Lung

The reports of COCOMO’s death have been greatly exaggerated.

Coconut(s) can help you keep costs under control

COCOMO is one of those terms that gets thrown around a lot in engineering schools, but seemingly disappears into thin air once you enter the software industry “because reliable effort estimation is a holy grail” or “because it’s obsolete”. This study shows that in the case of effort estimation methods, newer isn’t necessarily better.

About the article

Title	Negative results for software effort estimation
Year	2017
Author(s)	Tim Menzies (North Carolina State University), Ye Yang (Stevens Institute), George Mathew (North Carolina State University), Barry Boehm (University of Southern California), Jairus Hihn (Jet Propulsion Laboratory and California Institute of Technology)
Venue	Empirical Software Engineering

Why it matters

Effort estimation is an important part of software project management: under-estimation causes schedule and budget overruns and possibly even project cancellation, while over-estimation can cause a project to be cancelled before it has even started.

Researchers in the 1970s and 1980s developed parametric estimation models like Boehm’s Constructive Cost Model (COCOMO), which helps you estimate the amount of effort that’s needed to complete a project. (Side note: Boehm is also one of the authors of this paper, which is not at all suspicious.)
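To make "parametric" a bit more concrete: the COCOMO-II effort equation multiplies a size term by a set of effort multipliers, and raises size to an exponent that grows with a handful of scale factors. The sketch below is a minimal illustration, not the authors' code. The constants 2.94 and 0.91 are the COCOMO II.2000 calibration values; the function name, its arguments, and the example project are made up for this post.

```python
import math


def cocomo2_effort(kloc, scale_factors, effort_multipliers, a=2.94, b=0.91):
    """Estimate effort in person-months with the COCOMO-II equation.

    effort = a * kloc ** (b + 0.01 * sum(scale_factors)) * prod(effort_multipliers)

    kloc               -- project size in thousands of lines of code
    scale_factors      -- numeric values of the scale factors
    effort_multipliers -- numeric values of the effort multipliers
    a, b               -- calibration constants (COCOMO II.2000 defaults)
    """
    exponent = b + 0.01 * sum(scale_factors)
    return a * kloc ** exponent * math.prod(effort_multipliers)


# Hypothetical 50 KLOC project: the scale factor and multiplier values
# below are placeholders, not ratings taken from any real project.
estimate = cocomo2_effort(50, scale_factors=[3.0] * 5, effort_multipliers=[1.0] * 17)
print(f"{estimate:.1f} person-months")
```

The point of the parametric form is that tuning a model for an organisation only means adjusting a few numbers (here `a` and `b`), not learning a whole new model.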

Other methods based on regression, case-based reasoning, and other fancy machine learn-y algorithms have been proposed since then, but parametric estimation is still the most widely used method.

This is strange: new methods are only (supposed to be) proposed if they are better than existing ones, so why are newer methods not adopted by industrial practitioners? Should we put more effort into promoting newer models, or are there other reasons why COCOMO is still king?

How the study was conducted

Effort estimation methods are normally used to predict the effort of projects before they start. For evaluation purposes, however, it makes more sense to apply them to projects that have already been completed, because that lets you compare the estimated effort with the actual effort. Four datasets are used in the study:

The original dataset that was published along with the COCOMO model. This dataset contains projects from 1970 to 1980;
A collection of NASA projects that produced software for the International Space Station in the early 1990s;
Two newer proprietary datasets with newer NASA and (presumably) commercial projects from the 2000s.
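Evaluating on completed projects boils down to running an estimator over the historical data and measuring how far off it was. The sketch below uses the magnitude of relative error (MRE), a common measure in the effort estimation literature; it is an assumption for illustration and not necessarily the error measure used in the paper. The project data and the naive estimator are made up.

```python
from statistics import median


def magnitude_of_relative_error(actual, estimated):
    """Relative size of the estimation error for one project."""
    return abs(actual - estimated) / actual


def median_mre(projects, estimator):
    """Median MRE of `estimator` over completed projects.

    projects  -- list of (kloc, actual_effort) pairs
    estimator -- function mapping kloc to an effort estimate
    """
    return median(
        magnitude_of_relative_error(actual, estimator(kloc))
        for kloc, actual in projects
    )


# Made-up history: (size in KLOC, actual effort in person-months).
completed = [(10, 40.0), (20, 70.0), (50, 200.0)]


def naive_estimator(kloc):
    """Hypothetical rule of thumb: three person-months per KLOC."""
    return 3.0 * kloc


print(median_mre(completed, naive_estimator))
```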

The authors compare COCOMO-II and (the COCOMO-based) COCONUT with a bunch of other methods, ranging from simple methods based on lines of code (LOC) to more sophisticated methods like CART, k-nearest neighbours, ATLM, TEAK, and PEEKING2.

What discoveries were made

The study basically consists of two parts: the first part compares COCOMO with other methods, while the second part deals with COCOMO’s perceived complexity.

COCOMO put to the test

It’s often believed that parametric estimation methods like COCOMO-II are no better than simple lines of code (LOC) measures.

Results show that estimations made using COCOMO-II and COCONUT have a much smaller standard error than estimations based on projects with a similar amount of code. This is hardly surprising of course, as LOC-based methods implicitly assume that any two projects with similarly-sized codebases require equal amounts of effort to complete, regardless of factors like domain complexity.
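To see why a size-only estimate is so crude, here is a minimal sketch of a LOC-based baseline: it predicts the effort of a new project as the average effort of the completed projects closest to it in size. This is an illustration under my own assumptions (function name, `k`, and data are made up), not the exact baseline from the paper.

```python
def loc_neighbour_estimate(kloc, history, k=3):
    """Average effort of the k completed projects closest in size.

    history -- list of (kloc, actual_effort) pairs for completed projects
    """
    nearest = sorted(history, key=lambda project: abs(project[0] - kloc))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Made-up history: two projects of similar size but very different effort,
# for instance because one of them sits in a much harder domain.
history = [(10, 30.0), (12, 50.0), (100, 400.0)]
print(loc_neighbour_estimate(11, history, k=2))
```

Any new project of about 11 KLOC gets the same estimate here, no matter how complex its domain is; that is exactly the information COCOMO's extra attributes try to capture.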

COCOMO-II and COCONUT also hold their ground quite well against the more sophisticated methods. Another interesting observation is that, for all the talk about software development being a ridiculously fast-moving field, the COCOMO-II tunings from 2000 are still useful for newer projects.

A simplified COCOMO

COCOMO can be costly to adopt at new organisations:

Analysts need to be trained before they can consistently generate project ratings (very low, low, nominal, high, very high, or extra high);
New models are tuned only after collecting hundreds of project examples;
The model takes 24 attributes into consideration, which may be a bit too much for new adopters.

The authors therefore explored the usefulness of simplified versions that aim to mitigate some of those issues:

The six-point scale is reduced to a scale with just three possible values: nominal, above nominal, and below nominal;
The model is trained on just 4 or 8 randomly selected projects;
The number of attributes is reduced to remove potentially noisy and correlated attributes; this should improve the reliability of predictions.
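The first two simplifications are easy to picture in code. The sketch below collapses a six-point rating into three values and draws a small random training sample. It is only an illustration: the numeric encoding of the ratings (1 for the lowest, 3 for nominal, 6 for the highest) and both function names are assumptions, and the paper may implement these steps differently.

```python
import random

NOMINAL = 3  # assumed position of "nominal" on a 1..6 rating scale


def collapse_rating(level):
    """Map a six-point rating (1..6) onto a three-point scale."""
    if not 1 <= level <= 6:
        raise ValueError(f"rating must be between 1 and 6, got {level}")
    if level < NOMINAL:
        return "below nominal"
    if level > NOMINAL:
        return "above nominal"
    return "nominal"


def sample_training_projects(projects, n=4, seed=None):
    """Pick n projects at random to tune the model on."""
    rng = random.Random(seed)
    return rng.sample(projects, n)


print([collapse_rating(level) for level in range(1, 7)])
print(sample_training_projects(list(range(20)), n=4, seed=1))
```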

A comparison of various simplified versions with regular COCOMO suggests these simplified COCOMO models are an acceptable substitute in cases where the full version may be too hard to implement.

Summary

COCOMO-II and COCONUT produce better estimates than simple LOC-based methods and hold their ground against newer, more sophisticated estimation methods
COCOMO-II’s tunings from 2000 still apply to newer projects
Simplified versions of COCOMO still yield acceptable estimates