Negative results for software effort estimation (2017)
COCOMO is one of those terms that gets thrown around a lot in engineering schools, but seemingly disappears into thin air once you enter the software industry “because reliable effort estimation is a holy grail” or “because it’s obsolete”. This study shows that in the case of effort estimation methods, newer isn’t necessarily better.
Why it matters
Effort estimation is an important part of software project management: under-estimation causes schedule and budget overruns and possibly even project cancellation, while over-estimation can cause a project to be cancelled before it has even started.
Researchers in the 1970s and 1980s developed parametric estimation models like Boehm’s Constructive Cost Model (COCOMO), which helps you estimate the amount of effort that’s needed to complete a project. (Boehm is also one of the authors of this paper, which is not suspicious at all. 🤔)
The original version of COCOMO was developed in the 70s by combining existing expertise about effort estimation with data from many software projects. COCOMO II, the current version, was published in 2000 and is tuned for more modern projects.
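At its core, the original COCOMO estimates effort from size with a power law. Here’s a minimal sketch using the published basic-COCOMO coefficients for “organic” (small, in-house) projects; the full model multiplies this baseline by per-project cost drivers:

```python
def cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO 81 estimate, in person-months, for a project of
    `kloc` thousand lines of code. a and b are the published
    'organic'-mode coefficients; larger, riskier project classes
    use steeper exponents."""
    return a * kloc ** b

# A 32 KLOC organic project comes out at roughly 91 person-months.
print(round(cocomo_effort(32), 1))
```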
It assumes that different types of software projects (e.g. embedded software, business data processing, scientific software) have different cost characteristics due to differences in factors like product complexity, programmer capability, personnel continuity, architecture or risk resolution, and team cohesion.
COCOMO models can be tuned using local project data or approximated using so-called local calibration procedures like COCONUT.
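As an illustration of what local calibration means, here is a hypothetical grid search that tunes the (a, b) coefficients of the basic power-law model against local project data. This is a stand-in to show the idea, not the published COCONUT procedure:

```python
import itertools

def calibrate(projects):
    """Grid-search (a, b) minimizing the mean magnitude of relative
    error (MMRE) over local data. `projects` is a list of
    (kloc, actual_effort) pairs from completed projects."""
    def mmre(a, b):
        return sum(abs(actual - a * kloc ** b) / actual
                   for kloc, actual in projects) / len(projects)

    candidates = itertools.product(
        [x / 10 for x in range(10, 60)],    # a in 1.0 .. 5.9
        [x / 100 for x in range(90, 131)])  # b in 0.90 .. 1.30
    return min(candidates, key=lambda ab: mmre(*ab))
```

Given a handful of local (size, effort) pairs, this picks the coefficient pair that best reproduces the observed efforts.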
Other methods based on regression, case-based reasoning, and other fancy machine learn-y algorithms have been proposed since then, but parametric estimation is still the most widely used method.
This is strange: new methods are only (supposed to be) proposed if they are better than existing ones, so why haven’t industrial practitioners adopted the newer ones? Should we put more effort into promoting newer models, or are there other reasons why COCOMO is still king?
How the study was conducted
Effort estimation methods are normally used to predict the effort of a project before it starts. For evaluation purposes, however, it makes more sense to apply them to projects that have already been completed, as this allows you to compare the estimated effort with the actual effort. Four datasets are used in the study:
- The original dataset that was published along with the COCOMO model. This dataset contains projects from 1970 to 1980;
- A collection of NASA projects that produced software for the International Space Station in the early 1990s;
- Two newer proprietary datasets with NASA and (presumably) commercial projects from the 2000s.
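Comparing methods on completed projects boils down to scoring each estimate against the known actual effort. A typical error measure is the magnitude of relative error, sketched here with made-up numbers:

```python
import statistics

def relative_errors(estimates, actuals):
    """Magnitude of relative error for each completed project:
    |estimated - actual| / actual."""
    return [abs(est - act) / act for est, act in zip(estimates, actuals)]

# Hypothetical estimated vs. actual person-months for three projects.
errs = relative_errors([100, 60, 250], [120, 55, 300])
print(statistics.mean(errs))
```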
The authors compare COCOMO-II and (the COCOMO-based) COCONUT with a number of other methods, including simple methods based on lines of code (LOC) and more sophisticated methods like CART, k-nearest neighbours, ATLM, TEAK, and PEEKING2.
What discoveries were made
The study basically consists of two parts: the first part compares COCOMO with other methods, while the second part deals with COCOMO’s perceived complexity.
COCOMO put to the test
It’s often believed that parametric estimation methods like COCOMO-II are no better than simple lines of code (LOC) measures.
Results show that estimates made using COCOMO-II and COCONUT have a much smaller standard error than estimates based on projects with a similar amount of code. This is hardly surprising, of course: LOC-based methods implicitly assume that any two projects with similarly sized codebases require equal amounts of effort to complete, regardless of factors like domain complexity.
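For contrast, a naive LOC baseline of this kind might look like the following (my own sketch, not necessarily the exact comparison method from the paper):

```python
def loc_estimate(target_kloc, history):
    """Reuse the actual effort of the past project closest in size,
    ignoring every other cost driver -- exactly the implicit
    assumption criticised above."""
    closest = min(history, key=lambda p: abs(p[0] - target_kloc))
    return closest[1]

# (kloc, actual person-months) pairs for completed projects.
past = [(10, 30), (50, 200), (100, 500)]
print(loc_estimate(45, past))  # nearest project is 50 KLOC, so 200
```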
COCOMO-II and COCONUT also hold their ground quite well against the more sophisticated methods. Another interesting observation is that, for all the talk about the world of software development being a ridiculously fast-moving field, the COCOMO-II tunings from 2000 are still useful for newer projects.
A simplified COCOMO
COCOMO can be costly to adopt at new organisations:
- Analysts need to be trained before they can consistently generate project rankings (very low, low, nominal, high, very high, or extremely high);
- New models are tuned only after collecting hundreds of project examples;
- The model takes 24 attributes into consideration, which may be a bit too much for new adopters.
The authors therefore explored the usefulness of simplified versions that aim to mitigate some of those issues:
- The six-point scale is reduced to a scale with just three possible values: nominal, above nominal, and below nominal;
- The model is trained on just 4 or 8 randomly selected projects;
- The number of attributes is reduced to remove potentially noisy and correlated attributes; this should improve the reliability of predictions.
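The first simplification can be sketched as a lookup that collapses the six-point scale; the exact grouping below is my assumption, using the value names from the text:

```python
# Collapse COCOMO's six-point driver scale to three values.
COLLAPSE = {
    "very low": "below nominal",
    "low": "below nominal",
    "nominal": "nominal",
    "high": "above nominal",
    "very high": "above nominal",
    "extremely high": "above nominal",
}

def simplify(ratings):
    """Map an analyst's six-point driver ratings onto the three-point scale."""
    return {driver: COLLAPSE[value] for driver, value in ratings.items()}
```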
A comparison of various simplified versions with regular COCOMO suggests these simplified COCOMO models are an acceptable substitute in cases where the full version may be too hard to implement.