Belief & evidence in empirical software engineering

Published: 30 Sept 2018
Written by: Chun Fei Lung

Do we actually follow best practices or just… practices?

Copernicus is clearly on the right side here

Programmers can hold very strong beliefs about certain topics. Devanbu, Zimmermann, and Bird conducted a case study on such beliefs at Microsoft, and found that they are mostly based on personal experience rather than empirical evidence, and don’t necessarily correspond with what happens in reality.

About the article

Title	Belief & evidence in empirical software engineering
Year	2016
Author(s)	Prem Devanbu (UC Davis), Thomas Zimmermann (Microsoft Research), Christian Bird (Microsoft Research)
Venue	Proceedings of the 38th International Conference on Software Engineering

Why it matters

Programmers – like other humans – believe all sorts of things, e.g. that shorter functions are more maintainable than longer ones.

These beliefs can change over time, as we learn from new discoveries. However, we’re less likely to accept new discoveries if they strongly contradict what we already believe, unless the evidence is really, really convincing (side note: There are two major views of statistical analysis. The frequentist view treats probability purely as the long-run frequency of observed outcomes, so only the data count. The Bayesian view also takes prior beliefs into account: given two contradicting experimental results with identical p-values, the less surprising result is the more convincing one.).
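The Bayesian half of that side note can be made concrete with a small sketch. The numbers are invented purely for illustration: two experiments provide equally strong evidence (the same likelihood ratio), but one supports a claim we already found plausible and the other a claim we found unlikely.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability using Bayes' rule in odds form.

    likelihood_ratio = P(evidence | claim true) / P(evidence | claim false)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: identical evidence strength, different prior beliefs.
evidence = 5.0
plausible = posterior(prior=0.70, likelihood_ratio=evidence)
surprising = posterior(prior=0.05, likelihood_ratio=evidence)

print(f"claim we already believed: {plausible:.2f}")
print(f"claim we found unlikely:   {surprising:.2f}")
```

With these made-up inputs the plausible claim ends up at roughly 0.92 and the surprising one at roughly 0.21, even though the evidence was equally strong in both cases.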

The consequences of ignoring actual empirical evidence can be severe: imagine how you’d feel if you were diagnosed with acute appendicitis and your doctor used only homeopathic treatments, simply because she “believes” they work best!

Fortunately this doesn’t really happen in medicine; organisations like Cochrane and the American College of Physicians have made it easy for practitioners to access and keep up with the latest empirical findings.

Software engineering isn’t quite there yet, as this study shows.

How the study was conducted

The study consists of two parts.

In the first part, the authors compiled a list of empirically falsifiable claims (side note: You can find them in the table below!) and presented it in a survey to 564 programmers at Microsoft. Respondents rated how much they agreed with each claim on a 5-point Likert scale.

In the second part, after analysing the survey results, the authors selected a single claim that turned out to be highly controversial and conducted a case study within Microsoft to determine whether the claim is true or false.

What discoveries were made

Results for the survey and the case study are presented separately.

The survey

The table below shows the programmers’ opinions about each claim; scores are on a scale of 1 (“strongly disagree”) to 5 (“strongly agree”), and variance is a measure of disagreement between respondents:

Question	Score	Variance
Code quality (defect occurrence) depends on which programming language is used	3.17	1.16
Fixing defects is riskier (more likely to cause future defects) than adding new features	2.63	1.08
Geographically distributed teams produce code whose quality (defect occurrence) is just as good as teams that are not geographically distributed	2.86	1.07
When it comes to producing code with fewer defects specific experience in the project matters more than overall general experience in programming	3.5	1.06
Well commented code has fewer defects	3.4	1.05
Code written in a language with static typing (e.g., C#) tends to have fewer bugs than code written in a language with dynamic typing (e.g. Python)	3.75	1.02
Stronger code ownership (i.e, fewer people owning a module or file) leads to better software quality	3.75	1.02
Merge commits are buggier than other commits.	3.4	0.97
Components with more unit tests have fewer customer-found defects	3.85	0.95
More experienced programmers produce code with fewer defects	3.86	0.94
More defects are found in more complex code	4.0	0.93
Factors affecting code quality (defect occurrence) vary from project to project	3.8	0.92
Using asserts improves code quality (reduces defect occurrence)	3.78	0.89
The use of static analysis tools improves end user quality (fewer defects are found by users)	3.77	0.87
Coding standards help improve software quality	4.18	0.79
Code reviews improve software quality (reduces defect occurrence)	4.48	0.64
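As a rough sketch of how the two columns relate to raw answers: the score is the mean of the 1 to 5 responses, and the variance measures how far those responses spread around that mean. The responses below are invented, and the choice of population variance (rather than sample variance) is an assumption, since the post does not say which one the paper used.

```python
from statistics import mean, pvariance

# Hypothetical responses on the 5-point scale
# (1 = strongly disagree, 5 = strongly agree).
consensus = [4, 4, 5, 4, 5, 4, 4, 5]
controversial = [1, 5, 2, 4, 1, 5, 3, 3]

for label, responses in [("consensus", consensus), ("controversial", controversial)]:
    score = mean(responses)
    spread = pvariance(responses)  # population variance: an assumption
    print(f"{label}: score={score:.2f} variance={spread:.2f}")
```

A claim can have a middling score either because most respondents are neutral or because they are split between the extremes; only the variance tells these two situations apart.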

Respondents who strongly agreed or disagreed with a claim were asked which sources they based their opinion on and which sources they valued most. The overall ranking is as follows:

Personal experience
Peer opinion
Mentor/manager
Trade journal
Research paper
Other

Strength of social connection appears to have a greater influence on programmers’ opinions than credibility of empirical evidence.

This is consistent with earlier findings within general populations, but it’s not exactly how a professional should form opinions.

The case study

The authors examined the highly controversial claim

Geographically distributed teams produce code whose quality, viz., defect occurrence, is just as good as teams that are not geographically distributed

in more detail, using data from two large projects within Microsoft: an operating system and a Web service.

Both projects have a similar number of programmers and are developed by teams that are distributed around the world.

Curiously enough, survey respondents who were part of the operating system project overwhelmingly disagreed with the claim, while respondents from the Web service project largely agreed with it.

Is there really a difference in quality between the two projects?

Method

The authors counted the number of bug fixes applied to each file in the two projects, and then determined for each file whether at least 75% of its commits were made within the same building, city, region, or country.
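One way to read this classification step is sketched below. It is not the authors' actual tooling. For each file, it looks for the finest geographic level at which a single location accounts for the required share of commits, treating 75% as a minimum share (an assumption). The `Commit` record, its field names, and the sample data are hypothetical, and location names are assumed to be unique identifiers.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Commit(NamedTuple):
    """Hypothetical record: where a commit's author was located."""
    building: str
    city: str
    region: str
    country: str


LEVELS = ("building", "city", "region", "country")  # finest to coarsest


def colocation_level(commits: Sequence[Commit], threshold: float = 0.75) -> Optional[str]:
    """Return the finest level at which one location holds >= threshold of commits."""
    if not commits:
        return None
    for level in LEVELS:
        counts = Counter(getattr(c, level) for c in commits)
        top_share = counts.most_common(1)[0][1] / len(commits)
        if top_share >= threshold:
            return level
    return None  # no single country reaches the threshold


# Made-up commit history for a single file.
commits = [
    Commit("B1", "Redmond", "WA", "US"),
    Commit("B2", "Redmond", "WA", "US"),
    Commit("B1", "Redmond", "WA", "US"),
    Commit("H7", "Hyderabad", "Telangana", "IN"),
]
print(colocation_level(commits))
```

In this sample no single building reaches the threshold, but three of the four commits come from one city, so the file counts as co-located at city level.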

Some control measures were introduced to make sure that they really were measuring the effect of geographic distribution:

Average size in LOC: the larger a file is, the more likely it is to contain defects;
Total number of commits: the more a file changes, the more likely it is that defects will be introduced;
Number of distinct contributors: the number of programmers involved in a file is known to influence its quality;
Percentage of commits made by most frequent committer: strong file ownership also tends to influence quality.

Results

After running these variables through linear regression models, the authors found that the effect of distributed development is minimal (side note: Which is consistent with prior studies on this topic).
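To see why the control variables matter, here is a minimal sketch of a regression with controls on synthetic data. Everything in it is an assumption made for illustration: the data are invented so that co-location has no true effect, the small `solve` and `ols` helpers are a plain least-squares implementation rather than the authors' models, and only two of the four controls are included to keep it short.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic files: (size in hundreds of LOC, number of commits, co-located? 1/0).
files = [
    (2, 10, 1), (3, 5, 1), (4, 15, 1), (6, 25, 1),
    (5, 10, 0), (8, 20, 0), (9, 5, 0), (10, 30, 0),
]
# Bug fixes depend on size and commits only, so co-location has no true effect.
bug_fixes = [1 + 0.5 * size + 0.2 * n_commits for size, n_commits, _ in files]

colocated = mean(y for (_, _, c), y in zip(files, bug_fixes) if c == 1)
distributed = mean(y for (_, _, c), y in zip(files, bug_fixes) if c == 0)
print(f"naive means: co-located {colocated:.2f}, distributed {distributed:.2f}")

intercept, b_size, b_commits, b_colocated = ols(files, bug_fixes)
print(f"co-location coefficient with controls: {b_colocated:.3f}")
```

In this toy data set the distributed files are larger, so a naive comparison of averages makes them look buggier. Once size and commit count are included in the model, the co-location coefficient is zero up to floating-point rounding.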

For both projects, it’s very slightly better to be in the same building, while for the Web service project it’s also possibly very slightly better to be within the same city. In all other cases, the quality of files that have commits within the same geographic area is very slightly worse.

This suggests that the programmers of the operating system project have formed a belief that turns out to be false, at least for their own project.

Summary

Programmers often have beliefs (e.g. about best practices) that aren’t necessarily true
Programmers primarily base their opinions on personal experience and on people with whom they share strong social connections, rather than on empirical evidence
In the two projects studied, code developed by a geographically distributed team was about as good as code developed by a co-located team