The Toilet Paper

Belief & evidence in empirical software engineering

Do we actually follow best practices or just… practices?

A Catholic monk and a Renaissance scientist debate astronomical models

Programmers can hold very strong beliefs about certain topics. Devanbu, Zimmermann, and Bird conducted a case study on such beliefs at Microsoft, and found that they are mostly based on personal experience rather than empirical evidence, and don’t necessarily correspond with what happens in reality.

Why it matters

Programmers – like other humans – believe all sorts of things, e.g. that shorter functions are more maintainable than longer ones.

These beliefs can change over time, as we learn from new discoveries. However, we’re less likely to accept new discoveries if they strongly contradict what we already believe.

The consequences of ignoring actual empirical evidence can be severe: imagine how you’d feel if you were diagnosed with acute appendicitis, and your doctor used only homeopathic treatments simply because she “believes” they work best!

Fortunately this doesn’t really happen in medicine; organisations like Cochrane and the American College of Physicians have made it easy for practitioners to access and keep up with the latest empirical findings.

Software engineering isn’t quite there yet, as this study shows.

How the study was conducted

The study consists of two parts.

In the first part, a list of claims was compiled and presented in a survey to 564 programmers at Microsoft. Each respondent was asked to rate how much they agreed with each claim on a 5-point Likert scale.

In the second part, after analysing the survey results, the authors selected a single claim that turned out to be highly controversial and conducted a case study within Microsoft to determine whether the claim is true or false.

What discoveries were made

Results for the survey and the case study are presented separately.

The survey

The table below shows the programmers’ opinions about each claim; scores are on a scale of 1 (“strongly disagree”) to 5 (“strongly agree”), and variance is a measure of disagreement between respondents:

Claim | Score | Variance
Code quality (defect occurrence) depends on which programming language is used | 3.17 | 1.16
Fixing defects is riskier (more likely to cause future defects) than adding new features | 2.63 | 1.08
Geographically distributed teams produce code whose quality (defect occurrence) is just as good as teams that are not geographically distributed | 2.86 | 1.07
When it comes to producing code with fewer defects specific experience in the project matters more than overall general experience in programming | 3.5 | 1.06
Well commented code has fewer defects | 3.4 | 1.05
Code written in a language with static typing (e.g., C#) tends to have fewer bugs than code written in a language with dynamic typing (e.g. Python) | 3.75 | 1.02
Stronger code ownership (i.e, fewer people owning a module or file) leads to better software quality | 3.75 | 1.02
Merge commits are buggier than other commits. | 3.4 | 0.97
Components with more unit tests have fewer customer-found defects | 3.85 | 0.95
More experienced programmers produce code with fewer defects | 3.86 | 0.94
More defects are found in more complex code | 4.0 | 0.93
Factors affecting code quality (defect occurrence) vary from project to project | 3.8 | 0.92
Using asserts improves code quality (reduces defect occurrence) | 3.78 | 0.89
The use of static analysis tools improves end user quality (fewer defects are found by users) | 3.77 | 0.87
Coding standards help improve software quality | 4.18 | 0.79
Code reviews improve software quality (reduces defect occurrence) | 4.48 | 0.64
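
To make the two numbers per claim a bit more concrete, here is a minimal sketch of how a score and a variance could be derived from a set of Likert responses. The responses are invented for illustration, and the use of the population variance is an assumption; this summary doesn't say which variant the authors used.

```python
from statistics import mean, pvariance

# Hypothetical responses to one claim on a 5-point Likert scale
# (1 = strongly disagree, 5 = strongly agree). Invented for illustration.
responses = [5, 4, 4, 3, 5, 2, 4, 4, 3, 5]


def summarise(scores):
    """Return (mean score, variance) for a list of Likert responses."""
    # Assumption: population variance. The sample variance
    # (statistics.variance) differs only slightly for large surveys.
    return round(mean(scores), 2), round(pvariance(scores), 2)


score, variance = summarise(responses)
print(f"score={score}, variance={variance}")
```

A low variance means respondents mostly gave similar answers; a high variance means their answers were spread across the scale.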

Respondents who strongly agreed or disagreed with a claim were asked which sources they based their opinion on and which sources they valued most. The overall ranking is as follows:

  1. Personal experience
  2. Peer opinion
  3. Mentor/manager
  4. Trade journal
  5. Research paper
  6. Other

Strength of social connection appears to have a greater influence on programmers’ opinions than credibility of empirical evidence.

This is consistent with earlier findings within general populations, but it’s not exactly how a professional should form opinions.

The case study

The authors examined the highly controversial claim

Geographically distributed teams produce code whose quality, viz., defect occurrence, is just as good as teams that are not geographically distributed

in more detail, using data from two large projects within Microsoft: an operating system and a Web service.

Both projects have a similar number of programmers and are developed by teams that are distributed around the world.

Curiously enough, survey respondents who were part of the operating system project overwhelmingly disagreed with the claim, while respondents from the Web service project largely agreed with it.

Is there really a difference in quality between the two projects?


The authors counted the number of bug fixes that were applied to each of the projects’ files and then determined whether 75% of commits were made within the same building, city, region, or country.
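
The classification step can be sketched roughly as follows. This is only an illustration: the commit records, field names, and location labels are hypothetical, and reading the threshold as "at least 75% of commits from one place" is an assumption about how the authors applied it.

```python
from collections import Counter

LEVELS = ["building", "city", "region", "country"]  # narrowest first
THRESHOLD = 0.75  # assumed: share of commits that must come from one place


def colocation_level(commits, threshold=THRESHOLD):
    """Return the narrowest level at which a single location accounts for
    at least `threshold` of a file's commits, or None if no level does."""
    if not commits:
        return None
    for level in LEVELS:
        counts = Counter(commit[level] for commit in commits)
        _, top_count = counts.most_common(1)[0]
        if top_count / len(commits) >= threshold:
            return level
    return None


# Hypothetical commit history of one file (all labels are invented).
commits = [
    {"building": "B1", "city": "city-A", "region": "region-X", "country": "country-1"},
    {"building": "B1", "city": "city-A", "region": "region-X", "country": "country-1"},
    {"building": "B2", "city": "city-A", "region": "region-X", "country": "country-1"},
    {"building": "B9", "city": "city-Z", "region": "region-Y", "country": "country-2"},
]

print(colocation_level(commits))
```

In this example only half of the commits come from the same building, but three out of four come from the same city, so the file counts as co-located at city level.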

Some control measures were introduced to make sure that they really were measuring the effect of geographic distribution:

  • Average size in LOC: the larger a file is, the more likely it is to contain defects;

  • Total number of commits: the more a file changes, the more likely it is that defects will be introduced;

  • Number of distinct contributors: the number of programmers involved in a file is known to influence its quality;

  • Percentage of commits made by most frequent committer: strong file ownership also tends to influence quality.


After running these variables through linear regression models, the authors found that geographic distribution has very little effect on the number of bug fixes.

For both projects, it’s very slightly better to be in the same building, while for the Web service project it’s also possibly very slightly better to be within the same city. In all other cases, the quality of files that have commits within the same geographic area is very slightly worse.

This shows that the programmers of the operating system project have formed beliefs that turn out to be false.
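
For readers who want to see what "controlling for" other variables looks like in a regression, here is a small sketch using ordinary least squares in plain Python. It is not the authors' model: the data is invented, only two of the four control measures are included (file size and number of commits), and the function names are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b using Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations
    (X^T X) beta = X^T y, with an intercept prepended."""
    x = [[1.0] + list(row) for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented per-file data: (same_building, size in KLOC, number of commits).
files = [
    (1, 0.5, 10), (0, 1.2, 4), (1, 2.0, 7),
    (0, 0.3, 12), (1, 1.0, 3), (0, 2.5, 9),
]
# Invented "true" effects: intercept, same_building, size, commits.
true_beta = [1.0, -0.1, 2.0, 0.5]
bug_fixes = [
    true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], f)) for f in files
]

beta = fit_ols(files, bug_fixes)
print([round(b, 3) for b in beta])
```

Because the made-up bug-fix counts are generated without noise, the fit recovers the chosen coefficients up to floating-point error. The point of the exercise is the second coefficient: it estimates the effect of being in the same building after file size and commit count have been accounted for.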


  1. Programmers often have beliefs (e.g. about best practices) that aren’t necessarily true

  2. Programmers primarily form their opinions based on sources with whom they share strong social connections

  3. Code developed by a geographically distributed team is just as good as code developed by a co-located team
