Using planning poker for combining expert estimates in software projects (2008)

[Image: two planning poker participants who have made the same estimate. Caption: Poke’s on you]

This week’s post looks at planning poker, a technique for estimating the effort needed to complete user stories rather than entire projects. Planning poker is typically used in conjunction with Scrum, so most of you will already know what planning poker is and how it works. But how well does it work?

Why it matters

I don’t think I’ve ever seen a Scrum team in the Netherlands that uses something other than planning poker for user story estimation: it’s almost like you’re not a real Scrum team if you don’t use planning poker.

Scrum predates planning poker by about 7 years, however, so there was a time when Scrum teams still had to use techniques like decision markets, the Delphi method (not to be confused with the Delphi programming language), or its cousin Wideband Delphi.

Most of these techniques are “dead”, but it’s still possible to make use of individual estimates in 2019. How do group consensus estimation techniques like planning poker differ from estimation by individuals?

How the study was conducted

The setup is somewhat reminiscent of the one we saw last week, but this time it’s a comparative case study between members of a Scrum team at a Norwegian software development company.

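The team is split into two groups: a planning poker group that estimates tasks together, and a control group that relies on individual estimates.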

The type, frequency, and duration of changes made by both groups were recorded throughout the sprint. Afterwards, the authors individually interviewed all participants about the project, their thoughts about planning poker, and how they felt about planning poker compared to using individual estimates.

What discoveries were made

Many studies conducted in the late 90s found that individuals are generally optimistic in their estimates. Talking about these estimates in a group makes that optimistic bias even stronger.

The results suggest that the opposite may be true for software effort estimates: planning poker participants were less optimistic after discussions. This decrease in optimism can likely be attributed to new information and, to a lesser degree, to pressure from seniors and a desire for consensus.

Does this also mean that group estimates are more accurate, though, or are they just pessimistic? It appears that the group estimates are slightly better than a statistical combination of individual estimates. The effect is small, however.
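To make "a statistical combination of individual estimates" a bit more concrete, here is a minimal sketch in Python. It assumes the combination is a plain mean or median; the post does not say which combination the authors used, and the numbers and the `combine_estimates` helper are made up for illustration.

```python
from statistics import mean, median

# Hypothetical individual estimates (in hours) for a single task.
# These are illustrative numbers, not data from the study.
individual_estimates = [4.0, 6.0, 5.0, 12.0]


def combine_estimates(estimates, method="mean"):
    """Combine independent estimates into one number, without any group discussion."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if method == "mean":
        return mean(estimates)
    if method == "median":
        return median(estimates)
    raise ValueError(f"unknown method: {method}")


print(combine_estimates(individual_estimates, "mean"))    # 6.75
print(combine_estimates(individual_estimates, "median"))  # 5.5
```

Note how the single high estimate of 12 hours pulls the mean up but barely affects the median. In planning poker, that outlier would instead trigger a discussion about what the estimator knows that the others don't.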

In fact, when the individual estimates are compared directly with those from the planning poker group as a whole, both methods appear to have similar estimation accuracy.
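One way to picture what "similar estimation accuracy" can mean is to look at the error relative to the actual effort. The sketch below assumes the magnitude of relative error (MRE) as the accuracy measure, which is a common choice in effort estimation research; the post does not state which measure the authors used, and all task data and function names here are hypothetical.

```python
from statistics import median


def relative_error(estimate, actual):
    """Signed error relative to the actual effort.

    A positive value means the task took longer than estimated,
    i.e. the estimate was optimistic.
    """
    return (actual - estimate) / actual


def magnitude_of_relative_error(estimate, actual):
    """Unsigned version of relative_error: how far off, regardless of direction."""
    return abs(relative_error(estimate, actual))


def median_mre(tasks):
    """Median MRE over a list of (estimate, actual) pairs."""
    return median(magnitude_of_relative_error(e, a) for e, a in tasks)


# Hypothetical (estimate, actual) pairs in hours, not data from the study.
poker_tasks = [(8.0, 10.0), (6.0, 5.0), (12.0, 16.0)]
individual_tasks = [(4.0, 5.0), (3.0, 2.5), (6.0, 8.0)]

print(median_mre(poker_tasks))
print(median_mre(individual_tasks))
```

In this made-up data the planning poker estimates are twice as high as the individual ones, yet both groups end up with the same median MRE, because the actual effort is twice as high as well. Accuracy is about the ratio between estimate and outcome, not about the size of the estimate.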

This doesn’t mean that the estimates are similar to each other: estimates made by the planning poker group were higher, and the median amount of time spent on tasks was a full 100% higher. It’s not entirely clear why. The authors speculate that there might be several interacting causes.

There are clearly visible differences in the code changes as well: changes made by the planning poker group were larger overall, although the size of each task was smaller than in the control group. This supports the participants’ perception that planning poker is a good way to identify sub-tasks and challenges.

The important bits

  1. Group consensus estimates are likely to be less optimistic than estimates that were produced individually
  2. Group consensus estimates are probably only slightly more accurate than individual estimates
  3. Planning poker discussions make it less likely that sub-tasks or challenges are overlooked