The Toilet Paper

Using planning poker for combining expert estimates in software projects

You probably already know or even use planning poker. This paper tells you how well it works.

This week’s post looks at planning poker, a technique for estimating the effort needed to complete user stories rather than entire projects. Planning poker is typically used in conjunction with Scrum, so most of you will already know what it is and how it works. But how well does it work?

Why it matters


I don’t think I’ve ever seen a Scrum team in the Netherlands that uses something other than planning poker for user story estimation: it’s almost like you’re not a real Scrum team if you don’t use planning poker.

Scrum predates planning poker by about 7 years, however, so there was a time when Scrum teams still had to use techniques like decision markets, the Delphi method, or its cousin Wideband Delphi.

Most of these techniques are “dead”, but it’s still possible to make use of individual estimates in 2019. How do group consensus estimation techniques like planning poker differ from estimation by individuals?

How the study was conducted


The setup is somewhat reminiscent of the one we saw last week, but this time it’s a comparative case study between members of a Scrum team at a Norwegian software development company.

The team is split into two groups:

  • The first acts as a control group and uses the team’s normal way of working. Estimates for tasks are made individually, by the developer who will be assigned to the task. These estimates are not shared with other team members.

  • The second group is introduced to planning poker by the company’s chief scientist, which allows them to estimate tasks for their next sprint using planning poker.

The type, frequency, and duration of changes made by both groups were recorded throughout the sprint. Afterwards, the authors individually interviewed all participants about the project, their thoughts about planning poker, and how they felt about planning poker compared to using individual estimates.

What discoveries were made


Many studies conducted in the late 90s found that individuals are generally optimistic in their estimates. Talking about these estimates in a group makes that optimistic bias even stronger.

The results suggest that the opposite may be true for software effort estimates: planning poker participants were less optimistic after discussions. This decrease in optimism can likely be attributed to new information and, to a lesser degree, pressure from seniors and the desire for consensus.

Does this also mean that group estimates are more accurate, though, or are they just pessimistic? It appears that the group estimates are slightly better than a statistical combination of individual estimates. The effect is small, however.

In fact, when the individual estimates are compared directly with those from the planning poker group as a whole, both methods appear to have similar estimation accuracy.
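To make that comparison a bit more concrete, here is a minimal Python sketch of how one could combine individual estimates statistically and compare the result with a consensus estimate. The numbers and function names are made up for illustration and do not come from the study. I'm also assuming the mean as the combination method and relative error as the accuracy measure, which may differ from what the authors used.

```python
from statistics import mean


def signed_relative_error(estimate, actual):
    """Negative values mean the estimate was too low, i.e. optimistic."""
    return (estimate - actual) / actual


def magnitude_of_relative_error(estimate, actual):
    """Accuracy regardless of direction: 0.0 is a perfect estimate."""
    return abs(signed_relative_error(estimate, actual))


# Hypothetical numbers for a single task, in hours (NOT data from the study).
individual_estimates = [4, 6, 8]
consensus_estimate = 8
actual_effort = 10

# Assumption: individual estimates are combined by taking their mean.
combined_estimate = mean(individual_estimates)

for label, estimate in [
    ("mean of individual estimates", combined_estimate),
    ("planning poker consensus", consensus_estimate),
]:
    bias = signed_relative_error(estimate, actual_effort)
    error = magnitude_of_relative_error(estimate, actual_effort)
    print(f"{label}: {estimate}h, bias {bias:+.0%}, error {error:.0%}")
```

The signed error tells you whether an estimate was optimistic or pessimistic, while its magnitude tells you how accurate it was. A real comparison would of course need many tasks and a summary statistic such as the median error per group.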

This doesn’t mean that the estimates are similar to each other – estimates made by the planning poker group were higher, and the median amount of time spent on tasks was even 100% higher. It’s not entirely clear why. The authors speculate that there might be several interacting causes:

  • Tasks were assigned randomly to the two groups. It’s possible that the planning poker group simply got the harder tasks.

  • Groups are able to identify more sub-tasks than individuals, especially if their members have diverse backgrounds. This may have resulted in larger tasks overall.

  • Initial individual estimates may have acted as anchors: participants were often willing to increase their estimates a little bit (one hour, or the “next card”) in subsequent rounds, but rarely did the final consensus estimates deviate significantly from the average of individual estimates.

  • Participants believe that scope (functionality and customer satisfaction) is more important than cost or time, and may thus feel more inclined to spend extra effort in order to implement all identified tasks.

There are clearly visible differences in the code changes as well: changes made by the planning poker group were larger overall, although the size of each task was smaller than in the control group. This supports the participants’ perception that planning poker is a good way to identify sub-tasks and challenges.


  1. Group consensus estimates are likely to be less optimistic than estimates that were produced individually

  2. Group consensus estimates are probably only slightly more accurate than individual estimates

  3. Planning poker discussions make it less likely that sub-tasks or challenges are overlooked