
Note: this article contains some math that may look intimidating if you are not used to it. Fear not: feel free to ask for further explanations in a comment, and I will be glad to help you understand everything explained here. Likewise, if I was not precise enough or made a mistake, please feel equally free to add to it or correct it.

A few days ago I wrote about estimating a project with Story Points; now it is time to consider effort-based estimation. We already agreed that assessing projects with effort is difficult because of the high level of uncertainty and multiple biases that are not easy to factor in.

The demonstration is as simple as asking each member of your team how long it would take them to run around the perimeter of a popular park. The answers are likely to spread over a pretty wide range, depending on how fit each person is. If you instead asked them how long they think the perimeter of the same park is, the replies would very likely be quite close to each other. That is what story points are about: removing personal biases from the estimation process.

That does not mean that we cannot or should not estimate using effort and time. We just need to factor in the intrinsic uncertainty generated by the same personal biases we just saw in action, and to do so we will use range-estimating techniques such as three-point estimation. The ultimate purpose of this method is to establish the range of the total project estimate and to help allocate contingency using three values: a best case, a worst case, and a most likely case.

We are entering the realm of probabilities, so let's prepare adequately for it.

## Let’s roll the dice

Imagine that every task in your project had to be carried out by two people in sequence and that both of them could perform any task within 1 to 6 hours. Therefore, the best case scenario for a single task would be 2 hours, the worst 12. However, how probable are those outcomes? Moreover, how likely is any other outcome between those two values?

In the given example it is like rolling two 6-sided dice: there are 36 different possible combinations, and only 1 of them returns numbers whose sum is 2 (a probability of less than 3%). At the same time, there are six different combinations whose sum is 7, a probability of almost 17%. If we calculated the probability of every value between 2 and 12 and represented it on a chart, we would get a symmetric, triangle-shaped distribution that peaks at 7. It only loosely resembles the bell curve of a normal distribution, which the sum would approach if many more dice were rolled.
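The dice figures can be checked by brute force. The following is a minimal sketch that enumerates every combination of two fair dice; the function names are illustrative and not part of any library.

```python
from collections import Counter
from itertools import product


def two_dice_counts(sides: int = 6) -> Counter:
    """Count how many of the sides**2 combinations produce each total."""
    faces = range(1, sides + 1)
    return Counter(first + second for first, second in product(faces, repeat=2))


def two_dice_probabilities(sides: int = 6) -> dict:
    """Turn the counts into probabilities (every combination is equally likely)."""
    counts = two_dice_counts(sides)
    combinations = sides ** 2
    return {total: count / combinations for total, count in sorted(counts.items())}


if __name__ == "__main__":
    counts = two_dice_counts()
    for total, probability in two_dice_probabilities().items():
        # One '#' per combination gives a rough text chart of the distribution.
        print(f"{total:2d}  {probability:6.1%}  {'#' * counts[total]}")
```

Running it prints one row per total from 2 to 12, with the longest bar at 7.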

The problem with estimating tasks and projects is that the values in the range are not equally likely to occur, and their actual probabilities are unknown, which is why our estimates will hardly conform to a particular probability density function.

However, a reasonable approximation is to use one of two distributions:

• the triangular distribution
• the double-triangular distribution

In most cases, the double-triangular distribution is considered a better approximation, since it can be made to conform to the implicit skew of the team’s assessment and, unlike the triangular distribution, it does not dictate the probability of finishing under the most likely value.

## Triangular distribution

In a triangular distribution the three vertices of the triangle are given by three values:

• the best case scenario (a)
• the most likely scenario (m)
• the worst case scenario (b)

Each of these values is given the same weight.

Our estimate is their arithmetic mean:

$E = \frac{a+m+b}{3}$

For example, given the following values:

• best case: 3 hours
• most likely case: 8 hours
• worst case: 10 hours

Our estimate would be:

$E = \frac{3+8+10}{3} = 7$

7 hours. The probability of finishing within the most likely case is $\frac{m-a}{b-a}$, which here gives:

$P = \frac{8-3}{10-3} \approx 71.4\%$

If the project team believes that this implied probability is unrealistic, you should consider switching to the double-triangular distribution for your estimate.

It is also useful to calculate the standard deviation, i.e. a measure of how spread out the values are, to establish a level of confidence. The standard deviation of a triangular distribution is calculated with:

$\sigma = \sqrt{\frac{a^{2}+m^{2}+b^{2}-am-mb-ab}{18}}$

and in our example it is:

$\sigma = \sqrt{\frac{9+64+100-24-80-30}{18}} \approx 1.47$

1.47 hours.
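The triangular calculations are easy to script. Below is a minimal sketch, assuming the three values satisfy best ≤ most likely ≤ worst; the function names are hypothetical, chosen only for this illustration.

```python
import math


def check_three_points(a: float, m: float, b: float) -> None:
    """Reject inputs that do not satisfy best <= most likely <= worst."""
    if not (a <= m <= b) or a == b:
        raise ValueError("expected a <= m <= b with a < b")


def triangular_estimate(a: float, m: float, b: float) -> float:
    """Arithmetic mean of the best, most likely and worst case."""
    check_three_points(a, m, b)
    return (a + m + b) / 3


def triangular_std_dev(a: float, m: float, b: float) -> float:
    """Standard deviation of a triangular distribution with mode m."""
    check_three_points(a, m, b)
    return math.sqrt((a**2 + m**2 + b**2 - a * m - m * b - a * b) / 18)


def probability_up_to_most_likely(a: float, m: float, b: float) -> float:
    """Probability that the actual value does not exceed the most likely case."""
    check_three_points(a, m, b)
    return (m - a) / (b - a)


if __name__ == "__main__":
    best, likely, worst = 3, 8, 10  # hours, the values used in the example
    print(f"estimate:    {triangular_estimate(best, likely, worst):.2f} h")
    print(f"std dev:     {triangular_std_dev(best, likely, worst):.2f} h")
    print(f"P(<= likely): {probability_up_to_most_likely(best, likely, worst):.1%}")
```

Validating the inputs up front is a design choice of this sketch: a most likely value outside the best/worst range usually signals a data-entry mistake rather than a genuine estimate.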

## Double-triangular distribution

The double-triangular distribution is based on two triangular distributions, one representing only the values that underrun the estimate and the other only the values that overrun it.

Our estimate is still a mean, but a weighted one in which the most likely case weighs much more than the other values. It weighs exactly 4 times as much as each of the others, 4 being a fairly arbitrary number:

$E = \frac{a+4m+b}{6}$

Using the same values as the previous example:

• best case: 3 hours
• most likely case: 8 hours
• worst case: 10 hours

we can calculate our estimate:

$E = \frac{3+32+10}{6} = 7.5$

that is, 7.5 hours.

In this case, the standard deviation is simply:

$\sigma = \frac{b-a}{6}$

which in our example means:

$\sigma = \frac{10-3}{6} \approx 1.17$
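As a quick check of the numbers above, here is a minimal sketch of the weighted estimate and its standard deviation. The function names are illustrative; the formulas are the two given in this section.

```python
def weighted_estimate(a: float, m: float, b: float) -> float:
    """Weighted mean in which the most likely case counts four times."""
    return (a + 4 * m + b) / 6


def weighted_std_dev(a: float, b: float) -> float:
    """Standard deviation approximated as one sixth of the best-to-worst range."""
    return (b - a) / 6


if __name__ == "__main__":
    best, likely, worst = 3, 8, 10  # hours, the values used in the example
    print(f"estimate: {weighted_estimate(best, likely, worst):.2f} h")
    print(f"std dev:  {weighted_std_dev(best, worst):.2f} h")
```

Note that the most likely value does not enter the standard deviation at all: with this approximation only the width of the range matters.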

When it comes to estimating a project with n tasks, the same calculation should be repeated for every task; the total estimate is then the sum of the estimates of the individual tasks (i.e. E1 + E2 + E3 + … + En):

$E_{tot} = \sum_{k=1}^{n} E_{k}$

Likewise, the total standard deviation is the square root of the sum of the squares of the individual standard deviations, which holds as long as the tasks are independent of each other:

$\sigma_{tot} = \sqrt{\sum_{k=1}^{n} \sigma_{k}^{2}}$

When a project is made up of many independent tasks, the total tends towards a normal distribution, for which values are known to lie within fixed ranges around the mean. We can therefore typically rely on the following confidence levels:

• 68% confidence that the actual value will lie in the range E ± 1σ
• 95% confidence that the actual value will lie in the range E ± 2σ
• 99.7% confidence that the actual value will lie in the range E ± 3σ
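Putting the pieces together, a project-level estimate could be sketched as follows. This is only an illustration: the `Task` type, the function names and the three sample tasks are made up for the example, and the calculation assumes that tasks are independent and that the total is close enough to a normal distribution for the ranges above to apply.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Task(NamedTuple):
    name: str
    best: float    # a, in hours
    likely: float  # m, in hours
    worst: float   # b, in hours


def project_totals(tasks: Iterable[Task]) -> Tuple[float, float]:
    """Return (total estimate, total standard deviation) for all tasks."""
    total_estimate = 0.0
    total_variance = 0.0
    for task in tasks:
        total_estimate += (task.best + 4 * task.likely + task.worst) / 6
        # Variances add up, standard deviations do not.
        total_variance += ((task.worst - task.best) / 6) ** 2
    return total_estimate, math.sqrt(total_variance)


def confidence_range(estimate: float, sigma: float, k: int) -> Tuple[float, float]:
    """Range E ± k·σ (k = 1, 2, 3 for roughly 68%, 95%, 99.7%)."""
    return estimate - k * sigma, estimate + k * sigma


if __name__ == "__main__":
    # Hypothetical tasks, invented purely for this example.
    tasks = [
        Task("design", 3, 8, 10),
        Task("build", 5, 9, 19),
        Task("test", 2, 4, 6),
    ]
    estimate, sigma = project_totals(tasks)
    print(f"total estimate: {estimate:.1f} h, std dev: {sigma:.2f} h")
    for k, level in ((1, "68%"), (2, "95%"), (3, "99.7%")):
        low, high = confidence_range(estimate, sigma, k)
        print(f"{level:>5} confidence: {low:.1f} h to {high:.1f} h")
```

With only a handful of tasks the normal approximation is rough, so the resulting ranges are better read as a guide for contingency than as guarantees.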