A client I'm working with that the moment has been doing some work around calibrating an estimating model. This model is based on allocating deliverables passing through the project a number of 'points', based on their perceived complexity, and then creating a weighted estimate based on these points. We decided to calibrate the model using a more detailed estimate of a random sample of these deliverables.

A quick review of these figures yesterday revealed a strong correlation(around 0.85) between the number of points allocated to an item, and its resulting estimate. "Hurrah!", we said. Then we said: "This model is useful, and can give us a reasonable estimate of any given subset of the overall project and therefore in particular, of each of our planned iterations." Things were good. Then we twigged.

When we ran the calibration workshop, we asked people to estimate each deliverable. When we described these deliverables, we gave a brief description of the scope of the deliverable, and mentioned the number of points allocated to the deliverable. Nothing wrong with that, right? Well, we decided to do a little experiment, and re-ran the same test, with the same people, but a different set of deliverables. This time, we *didn't* tell them the number of points allocated to each deliverable.

The correlation was now 0.19. By most definitions, this means there is no correlation whatsoever. Our model is broken.

So, what's going on there? I *think* (and I'm no statistician) that we're seeing human nature at work. If you tell people something is twice as hard as something else, they're inclined to estimate it'll take roughly twice as long. If you estimate something is three times as hard, the estimate will be three times as long. When we estimate, we don't know we're doing this, as our gut (rather than our head) is doing the heavy lifting here - it's hard to apply a lot of intellectual muscle to something that's ill defined. Gut bases its decision on whatever information is easily available; in this case, someone just told us this thing is 'hard, twice as hard as the last thing', so the number we come up will start off roughly twice as high. If we get some information that makes us believe it's simpler than this, then we might try and adjust Gut's estimate downwards a little, but we'll likely never estimate it totally objectively after being told initially that it's 'hard'.

Thankfully, for us, this hasn't caused a problem. We're mainly interested in the overall averages rather than the specific estimates. We can still predict reasonably accurately the overall length of the project, even if we're a little out on the fine details. The lesson here is clear though: Be careful how much trust you put in these kinds of estimating excercise. They may not be as scientific or accurate a you first believe.