Friday, January 27, 2012

The news of Story Points' demise is greatly exaggerated

My friend Vasco Duarte just wrote an interesting piece arguing that story points should be considered harmful. We discussed this at length some years back (oh, the amounts of coffee consumed!) and I think his core concept is good. However, the article is fraught with errors and lacks a constructive conclusion.

By writing this blog post I intend to point out and correct some problems in the reasoning in Vasco's article. I don't have any special feelings for or against estimation. I don't think it's good if it becomes a cargo cult, something you just do because Scrum says so. That's true for most practices, by the way. But I believe estimation has a place as a "gateway drug" into collective software architecture and backlog maintenance. Once the team gets there, they can safely dump the story points.

Statistics

Let's start out by noting that basic statistics explains a lot here. Small batches improve flow. More flow means more items flowing through the pipe, and the more items there are, the smaller the relative variation of any per-item attribute becomes. In the context of work queues and backlogs, a rate of perhaps 20-30 work items per week is enough to provide meaningful data from simply counting work items (rather than summing up estimates).

The statistics presented by Vasco don't account for the fact that the sum of estimates depends on the number of items: more items means a larger sum. To illustrate, I wrote a small script to output random data for a fictional team completing 10 to 20 items every sprint, with estimates drawn randomly from the Fibonacci scale between 1 and 13. The correlation for this random data turns out to be around 0.73, and that is entirely the result of one variable depending on the other. As mentioned above, increasing the number of items per sprint makes the statistical variation go down, which increases the correlation between the two variables. At 70-130 items per sprint the correlation is around 0.92; at 100-200 items it's 0.95.
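
As a rough sanity check, roughly the same figures fall out of the textbook formulas for a random sum. This is only a sketch, and it assumes estimates are drawn independently and uniformly from the Fibonacci values and that the item count is uniform over the sprint's range (the same assumptions my script below makes):

# Theoretical correlation between item count N and estimate sum S for a
# random sum: cov(N, S) = mu * var(N) and
# var(S) = E[N] * var_est + mu^2 * var(N), where mu is the mean estimate.
FIB = [1, 2, 3, 5, 8, 13]
mu = sum(FIB) / len(FIB)                              # mean estimate, ~5.33
var_est = sum((x - mu) ** 2 for x in FIB) / len(FIB)  # estimate variance, ~16.9

for lo, hi in [(10, 19), (70, 129), (100, 199)]:
    count = hi - lo + 1
    mean_n = (lo + hi) / 2
    var_n = (count * count - 1) / 12    # variance of a discrete uniform
    corr = mu * var_n ** 0.5 / (mean_n * var_est + mu ** 2 * var_n) ** 0.5
    print(f"{lo}-{hi} items: correlation ~{corr:.2f}")  # ~0.70, 0.91, 0.95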

Interestingly, three of Vasco's teams actually show a lower correlation than my random data does. That is probably an argument against estimating. :-)

Complexity, predictability and chaos

Then let's move on to more human issues. Humans are unpredictable, but not as unpredictable as one might expect. The reason is that humans are social animals who can live and work together in groups for fun and profit. Indeed, we actually prefer to be part of a society or group. Hermits are rare and, bluntly put, a bit weird.

One of the mechanisms that allows us to live and work in groups is that we collectively construct social identities and institutions that constrain our actions. One such constraint could be: "In our team it's not OK to physically hurt people you disagree with". These identities and behaviors make us much more predictable. We can pretty much rely on Helen not to club down Tom in the meeting room with a chair during a heated technical debate, right?

If these identities don't exist or they are constructed wrong, there will be chaos. For example, the Vikings or Mongols of old would perhaps say that "in our society, physical power equals political power". In a conflict situation, Helen would club down Tom and everyone would laugh and nod and say what a nice chop that was, he hardly saw it coming. Next thing you know, Tom's brother Tim has set Helen's house on fire in the middle of the night and butchered everyone that tried to escape. But I digress.

Incomplete argumentation

Finally, I'd like to point out that some of the counter-arguments to Mike Cohn's six pro-estimation claims are weak. For example, regarding claim 3 the article states that "it can take days of work to estimate the initial backlog for a reasonable size project". Well, it can (and often does) take days of work to construct an initial architecture for a reasonably sized project, too! You can't just start coding at random.

The blog post also ignores the most important pro-estimation claim: the real benefit of estimating is that you get the whole team to participate in planning the software. This can be achieved in other ways, but most estimation techniques turn it into a game that is fun to play. I looked in the blog post for something that would replace this activity, and found nothing.

Some other nitpicks:
  • The butterfly effect is commonly used to describe chaos theory, not complexity theory.
  • A complex environment differs from a chaotic one in that it is causal but not predictable: causality is evident only when looking back, never when looking forward.

Conclusions

All in all, I think the key take-away of Vasco's post has a lot of merit, especially for seasoned Agile teams. However, the article contains some serious leaps of faith, and the "myth of Story Points" is certainly not busted.


EDIT: Here's the Python script I used to create random data.

import random

# Generate random data for 100 sprints: each sprint completes a random
# number of items, and each item gets a random estimate from the
# Fibonacci scale. Output one line per sprint: item count, estimate sum.
for i in range(100):
    num_items = 0
    sum_estimates = 0
    for j in range(10 + random.randrange(10)):
        estimate = random.choice([1, 2, 3, 5, 8, 13])
        num_items += 1
        sum_estimates += estimate
    print(f"{num_items}\t{sum_estimates}")


I then wimped out and did the correlation calculations in Excel.

Since random.randrange(10) returns a value between 0 and 9 inclusive, the script actually generates random data for sprints with 10 to 19 items, not 10 to 20 as specified. That doesn't invalidate the point, though.
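
If you'd rather skip the Excel step entirely, here's a minimal sketch that reruns the same simulation and computes the Pearson correlation in plain Python, no external libraries needed:

import random

def simulate(lo, spread, sprints=100):
    """Simulate sprints; return parallel lists of item counts and estimate sums."""
    counts, sums = [], []
    for _ in range(sprints):
        n = lo + random.randrange(spread)
        counts.append(n)
        sums.append(sum(random.choice([1, 2, 3, 5, 8, 13]) for _ in range(n)))
    return counts, sums

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for lo, spread in [(10, 10), (70, 60), (100, 100)]:
    counts, sums = simulate(lo, spread)
    print(f"{lo}-{lo + spread - 1} items per sprint: r = {pearson(counts, sums):.2f}")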

7 comments:

  1. When you say "It can (and often does) take days of work to construct an initial architecture for a reasonably sized project! You can't just start coding at random." you forget to add that you *CAN* work without estimating, but not without an (even implicit) architecture. So that argument doesn't hold up :)

    When you say "The blog post ignores the most important pro-estimation claim: the real benefit of estimating is that you get the whole team to participate in planning the software."
    You don't mention any data to prove your point. The fact is that you have no data, because what you need (people talking to each other) can be done in a *MILLION* ways, including some ways that are grounded in social sciences. Estimation is NOT grounded in social sciences, and there's no evidence that estimation gets people to talk to each other. Talk to @josephpelrine about that.
    My point: there are much better, more engaging, more structured, more supported-by-data ways of getting people to participate in the planning! :)

    "The butterfly effect is commonly used to describe chaos theory, not complexity theory."
    This is true, but I was making a point about *causality*, not just chaos/complexity. That was mentioned in the scope of Complexity Sciences.

    Please do continue your line of argument. Your last phrase (myth not busted) is just speculation! You offer no data that supports that conclusion... Please do write about this, I'd be pleased and honored to have my hypothesis refuted!

    Remember, Story Points (and the poker, etc.) were presented as "scientific" (just read Mike Cohn's books). But the data tells another story! Let's get the argument back to data-backed statements and hypotheses. No more speculation. Please... :)

    Replies
    1. Good comments! However, don't forget that understanding what you're creating (having an architecture) is a prerequisite for estimating. Estimation methods like planning poker drive collaborative architecture and software design. The vast majority of the effort spent on "estimating" is in fact spent on architecting and designing. The estimates are almost a by-product, and the effort spent on recording and tracking them is small compared to the architecture work. Saying "it can take days to estimate" is the same as saying "it can take days to design".

      The real benefit is indeed that the whole team participates in designing the software. No recipe is perfect and it's entirely possible to play planning poker wrong. And there are people - autistics and introverts - who really do not enjoy collaborating with others.

      About your data, the fact that seven teams did better than random is a good point FOR story points and estimating. :-)

    2. First: estimation does *not* lead people to collaborate. That's pure speculation. In fact, architecture does not get developed when estimating stories *individually*. Architecture gets developed and discussed when we are talking about *architecture*.
      My experience is that trying to use estimation as a proxy for a design/architecture discussion discourages people from focusing on architecture. So my experience tells me that estimation is BAD for architecture and common agreement.

      Second: the fact that some teams had a higher correlation between SPs and # of stories is an argument to NOT USE story points. The higher the correlation, the less information SPs *could* (speculation) add. In fact, it may be that SPs are adding information that detracts from the accuracy of estimates -- that hypothesis is not proven yet, but it is not yet disproved either ;)

  2. > You don't mention any data to prove your point.
    > The fact is that you have no data, because what
    > you need (people talking to each other) can be
    > done in a *MILLION* ways, including some ways
    > that are grounded in social sciences. [...]
    > My point: there are much better, more engaging,
    > more structured, more supported-by-data ways
    > of getting people to participate in the planning! :)

    Care to name some of those million ways – the ones grounded in social sciences, for example – and some of those more-supported-by-data ways, please?


    > Remember, Story Points (and the poker, etc.)
    > were presented as "scientific" (just read Mike
    > Cohn's books). But the data tells another story!

    Planning Poker(TM) is based on Wideband Delphi, which dates back to Barry Boehm and the 70's. Simula Research Labs has done fairly recent research on Planning Poker specifically. Alas, I couldn't get at that research easily enough (damn you academia – what's the point of doing research and hiding the results?) to check what exactly Jorgensen concluded in his studies. The paraphrasing seems to suggest that group estimation improves estimation accuracy.

    With that in mind, I'd like to clarify: what is it about story points or planning poker, as Mike Cohn presents them, that you find misrepresented as scientifically proven when it isn't? In "Agile Estimating and Planning" Cohn references a total of 5 sources for his claims regarding "Why Planning Poker Works". Related to story points (relative estimation), he references 2 sources.

    Now, I admit that I didn't even try to find those 7 research papers, but unless Cohn clearly misrepresents what those studies concluded, I don't think it counts as a foul – he explains the rationale and provides references to back it. I'm not sure what else he should've done to avoid misrepresenting the degree of scientificness of the techniques he describes in his book?

    Replies
    1. On your first point: Any book on facilitation or about "effective meetings" has a bunch of techniques that are based on psychology to get people to either "agree" or "brainstorm" or "negotiate" an architecture. Estimation is never *ever* in those books ;)

      On your second point: There's no evidence that "more accuracy" with Poker is any better than just counting the number of stories. That's an area where we need more data.

      Regarding Cohn's "references": in his book User Stories Applied he has exactly 1 reference listed, and that is to an email by Joshua Kerievsky. In the text he mentions "wideband delphi" in passing, but without any explanation whatsoever as to *how* his approach is based on it. (User Stories Applied, chapter 8)

      Further, Mike himself confuses story points with "days" (page 90, first edition). This leads me to believe that story points and the whole "group estimation through poker" approach were introduced as a mere marketing technique, with *no* basis in any data comparing SPs with any other estimation practice.

      Also, let's try to make the problem clear. I'm not saying that SPs are worse than *any* other estimation practice. I'm saying that you don't *need* estimation. What you need is some way to collect data and use that data to project future performance -- and even that is not perfect ;) (although it is at least as good as SPs, as my data suggests)

  3. There are various reasons for estimating a backlog. One good one is getting the whole team to participate in planning the software. But if I take a few steps back: isn't predictability one of the biggest selling points of Agile? Instead of pulling completion dates from a team's collective hat or - even worse - from a project manager's head, we measure empirically how a project is progressing.

    In a basic Scrum scenario, PBIs are estimated using complexity points or story points or whatnot. Then you split PBIs on demand until they are small enough to fit into iterations. Then you measure the number of points completed per iteration and voila! We have a velocity that gives us predictability.

    In a Kanban case, or elsewhere where iterations don't exist, you can just measure the cycle times of individual items and then calculate velocity from that. If PBIs are sized very differently, you want to estimate their relative sizes and factor that into the calculation. Alternatively, your PBIs must be close enough in size - but how do you get those without anyone ever estimating the relative sizes of your initial, non-normalized PBL?
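
    To make those mechanics concrete, here is a minimal sketch with made-up numbers, projecting the remaining sprints both from summed points and from plain item counts:

    # Hypothetical history for five sprints: items completed and summed points.
    items_done = [12, 15, 11, 14, 13]
    points_done = [55, 70, 50, 65, 60]

    backlog_items = 120    # remaining item count (made up)
    backlog_points = 560   # remaining summed estimates (made up)

    item_velocity = sum(items_done) / len(items_done)      # 13 items per sprint
    point_velocity = sum(points_done) / len(points_done)   # 60 points per sprint

    # The projection mechanics are identical; only the unit differs.
    print("sprints left, counting items:", backlog_items / item_velocity)
    print("sprints left, summing points:", backlog_points / point_velocity)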

    Does a method exist for normalizing PBIs or measuring velocity that does not require any kind of guesstimates from the team?

    If you don't do any kind of estimates, how do you achieve predictability?

    I would dare guess that predictability is more often the valuable part - for contracts, commitments, timing marketing campaigns and other things outside a development team - than building a team.

    Replies
    1. The point of using the # of stories done as a metric for future projection is that we don't *need* to take any schedule from anybody's hat ;) Story Points are the closest you get to "hat estimation" in the agile literature ;)

      Story points do not give any *better* prediction than counting the # of stories. At least, that's what the data suggests. The fact that you *believe* SPs give you better predictability does not make it so. Look at the data you already have and ask that question. Data is your friend.

      Regarding your comment on Kanban: Kanban teams measure cycle time, an average value that is the closest you can get to the # of items. In fact, if you know the average number of stories completed, you can *infer* your cycle time. So don't use that as an argument for SPs.
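      (That inference is just Little's Law: average cycle time = average work in progress / average throughput. With, say, 8 items in progress on average and 4 items finished per week - hypothetical numbers - the average cycle time is 8 / 4 = 2 weeks.)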
      Also, kanban teams only argue about the Minimum Viable Feature/Product, nothing else. They don't estimate the size of items that are way into the future. In fact, the backlash *against* estimation comes from the Kanban community ;)

      Estimates do *not* give you predictability at all. We have the last 30 years of software industry to prove that :)

      However, when you are planning your next sprint or your next MVF, you *do* need some form of estimation (I don't call it that, because it is about *now*, not the future). That form of estimation can be as simple as asking the team "can we do this feature/story in one sprint?" -- this is a much more valuable question than asking for the number of story points, for many reasons - check the blog post: http://bit.ly/AmVhS6
