Friday, January 27, 2012

The news of Story Points' demise is greatly exaggerated

My friend Vasco Duarte just wrote an interesting piece arguing that story points should be considered harmful. We discussed this at length some years back (oh, the amounts of coffee consumed!) and I think his core concept is good. However, the article is fraught with errors and lacks a constructive conclusion.

By writing this blog post I intend to point out and correct some problems in the reasoning in Vasco's article. I don't have any special feelings for or against estimation. I don't think it's good if it becomes a cargo cult, something you just do because Scrum says so. That's true for most practices, by the way. But I believe estimation has a place as a "gateway drug" into collective software architecture and backlog maintenance. Once the team gets there, they can safely dump the story points.

Statistics

Let's start out by noting that basic statistics explains a lot here. Small batches improve flow. More flow means more items flowing through the pipe, and the more items there are, the smaller the relative variation of any per-item attribute becomes. In the context of work queues and backlogs, a rate of perhaps 20-30 work items per week is enough to provide meaningful data from simply counting work items (rather than summing up estimates).

The statistics presented by Vasco don't account for the fact that the sum of estimates depends on the number of items: more items means a larger sum. To illustrate, I wrote a small script to output random data for a fictional team completing 10 to 20 items every sprint, with estimates drawn randomly from the Fibonacci scale between 1 and 13. The correlation for this random data turns out to be around 0.73, and that is entirely the result of one variable depending on the other. As mentioned above, increasing the number of items per sprint makes the statistical variation go down, which increases the correlation between the two variables. At 70-130 items per sprint the correlation is around 0.92; at 100-200 items it's 0.95.
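
As a rough sanity check, roughly the same figures fall out of the textbook formulas for a random sum. This is only a sketch, and it assumes estimates are drawn independently and uniformly from the Fibonacci values and that the item count is uniform over the sprint's range (the same assumptions my script below makes):

# Theoretical correlation between item count N and estimate sum S for a
# random sum: cov(N, S) = mu * var(N) and
# var(S) = E[N] * var_est + mu^2 * var(N), where mu is the mean estimate.
FIB = [1, 2, 3, 5, 8, 13]
mu = sum(FIB) / len(FIB)                              # mean estimate, ~5.33
var_est = sum((x - mu) ** 2 for x in FIB) / len(FIB)  # estimate variance, ~16.9

for lo, hi in [(10, 19), (70, 129), (100, 199)]:
    count = hi - lo + 1
    mean_n = (lo + hi) / 2
    var_n = (count * count - 1) / 12    # variance of a discrete uniform
    corr = mu * var_n ** 0.5 / (mean_n * var_est + mu ** 2 * var_n) ** 0.5
    print(f"{lo}-{hi} items: correlation ~{corr:.2f}")  # ~0.70, 0.91, 0.95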

Interestingly, three of Vasco's teams actually show a lower correlation than my random data does. That is probably an argument against estimating. :-)

Complexity, predictability and chaos

Then let's move on to more human issues. Humans are unpredictable, but not as unpredictable as one might expect. The reason is that humans are social animals who can live and work together in groups for fun and profit. Indeed, we actually prefer to be part of a society or group. Hermits are rare and, bluntly put, a bit weird.

One of the mechanisms that allows us to live and work in groups is that we collectively construct social identities and institutions that constrain our actions. One such constraint could be: "In our team it's not OK to physically hurt people you disagree with". These identities and behaviors make us much more predictable. We can pretty much rely on Helen not to club down Tom in the meeting room with a chair during a heated technical debate, right?

If these identities don't exist or they are constructed wrong, there will be chaos. For example, the Vikings or Mongols of old would perhaps say that "in our society, physical power equals political power". In a conflict situation, Helen would club down Tom and everyone would laugh and nod and say what a nice chop that was, he hardly saw it coming. Next thing you know, Tom's brother Tim has set Helen's house on fire in the middle of the night and butchered everyone that tried to escape. But I digress.

Incomplete argumentation

Finally, I'd like to point out that some of the counter-arguments to Mike Cohn's six pro-estimation claims are weak. For example, regarding claim 3 the article states that "it can take days of work to estimate the initial backlog for a reasonable size project". Well, it can (and often does) take days of work to construct an initial architecture for a reasonably sized project, too! You can't just start coding at random.

The blog post also ignores the most important pro-estimation claim: the real benefit of estimating is that you get the whole team to participate in planning the software. This can be achieved in other ways, but most estimation techniques turn it into a game that is fun to play. I looked in the blog post for something that would replace this activity, and found nothing.

Some other nitpicks:
  • The butterfly effect is commonly used to describe chaos theory, not complexity theory.
  • A complex environment differs from a chaotic one in that it is causal but not predictable: causality is evident only when looking back, never when looking forward.

Conclusions

All in all, I think the key take-away of Vasco's post has a lot of merit, especially for seasoned Agile teams. However, the article contains some serious leaps of faith, and the "myth of Story Points" is certainly not busted.


EDIT: Here's the Python script I used to create random data.

import random

# Generate random data for 100 sprints: each sprint completes a random
# number of items, and each item gets a random estimate from the
# Fibonacci scale. Output one line per sprint: item count, estimate sum.
for i in range(100):
    num_items = 0
    sum_estimates = 0
    for j in range(10 + random.randrange(10)):
        estimate = random.choice([1, 2, 3, 5, 8, 13])
        num_items += 1
        sum_estimates += estimate
    print(f"{num_items}\t{sum_estimates}")


I then wimped out and did the correlation calculations in Excel.

Since random.randrange(10) returns a value between 0 and 9 inclusive, the script actually generates random data for sprints with 10 to 19 items, not 10 to 20 as specified. That doesn't invalidate the point, though.
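
If you'd rather skip the Excel step entirely, here's a minimal sketch that reruns the same simulation and computes the Pearson correlation in plain Python, no external libraries needed:

import random

def simulate(lo, spread, sprints=100):
    """Simulate sprints; return parallel lists of item counts and estimate sums."""
    counts, sums = [], []
    for _ in range(sprints):
        n = lo + random.randrange(spread)
        counts.append(n)
        sums.append(sum(random.choice([1, 2, 3, 5, 8, 13]) for _ in range(n)))
    return counts, sums

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for lo, spread in [(10, 10), (70, 60), (100, 100)]:
    counts, sums = simulate(lo, spread)
    print(f"{lo}-{lo + spread - 1} items per sprint: r = {pearson(counts, sums):.2f}")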

7 comments:

  1. When you say "It can (and often does) take days of work to construct an initial architecture for a reasonably sized project! You can't just start coding at random." you forget to add that you *CAN* work without estimating, but not without an (even implicit) architecture. So that argument doesn't hold up :)

    When you say "The blog post ignores the most important pro-estimation claim: the real benefit of estimating is that you get the whole team to participate in planning the software."
    You don't mention any data to prove your point. The fact is that you have no data, because what you need (people talking to each other) can be done in a *MILLION* ways, including some ways that are grounded in social sciences. Estimation is NOT grounded in social sciences, and there's no evidence that estimation gets people to talk to each other. Talk to @josephpelrine about that.
    My point: there are much better, more engaging, more structured, more supported-by-data ways of getting people to participate in the planning! :)

    "The butterfly effect is commonly used to describe chaos theory, not complexity theory."
    This is true, but I was making a point about *causality*, not just chaos/complexity. That was mentioned in the scope of Complexity Sciences.

    Please do continue your line of argument. Your last phrase (myth not busted) is just speculation! You offer no data that supports that conclusion... Please do write about this, I'd be pleased and honored to have my hypothesis refuted!

    Remember, Story Points (and the poker, etc.) were presented as "scientific" (just read Mike Cohn's books). But the data tells another story! Let's get the argument back to data-backed statements and hypotheses. No more speculation. Please... :)

    Replies
    1. Good comments! However, don't forget that understanding what you're creating (having an architecture) is a prerequisite for estimating. Estimation methods like planning poker drive collaborative architecture and software design. The vast majority of the effort spent on "estimating" is in fact spent on architecting and designing. The estimates are almost a by-product, and the effort spent on recording and tracking them is small compared to the architecture work. Saying "it can take days to estimate" is the same as saying "it can take days to design".

      The real benefit is indeed that the whole team participates in designing the software. No recipe is perfect and it's entirely possible to play planning poker wrong. And there are people - autistics and introverts - who really do not enjoy collaborating with others.

      About your data, the fact that seven teams did better than random is a good point FOR story points and estimating. :-)

    2. First: estimation does *not* lead people to collaborate. That's pure speculation. In fact, architecture does not get developed when estimating stories *individually*. Architecture gets developed and discussed when we are talking about *architecture*.
      My experience is that trying to use estimation as a proxy for a design/architecture discussion discourages people from focusing on architecture. So my experience tells me that estimation is BAD for architecture and common agreement.

      Second: the fact that some teams had a higher correlation between SPs and # of stories is an argument to NOT USE story points. The higher the correlation, the less information SPs *could* (speculation) add. In fact, it may be that SPs are adding information that detracts from the accuracy of estimates -- that hypothesis is not proven yet, but it is not yet disproved either ;)

  2. > You don't mention any data to prove your point.
    > The fact is that you have no data, because what
    > you need (people talking to each other) can be
    > done in a *MILLION* ways, including some ways
    > that are grounded in social sciences. [...]
    > My point: there are much better, more engaging,
    > more structured, more supported-by-data ways
    > of getting people to participate in the planning! :)

    Care to name some of those million ways – the ones grounded in social sciences, for example – and some of those more-supported-by-data ways, please?


    > Remember, Story Points (and the poker, etc.)
    > were presented as "scientific" (just read Mike
    > Cohn's books). But the data tells another story!

    Planning Poker(TM) is based on Wideband Delphi, which dates back to Barry Boehm and the 70's. Simula Research Labs has done fairly recent research on Planning Poker specifically. Alas, I couldn't get at that research easily enough (damn you academia – what's the point of doing research and hiding the results?) to check what exactly Jorgensen concluded in his studies. The paraphrasing seems to suggest that group estimation improves estimation accuracy.

    With that in mind, I'd like to clarify: what is it about story points or planning poker, as Mike Cohn presents them, that you find misrepresented as scientifically proven when it isn't? In "Agile Estimating and Planning" Cohn references a total of 5 sources for his claims regarding "Why Planning Poker Works". Related to story points (relative estimation), he references 2 sources.

    Now, I admit that I didn't even try to find those 7 research papers, but unless Cohn clearly misrepresents what those studies concluded, I don't think it counts as a foul – he explains the rationale and provides references to back it. I'm not sure what else he should've done to avoid misrepresenting the degree of scientificness of the techniques he describes in his book?

    Replies
    1. On your first point: Any book on facilitation or about "effective meetings" has a bunch of techniques that are based on psychology to get people to either "agree" or "brainstorm" or "negotiate" an architecture. Estimation is never *ever* in those books ;)

      On your second point: There's no evidence that "more accuracy" with Poker is any better than just counting the number of stories. That's an area where we need more data.

      Regarding Cohn's "references": in his book User Stories Applied he has exactly 1 reference listed, and that is to an email by Joshua Kerievsky. In the text he mentions "wideband delphi" in passing, but without any explanation whatsoever as to *how* his approach is based on it. (User Stories Applied, chapter 8)

      Further, Mike himself confuses story points with "days" (page 90, first edition). This leads me to believe that story points and the whole "group estimation through poker" approach were introduced as a mere marketing technique, with *no* basis in any data comparing SPs with any other estimation practice.

      Also, let's try to make the problem clear. I'm not saying that SPs are worse than *any* other estimation practice. I'm saying that you don't *need* estimation. What you need is some way to collect data and use that data to project future performance -- and even that is not perfect ;) (although it is at least as good as SPs, as my data suggests)

  3. There are various reasons for estimating a backlog. One good one is getting the whole team to participate in planning the software. But if I take a few steps back: isn't predictability one of the biggest selling points of Agile? Instead of pulling completion dates from a team's collective hat or - even worse - from a project manager's head, we measure empirically how a project is progressing.

    In a basic Scrum scenario, PBIs are estimated using complexity points or story points or whatnot. Then you split PBIs on demand until they are small enough to fit into iterations. Then you measure the number of points completed per iteration and voila! We have a velocity that gives us predictability.

    In a Kanban case, or elsewhere where iterations don't exist, you can just measure the cycle times of individual items and then calculate velocity from that. If PBIs are sized very differently, you want to estimate their relative sizes and factor that into the calculation. Alternatively, your PBIs must be close enough in size - but how do you get those without anyone ever estimating the relative sizes of your initial, non-normalized PBL?
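
    To make those mechanics concrete, here is a minimal sketch with made-up numbers, projecting the remaining sprints both from summed points and from plain item counts:

    # Hypothetical history for five sprints: items completed and summed points.
    items_done = [12, 15, 11, 14, 13]
    points_done = [55, 70, 50, 65, 60]

    backlog_items = 120    # remaining item count (made up)
    backlog_points = 560   # remaining summed estimates (made up)

    item_velocity = sum(items_done) / len(items_done)      # 13 items per sprint
    point_velocity = sum(points_done) / len(points_done)   # 60 points per sprint

    # The projection mechanics are identical; only the unit differs.
    print("sprints left, counting items:", backlog_items / item_velocity)
    print("sprints left, summing points:", backlog_points / point_velocity)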

    Does a method exist for normalizing PBIs or measuring velocity that does not require any kind of guesstimates from the team?

    If you don't do any kind of estimates, how do you achieve predictability?

    I would dare guess that predictability is more often the valuable part - for contracts, commitments, timing marketing campaigns and other things outside a development team - than building a team.

    Replies
    1. The point of using the # of stories done as a metric for future projection is that we don't *need* to take any schedule from anybody's hat ;) Story Points are the closest you get to "hat estimation" in the agile literature ;)

      Story points do not give any *better* prediction than counting the # of stories. At least, that's what the data suggests. The fact that you *believe* SPs give you better predictability does not make it so. Look at the data you already have and ask that question. Data is your friend.

      Regarding your comment on Kanban: Kanban teams measure cycle time, an average value that is the closest you can get to the # of items. In fact, if you know the average number of stories completed, you can *infer* your cycle time. So don't use that as an argument for SPs.
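      (That inference is just Little's Law: average cycle time = average work in progress / average throughput. With, say, 8 items in progress on average and 4 items finished per week - hypothetical numbers - the average cycle time is 8 / 4 = 2 weeks.)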
      Also, kanban teams only argue about the Minimum Viable Feature/Product, nothing else. They don't estimate the size of items that are way into the future. In fact, the backlash *against* estimation comes from the Kanban community ;)

      Estimates do *not* give you predictability at all. We have the last 30 years of software industry to prove that :)

      However, when you are planning your next sprint or your next MVF, you *do* need some form of estimation (I don't call it that, because it is about *now*, not the future). That form of estimation can be as simple as asking the team "can we do this feature/story in one sprint?" -- this is a much more valuable question than asking for the number of story points, for many reasons - check the blog post: http://bit.ly/AmVhS6
