Friday, August 17, 2012

My good friend and coffee-drinking companion Vasco Duarte posted another blog post on the virtues of not estimating in Agile projects some weeks ago, this time with data! I think that's both cool and necessary for this discussion. (Granted, it's only anecdotal evidence at this stage, but it's a very good start. I hope people will start contributing data to the GDocs spreadsheet; I certainly will if I can!)

In summary, Vasco says that counting stories always leads to better predictions than summing estimates. That claim doesn't hold without qualifications. Off the top of my head, here are some of the assumptions behind it:

  1. There is a significant overhead spent on estimating and maintaining estimates, and the overhead grows exponentially with the number of items (finding one specific item from a list of ten is MUCH faster than finding one from a list of 1000)
  2. The estimation activity does not include working on the acceptance criteria, APIs, architecture etc. 
  3. There are lots of stories (1000s per release)
  4. The stories are pretty small (on the order of hours)
  5. The team's estimates are worse than random — meaning that the team doesn't really know how to work with stories

Assumption #5 is by itself sufficient to dissolve the "story counts are better than story points" controversy: if a team estimates worse than random, counting trivially wins. Further, assumptions #1 and #2 may be mutually exclusive. And in his blog post Vasco uses data from a team where assumptions #3 and #4 hold, which suggests selection bias in the data.

Through Vasco there's some interesting data available now, and I'll try to make use of it and contribute to the information and knowledge we have. People seem to have so many opinions, but it's time to slam some data down on the table!

Thursday, April 26, 2012

On the importance of focus and feedback

Without focus and rapid feedback, IT projects are severely crippled. Through this post I'll prove (using Excel) that focus and rapid feedback can not only improve your project but shorten it dramatically as well.


For the sake of argument, I've assumed a project team that works over ten units of effort (the horizontal axis: you can think of this either as time units or as money units spent), consumes a cost of 10 monetary units and produces 20 monetary units worth of value. These numbers are pulled out of my hat and don't matter much: only the numbers on the scales will change but the graphs themselves will be identical. The ROI is of course value minus cost compared to cost: with these numbers the ROI is 100% which is fairly low as ROIs go but serves our illustrative purpose.

I will use this same team and project to draw up and project the ROI for four different approaches:

  1. A traditional plan-driven project delivering near the end
  2. An unfocused project with continuous delivery
  3. A focused project with an 80/20 Pareto distribution of value
  4. A focused project with an 80/50 Pareto distribution of value


First, let's consider the traditional plan-driven IT project. The team implements requirements in any old order (easiest first? most interesting first? software stack from bottom up?) and makes a 1.0 delivery quite late in the project followed by 1.0.1 and 1.1 deliveries. In this case, the return on investment looks like this from the customer's viewpoint:


Doesn't it look realistic? :-) Please note that this is a successful plan-driven IT project that actually delivers on budget! The costs accumulate all the time, but the customer receives their first dose of value quite late. The last deliveries add some random functions and fix a number of annoying bugs, but the ROI (the dotted line) doesn't increase much anymore; it hovers around 100%.

From the start up until the first delivery, the customer doesn't really know what's happening. Plenty of reports have come in, but no working code. The customer would have no way of knowing if the project was in trouble.

Now consider the same team doing continuous delivery. They're still implementing requirements — errr, backlog items! — in any order that seems reasonable, but deliver to the customer on a weekly basis. From the customer's perspective this is a game-changer: after the initial product has been delivered, the value just racks up!


This does require some basic enablers such as a system for continuous integration and automated testing. Plus a strongly disciplined team that doesn't tolerate bugs.

However, while all requirements are of equal value, some just might be more equal than others. In fact, some researchers [1, 2] think that the value distribution should be a Pareto curve, also known as the "80/20 rule". In plain language this means that 20% of the work brings 80% of the value. If the team could somehow (hint hint: ask the customer!) determine which requirements are the most important, the situation would instead look like this:


What's happening here? Instead of going up linearly, the return on investment rises sharply before turning into a slow decline! Indeed, there seems to be a point of maximum ROI somewhere around 3 effort units, where the return on investment is almost 150%: way more than the projected 100%.

Now if you were the business owner and were looking at the ROI only, when would you terminate the project? Most likely at some point between 2 and 4 effort units, because the ROI curve is quite flat at the top and there's a quite large span of effort that would bring you almost the maximum ROI. So let it run to 4 E.U., then terminate. It really doesn't make economic sense to continue after that.

Is it possible to terminate the project at 4 E.U.? Yes, of course. Since the team is delivering regularly on a weekly or perhaps even daily basis, the customer always has a working system and there are no technical objections to terminating the development project. Taking a leaf from Jeff Sutherland's "Money for Nothing and (Your) Change for Free", the contract could specify that the customer can terminate the project with a one-sprint notice period, by paying a certain percentage of the remaining work.

This approach also requires discipline, but this time it's from the customer. The customer must work with the team early and often, maintaining and prioritizing the backlogs. The customer must also be prepared to "cut out" a large swath of the initial "requirements". (It helps if you're the kind of person who sees a half-empty glass as half full.)

And here's a more moderate 80/50 Pareto curve, taking 50% of the effort to reach 80% of the value. In this case the maximum ROI is reached at around 6 effort units, but anything between 3 and 9 E.U. will bring in more than the projected original ROI. Or to put it another way, the supplier could double or triple their hourly fees and still meet the cost expectations of the customer.
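The curves above can be sketched with a simple model. This is my own reconstruction, assuming linear cost accrual and piecewise-linear value delivery; the exact peak figures depend on such modeling details and won't match the Excel charts exactly, but the overall shape (an early ROI maximum followed by a slow decline toward the projected 100%) is the same:

```python
def roi_curve(pareto_effort, pareto_value, total_cost=10.0,
              total_value=20.0, horizon=10.0, steps=100):
    """ROI over time when `pareto_value` (e.g. 0.8) of the total value
    is delivered during the first `pareto_effort` (e.g. 0.2) of the effort."""
    curve = []
    for i in range(1, steps + 1):
        t = horizon * i / steps
        cost = total_cost * t / horizon          # cost accrues linearly
        frac = t / horizon                       # fraction of effort spent
        if frac <= pareto_effort:
            value = total_value * pareto_value * frac / pareto_effort
        else:
            rest = (frac - pareto_effort) / (1 - pareto_effort)
            value = total_value * (pareto_value + (1 - pareto_value) * rest)
        curve.append((t, (value - cost) / cost))  # ROI = (value - cost) / cost
    return curve

curve_8020 = roi_curve(0.2, 0.8)  # 80% of value in the first 20% of effort
curve_8050 = roi_curve(0.5, 0.8)  # 80% of value in the first 50% of effort
# Both curves end at the projected 100% ROI but peak far earlier.
```

In this sketch the 80/20 project reaches its maximum ROI well before the 10-unit horizon, so terminating early is the economically rational choice, which is exactly the argument made above.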


So what's the point of this Excel exercise? As I stated at the very beginning, relentless focus and rapid feedback (in the form of continuous delivery) can be a game-changer.

These models are of course severely simplified, e.g. they don't account for the fact that it takes a while to whip up the first usable version of the product. But the point is still valid, I think. What do you think?


References:

[1] B. Boehm. Value-based software engineering: reinventing. SIGSOFT Softw. Eng. Notes, 28(2):3–, 2003. ISSN 0163-5948. doi: 10.1145/638750.638775.

[2] J. Bullock. Calculating the value of testing. Software Testing and Quality Engineering, pages 56–62, May/June 2000.

Wednesday, April 11, 2012

Systems theory

Kenneth Boulding (1956) generated a hierarchy of systems to support the General Systems Theory of Ludwig von Bertalanffy (1968). (Bertalanffy developed the GST from 1937 onwards.) Each level in the nine-level hierarchy includes the functionalities and attributes of all the lower levels.

Boulding's nine levels, from lowest to highest:

  1. Static: labels and lists only
  2. Clockwork: simple motions and machines, balances and counter-balances
  3. Cybernetic: self-controlling with feedback and information transmission
  4. Open: living, self-maintaining and self-reproducing
  5. Genetic: labor divided between differentiated, mutually dependent components that grow according to blueprints (e.g. DNA)
  6. Animal: self-awareness, mobility, specialized receptors and nervous systems
  7. Human: self-consciousness and a sense of passing time
  8. Social organization: meanings and value systems
  9. Transcendental: metaphysical

It's important to note that current natural science has not gone much beyond level four. Organizations are level eight. This means that there is a four-level gap between the organizations we wish to study on the one hand, and the scientific tools at our disposal on the other.

Ludwig von Bertalanffy. General System theory: Foundations, Development, Applications. George Braziller, New York, 1968.
Kenneth Boulding. General Systems Theory: The Skeleton of Science. Management Science, 2(3):197-208, April 1956. http://www.panarchy.org/boulding/systems.1956.html

Software and industrialism

Software development is post-industrial. It is so by definition: according to Alvin Toffler (1970) the computer and telecom industry have ignited a social revolution. Daniel Bell (1973) concurs: post-industrial society is organized around knowledge creation and the uses of information, activities that have been revolutionized by the computer. The post-industrial era is also called "the information era" by Bell and others.

Is it possible to use industrial methods to manage post-industrial activities like software development?

Alvin Toffler. Future Shock. Random House, London, 1970.
Daniel Bell. The coming of post-industrial society. Basic Books, New York, 1973.

Sunday, March 25, 2012

Agility vs. legitimacy

Institutions and legitimacy require stability. Only something that is stable can ever become legitimate. Since agility means creating, embracing and learning from change [1], it may be that agility and legitimacy are mutually exclusive in the same organization.

Possible loophole: what if change can be constrained to one dimension only, e.g. the product ("what") dimension, leaving out the process ("how") and people ("who") dimensions? Would such an organization be more legitimate than one under constant change?

[1] Kieran Conboy. Agility from first principles: reconstructing the concept of agility in information systems development. Information Systems Research, 20(3): 329–354, September 2009. 10.1287/isre.1090.0236.

Friday, January 27, 2012

The news of Story Points' demise is greatly exaggerated

My friend Vasco Duarte just wrote an interesting piece on the subject of story points considered harmful. We've discussed this at length some years back (oh, the amounts of coffee consumed!) and I think his concept is good. However, the article is fraught with errors and lacks a constructive conclusion.

By writing this blog post I intend to point out and correct some problems in the reasoning in Vasco's article. I don't have any special feelings for or against estimation. I don't think it's good if it becomes a cargo cult, something you just do because Scrum says so. That's true for most practices, by the way. But I believe estimation has a place as a "gateway drug" into collective software architecture and backlog maintenance. Once the team gets there, they can safely dump the story points.

Statistics

Let's start out by noting that statistical mathematics can explain many things. Small batches improve flow. More flow means more items flowing through the pipe, and as item counts grow, the relative variation of any aggregate metric shrinks. In the context of work queues and backlogs, a rate of perhaps 20-30 work items per week is enough to provide meaningful data based on counting work items (rather than summing up estimates).

The statistics presented by Vasco don't account for the fact that the estimate sum is dependent on the number of items: more items automatically means a larger sum of estimates. To illustrate, I wrote a small script to output random data for a fictional team with 10 to 20 items completed every sprint and estimates drawn at random from the Fibonacci scale between 1 and 13. The correlation for this random data turns out to be around 0.73, and that is entirely the result of variable dependence. As mentioned above, increasing the number of items per sprint makes the statistical variation go down, which increases the correlation between the two variables. At 70-130 items per sprint, the correlation is around 0.92. At 100-200 items it's 0.95.
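To make the dependence argument concrete, here's a sketch (mine, not based on Vasco's data) that simulates sprints with purely random estimates and computes the Pearson correlation between items completed and estimate sum; the item ranges mirror the figures above:

```python
import random

def simulate_correlation(min_items, spread, sprints=10000):
    """Pearson correlation between items-per-sprint and the sum of
    random Fibonacci-scale estimates over `sprints` simulated sprints."""
    counts, sums = [], []
    for _ in range(sprints):
        n = min_items + random.randrange(spread)
        counts.append(n)
        sums.append(sum(random.choice([1, 2, 3, 5, 8, 13])
                        for _ in range(n)))
    mc = sum(counts) / len(counts)
    ms = sum(sums) / len(sums)
    cov = sum((c - mc) * (s - ms) for c, s in zip(counts, sums))
    var_c = sum((c - mc) ** 2 for c in counts)
    var_s = sum((s - ms) ** 2 for s in sums)
    return cov / (var_c * var_s) ** 0.5

random.seed(1)
r_small = simulate_correlation(10, 10)  # 10-19 items: roughly 0.7
r_large = simulate_correlation(70, 61)  # 70-130 items: roughly 0.9
```

Even though every estimate here is pure noise, the correlation is strong, and it strengthens further as the number of items per sprint grows.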

Interestingly, three of Vasco's teams are actually doing worse than random. This is probably an argument against estimating. :-)

Complexity, predictability and chaos

Then let's move on to more human issues. Humans are unpredictable, but not as much as one would expect. The reason is that humans are social animals that can live and work together in groups for fun and profit. Indeed, we actually prefer to be part of a society or group. Hermits are rare and, bluntly put, a bit weird.

One of the mechanisms that allows us to live and work in groups is that we collectively construct social identities and institutions that constrain our actions. One such constraint could be: "In our team it's not OK to physically hurt people you disagree with". These identities and behaviors make us much more predictable. We can pretty much rely on Helen not to club down Tom in the meeting room with a chair during a heated technical debate, right?

If these identities don't exist or they are constructed wrong, there will be chaos. For example, the Vikings or Mongols of old would perhaps say that "in our society, physical power equals political power". In a conflict situation, Helen would club down Tom and everyone would laugh and nod and say what a nice chop that was, he hardly saw it coming. Next thing you know, Tom's brother Tim has set Helen's house on fire in the middle of the night and butchered everyone that tried to escape. But I digress.

Incomplete argumentation

Finally I'd like to point out that some of the counter-arguments to the six pro-estimation claims by Mike Cohn are weak. E.g. in claim 3 the article states "it can take days of work to estimate the initial backlog for a reasonable size project". It can (and often does) take days of work to construct an initial architecture for a reasonably sized project! You can't just start coding at random.

The blog post also ignores the most important pro-estimation claim: the real benefit of estimating is that it gets the whole team to participate in planning the software. This can be achieved in other ways, but most estimation techniques turn it into a game that is fun to play. I looked for a replacement for this activity in the blog post and found none.

Some other nitpicks:
  • The butterfly effect is commonly used to describe chaos theory, not complexity theory.
  • The complex environment is different from chaos in that it is causal but not predictable: in other words, causality is evident only when looking back but not when looking forward.

Conclusions

All in all, I think the key take-away of Vasco's post has a lot of merit, especially for seasoned Agile teams. However the article contains some serious leaps of faith, and the "myth of Story Points" is certainly not busted.


EDIT: Here's the Python script I used to create random data.

import random

# Simulate 100 sprints, each completing a random number of items with
# random Fibonacci-scale estimates; print the item count and estimate sum.
for i in range(100):
  sumitems = 0
  sumestimates = 0
  for j in range(10 + random.randrange(10)):
    k = random.choice([1, 2, 3, 5, 8, 13])
    sumitems += 1
    sumestimates += k
  print(sumitems, "\t", sumestimates)


I then wimped out and did the correlation calculations in Excel.

Since random.randrange(10) actually results in a value between 0 and 9 inclusive, the script actually generates random data for sprints with 10 to 19 items, not 10 to 20 as specified. Doesn't invalidate the point though.