Aptage Learner Math Notes

One of the weakest assumptions is that a well-run team will, over time, settle into a single velocity. I do not know the origin of this belief. Perhaps someone drew a false inference from the controlled-process concepts of Six Sigma. The difference between the Six Sigma and agile development models is easy to see:

  1. Six Sigma theory concerns producing the same artifact (such as a light bulb) or the same service (such as delivering a package) over and over again. The goal of Six Sigma is to create a process with minimal variation in product and execution.
  2. Agile development involves dealing with varying work items.

Some might argue that all work items with three story points can be considered identical for process purposes. But the assignment of story points to a story can only be an informed guess, infused with uncertainty. This notion is addressed further in the Aptage Sprint Tester description.

Mathematically, this means that team velocity is not, and likely never will be, a single number. It is a probability distribution, which is what the Aptage Velocity Learner returns.

There are several techniques for finding a distribution given samples. If one has enough samples, the easiest method is to create a histogram from the sample set and then normalize the histogram so that the total probability comes to 1. The problem with this technique is that it requires hundreds of velocities to produce a usable distribution, which makes it impractical.
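The histogram technique can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_distribution`, the bin width of 5 story points, and the eight sample velocities are all assumptions made for the example, not values from Aptage.

```python
from collections import Counter

def empirical_distribution(velocities, bin_width=5):
    """Return {bin_start: probability} from a histogram of the samples."""
    if not velocities:
        raise ValueError("need at least one velocity sample")
    # Assign each velocity to the start of the bin that contains it.
    counts = Counter((v // bin_width) * bin_width for v in velocities)
    total = sum(counts.values())
    # Dividing by the sample count makes the probabilities sum to 1.
    return {start: n / total for start, n in sorted(counts.items())}

# Hypothetical sprint velocities in story points (illustrative only).
velocities = [21, 25, 18, 30, 24, 22, 27, 19]
dist = empirical_distribution(velocities)
print(dist)  # {15: 0.25, 20: 0.375, 25: 0.25, 30: 0.125}
```

With only eight samples the result is four coarse bars, and adding or removing a single sprint changes a bar by 12.5 percentage points. That coarseness is why the technique needs far more data than a team usually has.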

Another technique is to assume that the velocity is described by one of the standard distribution families—normal, Weibull, exponential, and so on—and find the member of the family that best matches the data. The problem with this technique is that it will often “overfit” the data and give a poor answer with a small data set.
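As a rough illustration of the family-fitting technique, the sketch below fits a normal distribution to velocities with Python's standard `statistics.NormalDist.from_samples`, which estimates the mean and standard deviation from the data. The sample velocities and the 28-point threshold are hypothetical, and the normal family is chosen only because it is the simplest to show.

```python
from statistics import NormalDist

# Hypothetical sprint velocities in story points (illustrative only).
velocities = [21, 25, 18, 30, 24, 22, 27, 19]

# Fit the normal family to all eight sprints, then to the first three only.
fit_all = NormalDist.from_samples(velocities)
fit_few = NormalDist.from_samples(velocities[:3])

# Probability of reaching at least 28 points in a sprint under each fit.
p_all = 1 - fit_all.cdf(28)
p_few = 1 - fit_few.cdf(28)

print(f"8 sprints: mean={fit_all.mean:.2f}, P(v >= 28)={p_all:.3f}")
print(f"3 sprints: mean={fit_few.mean:.2f}, P(v >= 28)={p_few:.3f}")
```

The two fits disagree noticeably about how likely a 28-point sprint is, even though they come from the same team. A fitted curve looks precise, but with a handful of samples its parameters mostly reflect which sprints happened to be observed.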

Aptage employs an algorithm that combines Bayes-based machine learning techniques with Monte Carlo simulation. It is less constrained than curve fitting and incorporates only very general assumptions, offering the best distribution possible given small data sets.
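These notes do not spell out the Aptage algorithm, so the following is not it. It is a generic, minimal sketch of how Bayes' rule and Monte Carlo sampling can be combined to turn a few velocities into a distribution. Everything in it is an assumption made for illustration: the function name `posterior_predictive`, the uniform priors, the draw counts, the sample data, and the normal likelihood, which is a simplification that a less constrained method would avoid.

```python
import math
import random
import statistics

def posterior_predictive(velocities, n_draws=20000, n_out=5000, seed=0):
    """Simulate future velocities by likelihood-weighted Monte Carlo sampling."""
    rng = random.Random(seed)
    top = max(velocities)
    candidates, log_weights = [], []
    for _ in range(n_draws):
        # Draw a candidate (mean, spread) from broad, assumed uniform priors.
        mu = rng.uniform(0.0, 2.0 * top)
        sigma = rng.uniform(0.5, top)
        # Log-likelihood of the observed velocities under Normal(mu, sigma),
        # dropping the constant term shared by every candidate.
        loglik = sum(-math.log(sigma) - 0.5 * ((v - mu) / sigma) ** 2
                     for v in velocities)
        candidates.append((mu, sigma))
        log_weights.append(loglik)
    # Bayes' rule with uniform priors: posterior weight is proportional to
    # the likelihood. Subtract the maximum before exponentiating for stability.
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    # Resample candidates by posterior weight, then simulate one velocity each.
    picks = rng.choices(candidates, weights=weights, k=n_out)
    return [rng.gauss(mu, sigma) for mu, sigma in picks]

# Hypothetical sprint velocities in story points (illustrative only).
velocities = [21, 25, 18, 30, 24, 22, 27, 19]
samples = posterior_predictive(velocities)
deciles = statistics.quantiles(samples, n=10)
low, high = deciles[0], deciles[-1]
print(f"80% of simulated velocities fall between {low:.1f} and {high:.1f}")
```

Unlike a single fitted curve, the simulated velocities average over every parameter pair that is plausible given the data, so uncertainty from having few sprints shows up as a wider spread rather than being hidden.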