The Deviance of Standard Deviation
Updated: May 24
Before getting too far into this post, there are two references that do a far better job than I ever will at explaining the deficiency of the standard deviation statistic:
"The Flaw of Averages" by Dr. Sam Savage (https://www.flawofaverages.com/)
Pretty much anything written by Dr. Donald Wheeler (spcpress.com)
Why is the standard deviation so popular? Because that's what students are taught. It's that simple. Not because it is correct. Not because it is applicable in all circumstances. It is just what everyone learns. Even if you haven't taken a formal statistics class, somewhere along the line, you were taught that when presented with a set of data, the first thing you do is calculate an average (arithmetic mean) and a standard deviation. Why were we taught that? It turns out there isn't a really good answer. An unsatisfactory answer, however, would involve the history of the normal (Gaussian) distribution and how, over the past century or so, it has come to dominate statistical analysis (its applicability--or, rather, inapplicability--for this purpose would be a good topic for another blog, so please leave a comment letting us know your interest). To whet your appetite on that topic, please see Bernoulli's Fallacy by Aubrey Clayton.
Arithmetic means and standard deviations are what are known as descriptive statistics. An arithmetic mean describes the location of the center of a given dataset, while the standard deviation describes the data's dispersion. For example, say we are looking at Cycle Time data and we find that it has a mean of 12 and a standard deviation of 4.7. What does that really tell you? Well, actually, it tells you almost nothing--at least almost nothing that you really care about. The problem is that in our world, we are not concerned so much with describing our data as we are with doing proper analysis on it. Specifically, what we really care about is being able to identify possible process changes (signal) that may require action on our part. The standard deviation statistic is wholly unsuited to this pursuit. Why?
First and foremost, the nature of how the standard deviation statistic is calculated makes it very susceptible to extreme outliers. A classic joke I use all the time is: imagine that the world's richest person walks into a pub. The average wealth of everyone in the pub is somewhere in the billions, and the standard deviation of wealth in the pub is somewhere in the billions. However, you know that if you were to walk up to any other person in the pub, that person would not be a billionaire. So what have you really learned from those descriptive statistics?
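To put rough numbers on the pub joke, here is a minimal sketch in Python. Every figure in it is invented for illustration: the ten regulars' net worths and the round 200 billion for the richest person are assumptions, not real data.

```python
from statistics import mean, median, stdev

# Hypothetical net worths (in dollars) of ten pub regulars; invented numbers.
regulars = [20_000, 35_000, 50_000, 60_000, 75_000,
            90_000, 110_000, 150_000, 200_000, 300_000]

# Hypothetical round figure standing in for the world's richest person.
richest = 200_000_000_000

# The same pub after that one person walks in.
pub = regulars + [richest]


def summarize(label, values):
    """Print the mean, sample standard deviation, and median of the values."""
    print(f"{label}: mean={mean(values):,.0f}  "
          f"stdev={stdev(values):,.0f}  median={median(values):,.0f}")


summarize("before", regulars)
summarize("after ", pub)
```

With these made-up inputs, the mean jumps to roughly 18 billion and the standard deviation to roughly 60 billion, while the median only moves from 82,500 to 90,000. Ten of the eleven people in the pub are nowhere near either of the first two numbers.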
This leads us to the second deficiency of the standard deviation statistic. Whenever you calculate a standard deviation, you are making a big assumption about your data (recall my earlier post about assumptions when applying theory?). Namely, you are assuming that all of your data has come from a single population. This assumption is not talked about much in statistical circles. According to Dr. Wheeler, "The descriptive statistics taught in introductory classes are appropriate summaries for homogeneous collections of data. But the real world has many ways of creating non-homogeneous data sets." (https://spcpress.com/pdf/DJW377.pdf). In our pub example above, is it reasonable to assume that we are talking about a single population of people's wealth that shares the same characteristics? Or is it reasonable that some signal exists as evidence that one certain data point isn't routine?
Take the cliched probability example of selecting marbles from an urn. The setup usually concerns a single urn that contains two different coloured marbles--say red and white--in a given ratio. Then some question is asked, like if you select a single marble, what is the probability it will be red? The problem is that in the "real world," your data is not generated by choosing different coloured marbles from an urn. Most likely, you don't know if you are selecting from one urn or several urns. You don't know if your urns contain red marbles, white marbles, blue marbles, bicycles, or tennis racquets. Your data is generated by a process where things can--and do--change, go wrong, encompass multiple systems, etc. It is generated by potentially different influences under different circumstances with different impacts. In those situations, you don't need a set of descriptive statistics that assume a single population. What you need to do is analysis on your data to find evidence of signal of multiple (or changing) populations. In Wheeler's nomenclature, what you need to do is first determine if your data is homogeneous or not.
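A small sketch of the "several urns" problem, in Python. The two lists are invented cycle times (in days) for two hypothetical kinds of work that happen to have been logged together; the names small_fixes and large_features are illustrative only. It is not a test for homogeneity, just a picture of what the summary statistics do when the single-population assumption fails.

```python
from statistics import mean, stdev

# Hypothetical cycle times in days from two different kinds of work
# (two "urns") recorded in the same dataset; the values are invented.
small_fixes = [2, 3, 3, 4, 2, 3, 4, 3]
large_features = [20, 22, 19, 21, 23, 20, 21, 22]

combined = small_fixes + large_features

pooled_mean = mean(combined)
pooled_stdev = stdev(combined)

# Which observations actually sit within one day of the pooled mean?
near_mean = [x for x in combined if abs(x - pooled_mean) <= 1]

print(f"pooled mean  = {pooled_mean}")
print(f"pooled stdev = {pooled_stdev:.2f}")
print(f"observations within 1 day of the mean: {near_mean}")
```

With these inputs the pooled mean is 12, yet not a single item took anywhere near 12 days: the list of observations close to the mean is empty. The summary describes a "typical" item that neither population ever produced.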
Now, here's where proponents of the standard deviation statistic will say that to find signal, all you do is take your arithmetic mean and start adding or subtracting standard deviations to it. For example, they will say that roughly 99.7% of all data should fall within your mean plus or minus 3 standard deviations. Thus, if you get a point outside of that, you have found signal. Putting aside for a minute the fact that this type of analysis ignores the assumptions I just outlined, this example brings into play yet another dangerous assumption of the standard deviation. When you start coupling percentages with standard deviations (like 68.3%, 95.4%, 99.7%, etc.), you are making another big assumption: that your data is normally distributed. I'm here to tell you that most real-world process data is NOT normally distributed.
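To see how the familiar percentages behave when the normality assumption does not hold, here is a hedged sketch in Python. It is built on one explicit assumption: an exponential distribution with a mean of 12 days stands in for right-skewed cycle time data. Real process data will have its own shape, so treat the output as an illustration rather than a measurement.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumption: an exponential distribution with a mean of 12 days stands in
# for right-skewed cycle time data. This is simulated, not real, data.
cycle_times = [rng.expovariate(1 / 12) for _ in range(10_000)]

m = mean(cycle_times)
s = stdev(cycle_times)


def share_within(k):
    """Fraction of observations inside mean +/- k standard deviations."""
    lo, hi = m - k * s, m + k * s
    return sum(lo <= x <= hi for x in cycle_times) / len(cycle_times)


# Compare the observed shares with what a normal distribution would give.
for k, normal_share in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    print(f"within {k} sd: observed {share_within(k):.3f}, "
          f"normal theory {normal_share:.3f}")

print(f"lower 3-sd limit: {m - 3 * s:.1f} days")
```

For data shaped like this, far more than 68.3% of the points land within one standard deviation, noticeably fewer than 99.7% land within three, and the lower three-standard-deviation limit is a negative number of days, which is not a cycle time anyone can have.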
So what's the alternative? As a first approximation, a great place to start is the percentile approach that we utilize with ActionableAgile Analytics (see, for example, this blog post). This approach makes no assumptions about single populations, underlying distributions, etc. If you want to be a little more statistically rigorous (which at some point you will want to be), then you will need the Process Behaviour Chart that Dr. Donald Wheeler advocates as a continuation of Dr. Walter Shewhart's work. A deeper discussion of the Shewhart/Wheeler approach is a whole blog series on its own that, if you are lucky, may be coming to a blog site near you soon.
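Both alternatives can be sketched in a few lines of Python. Several things here are assumptions rather than anything taken from the tools or books mentioned above: the cycle times are invented, the percentile uses the nearest-rank definition (one of several common definitions, and not necessarily the one ActionableAgile Analytics uses), and the chart limits use the widely published XmR formula of the average plus or minus 2.66 times the average moving range. The function names percentile and xmr_limits are hypothetical.

```python
import math
from statistics import mean

# Hypothetical cycle times in days, in order of completion; invented values.
cycle_times = [3, 5, 4, 12, 6, 4, 7, 5, 30, 6,
               5, 8, 4, 6, 5, 9, 4, 7, 5, 6]


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p percent
    of the data at or below it (p is an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def xmr_limits(values):
    """Lower limit, center line, and upper limit for an individuals (XmR)
    chart, based on the average moving range between successive points."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    avg_mr = mean(moving_ranges)
    # 2.66 is the standard XmR scaling constant (3 divided by 1.128).
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr


p85 = percentile(cycle_times, 85)
lower, center, upper = xmr_limits(cycle_times)
signals = [x for x in cycle_times if x < lower or x > upper]

print(f"85th percentile: {p85} days")
print(f"natural process limits: {lower:.1f} to {upper:.1f} days")
print(f"points outside the limits: {signals}")
```

For this invented data the 85th percentile is 8 days, and the only point outside the limits is the 30-day item. The percentile statement ("85% of items finished in 8 days or less") needs no distributional assumption, and the chart limits are driven by point-to-point variation rather than by a single global standard deviation. The lower limit comes out negative here, which in practice simply means there is no meaningful lower limit for this data.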
So, to sum up, the standard deviation statistic is an inadequate tool for data analysis because it:
Is easily influenced by outliers (which your data probably has)
Often assumes a normal distribution (which your data doesn't follow)
Assumes a single population (which your data likely doesn't possess)
Any analysis performed on top of these flaws is almost guaranteed to be invalid.
One last thing. Here's a quote from Atlassian's own website: "The standard deviation gives you an indication of the level of confidence that you can have in the data. For example, if there is a narrow blue band (low standard deviation), you can be confident that the cycle time of future issues will be close to the rolling average." There are so many things wrong with this statement that I don't even know where to begin. So please help me out by leaving some of your own comments about this on the 55 Degrees community site.
About Daniel Vacanti, Guest Writer
Daniel Vacanti is the author of the highly-praised books "When will it be done?" and "Actionable Agile Metrics for Predictability" and the original mind behind the ActionableAgile™️ Analytics Tool. Recently, he co-founded ProKanban.org, an inclusive community where everyone can learn about Professional Kanban, and he co-authored their Kanban Guide.
When he is not playing tennis in the Florida sunshine or whisky tasting in Scotland, Daniel can be found speaking on the international conference circuit, teaching classes, and creating amazing content for people like us.