Variance and Standard Deviation: Karl Pearson's 1894 Yardstick of Scatter — and Why the Long-Term Investor Should Treat Volatility as the Price of Returns, Not the Measure of Risk

Afternoon Edition · Mental Models · No. 9

The model: one number for the scatter

Every average is an act of compression, and compression loses information. Tell a reader that two countries have the same average income, that two rivers have the same average depth, or that two portfolios have the same average return, and you have told them almost nothing about whether the two are alike to live in, to wade across, or to own. What the average discards is the scatter — how far the individual cases sit from the centre — and the history of statistics is in large part the history of learning to measure what the average throws away. The man who gave that measurement its modern name was Karl Pearson. In a lecture delivered at Gresham College on 31 January 1893 he proposed the term standard deviation as a replacement for the cumbersome expressions then in use — “root mean square error,” “error of mean square,” Gauss’s “mean error” — and the following year he put the symbol and the method to work in the first of his great memoirs, “Contributions to the Mathematical Theory of Evolution,” published in the Philosophical Transactions of the Royal Society (vol. 185, 1894, pp. 71–110). Pearson was building tools for biometry — he needed to describe how crab shells and human skulls varied around their averages — but the instrument he standardised turned out to be perfectly general. A quarter-century later R. A. Fisher, in his 1918 paper on Mendelian inheritance in the Transactions of the Royal Society of Edinburgh (vol. 52, pp. 399–433), gave the squared version its own name, the variance, and the modern vocabulary of dispersion was complete.

The definitions are compact. Take each observation’s distance from the mean, square it, and average the squares: that is the variance. Take the square root of the variance, so the answer is back in the original units: that is the standard deviation, written σ. The one-sentence form of the model is this: the standard deviation is the single number that says how far a typical observation sits from the average — and therefore how much, or how little, the average itself can be trusted as a description. A mean without a sigma is half a sentence. Two distributions can share an identical mean and describe entirely different worlds, and sigma is the number that tells them apart.
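The two definitions can be written out in a few lines. What follows is a minimal sketch, not a reference implementation: the function names and both data sets are invented for illustration, and the divisor is the number of observations, as in the description above (the sample version used for estimation divides by one fewer).

```python
from statistics import mean, pstdev, pvariance


def variance(xs):
    """Average squared distance from the mean (population form, divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    """Square root of the variance, back in the original units."""
    return variance(xs) ** 0.5


# Hypothetical annual returns in per cent: same mean, very different scatter.
calm = [6, 7, 8, 9, 10]
wild = [-20, -6, 8, 22, 36]

for name, xs in (("calm", calm), ("wild", wild)):
    print(name, "mean", mean(xs), "variance", variance(xs), "sigma", round(std_dev(xs), 2))

# The standard library's population functions compute the same quantities.
assert abs(variance(wild) - pvariance(wild)) < 1e-9
assert abs(std_dev(wild) - pstdev(wild)) < 1e-9
```

Both lists average 8, yet the variance is 2 for the first and 392 for the second. The mean alone cannot tell the two apart.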

The mechanism: why squaring, and why it conquered finance

Why square the deviations rather than simply average their absolute size? Three properties made the squared measure the one that survived. First, squaring weights the large deviations more than the small ones, so sigma is sensitive to precisely the observations that make scatter dangerous. Second, the variance is additive: for independent quantities, the variance of a sum is the sum of the variances — a property the absolute deviation does not possess, and the algebraic hinge on which, as we shall see, the entire theory of diversification turns. Third, the variance plugs directly into the machinery of the normal distribution, the bell curve examined earlier in this series: when a quantity is normally distributed, roughly 68 per cent of observations fall within one sigma of the mean, 95 per cent within two, and 99.7 per cent within three. Even without normality there are weaker guarantees — Chebyshev’s inequality promises that at least three-quarters of any distribution lies within two sigmas — but it was the marriage with the bell curve that made sigma feel like a complete description of uncertainty. Therein, as we shall also see, lies the trap.
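Each of the three properties can be checked by hand or by machine. The sketch below is an illustration under simple assumptions: two fair, independent dice stand in for "independent quantities", and Python's standard-library NormalDist supplies the bell-curve coverage. The helper names are invented here.

```python
from itertools import product
from statistics import NormalDist


def pop_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mean_abs_dev(values):
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)


die = [1, 2, 3, 4, 5, 6]
# All 36 equally likely totals of two independent dice.
two_dice = [a + b for a, b in product(die, die)]

var_one, var_two = pop_variance(die), pop_variance(two_dice)
mad_one, mad_two = mean_abs_dev(die), mean_abs_dev(two_dice)

# Variance adds across independent quantities; the absolute deviation does not.
print("variance:", var_one, "+", var_one, "vs", var_two)
print("mean absolute deviation:", mad_one, "+", mad_one, "vs", round(mad_two, 3))

# Share of a normal distribution within k sigmas of the mean.
z = NormalDist()
normal_coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# Chebyshev's floor for any distribution with finite variance: 1 - 1/k^2.
chebyshev_floor = {k: 1 - 1 / k**2 for k in (2, 3)}

print(normal_coverage)
print(chebyshev_floor)
```

The variance of the two-dice total is exactly twice that of one die, while the mean absolute deviation of the total falls short of double. The gap between the normal coverage at two sigmas (about 95 per cent) and the Chebyshev floor (75 per cent) is the amount of comfort that depends on the bell-curve assumption.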

Finance adopted the model wholesale. In “Portfolio Selection” (Journal of Finance, vol. 7, no. 1, 1952, pp. 77–91), Harry Markowitz proposed that an investor cares about exactly two numbers — the expected return of a portfolio and its variance — and showed that because variances combine through covariance, a portfolio of imperfectly correlated assets has a lower sigma than the weighted average of its parts. That single algebraic fact is the only free lunch in investing, and it is a genuine one: diversification works because variance is additive and correlations are less than perfect. William Sharpe’s capital asset pricing model (1964) carried the programme further, decomposing each security’s variance into a market-related part and a residual. By the 1970s “risk” in the textbooks simply was sigma: risk-adjusted performance meant return per unit of standard deviation, and a security with a placid price chart was, by definition, a safe one. The measuring rod Pearson cut for crab shells had become the official definition of financial danger — not because anyone had demonstrated that volatility is what investors should fear, but because volatility is what the data made measurable. The number was available, so the number became the concept.
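The two-asset case shows the algebra at its smallest. This is a sketch with hypothetical inputs (two assets of 20 per cent sigma each, held in equal weights), using the standard formula for the variance of a weighted sum of two correlated quantities; the function name is invented for illustration.

```python
import math


def two_asset_sigma(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio.

    w1     : weight of the first asset (the second gets 1 - w1)
    sigma1 : standard deviation of the first asset
    sigma2 : standard deviation of the second asset
    rho    : correlation between the two, from -1 to 1
    """
    w2 = 1 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)


# Hypothetical figures: the weighted average of the parts is 20 in every case.
for rho in (1.0, 0.6, 0.3, 0.0):
    print(f"correlation {rho:.1f}: portfolio sigma {two_asset_sigma(0.5, 20, 20, rho):.2f}")
```

Only at a correlation of exactly one does the portfolio's sigma equal the weighted average of its parts; at any lower correlation it is smaller, and the expected return is untouched by the change.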

Figure 1. Same average, different worlds: two assets with an identical 8 per cent mean annual return. At σ = 6 percentage points, almost every year lands between −4 and +20 per cent. At σ = 22, the same average admits years below −35 and above +50 per cent; it is the sigma, not the mean, that decides what the owner must be prepared to live through.

The empirical record: what sigma actually does in markets

The first empirical fact about equity-market sigma is that it is large. G. William Schwert’s study “Why Does Stock Market Volatility Change Over Time?” (Journal of Finance, vol. 44, no. 5, 1989, pp. 1115–1153) assembled monthly United States stock returns back to 1857 and found annualised volatility in the mid-teens as a long-run matter — a typical year’s return sits fifteen or so percentage points from the average year’s, which is why single-year results are nearly useless as evidence of skill. The second fact is that sigma is itself unstable. Schwert documented that volatility roughly doubled or tripled during recessions and banking panics, reaching its recorded extreme in the Great Depression: the calm of the mid-1920s and the convulsions of 1929–1933 belong to the same market but to utterly different volatility regimes. The third fact is that sigma’s changes are not random: Benoit Mandelbrot observed as early as 1963 (Journal of Business, vol. 36, no. 4) that large price changes tend to be followed by large changes, of either sign, and small by small — volatility arrives in clusters, like weather.
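The claim about single years follows from the arithmetic of averages. The sketch below rests on loud simplifications: annual returns are treated as independent draws with a fixed sigma of 15 points (which the clustering just described says they are not), and the 2-point edge and the two-standard-error threshold are hypothetical choices, not figures from Schwert.

```python
import math


def standard_error(sigma, years):
    """Uncertainty of an average return measured over `years` independent years."""
    return sigma / math.sqrt(years)


def years_to_detect(edge, sigma, z=2.0):
    """Years needed before an average edge stands z standard errors clear of zero."""
    return math.ceil((z * sigma / edge) ** 2)


SIGMA = 15.0  # mid-teens annual volatility, in percentage points
EDGE = 2.0    # hypothetical true advantage, in percentage points per year

for n in (1, 5, 25, 100):
    print(f"{n:>3} years: average known to within about +/- {standard_error(SIGMA, n):.1f} points")

print("years needed to confirm the edge:", years_to_detect(EDGE, SIGMA))
```

Under these assumptions the uncertainty shrinks only with the square root of the number of years: 15 points after one year, 3 points after twenty-five.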

Markets eventually found sigma so central that they priced it directly. The Chicago Board Options Exchange introduced the VIX index in 1993 to distil, from option prices, the market’s own one-month forecast of S&P 500 volatility; it has traded in a remarkable range, from a closing low near 9 in the torpid autumn of 2017 to a closing peak near 83 in the panic of March 2020 — the market’s estimate of its own sigma varying ninefold within three years. And yet the most instructive empirical exhibit on the model is a piece of arithmetic performed on a boast. In August 2007, after Goldman Sachs’s flagship quantitative fund had lost 27 per cent of its value in weeks, the bank’s chief financial officer, David Viniar, told the Financial Times that “we were seeing things that were 25-standard deviation moves, several days in a row.” Four academics — Kevin Dowd, John Cotter, Chris Humphrey and Margaret Woods, in “How Unlucky is 25-Sigma?” (Nottingham University Business School, March 2008) — took the claim literally and computed what it would imply if the models were right: a single 25-sigma day should occur roughly once in 10¹³⁵ years, a span absurdly beyond the age of the universe — and several in a row, essentially never. The conclusion was not that the fund had been unlucky. It was that the sigma in the models was the wrong sigma: estimated from a calm window, applied to a fat-tailed world, and silently invalidated the moment the regime changed. Sigma measures the scatter you have seen; it makes no promise about the scatter you are yet to see.
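The order of magnitude can be reproduced in a few lines. This is a sketch under stated assumptions, not a reconstruction of the paper's own method: daily returns are taken to be normally distributed, only the loss tail is counted, and a year is taken to have 250 trading days.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumption made for this illustration


def upper_tail_probability(k):
    """Probability that a standard normal variable exceeds k sigmas (one tail)."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def expected_wait_years(k):
    """Average number of years between k-sigma days, if the normal model held."""
    return 1 / (upper_tail_probability(k) * TRADING_DAYS_PER_YEAR)


for k in (3, 5, 10, 25):
    print(f"{k:>2}-sigma day: about once in {expected_wait_years(k):.3g} years")
```

The waiting time grows far faster than the sigma count, and at 25 sigmas it is on the order of 10¹³⁵ years. A model that assigns such odds to something that happened on consecutive days is describing its own error rather than the market's luck.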

Two episodes: when the yardstick failed its users

August 2007: the quant quake. In the second week of August 2007, long–short quantitative equity funds across Wall Street suffered simultaneous, violent losses on strategies that had been engineered, by the sigma arithmetic, to be nearly riskless. Andrew Lo and Amir Khandani’s post-mortem (“What Happened to the Quants in August 2007?”, MIT working paper, 2007) reconstructed the mechanics: dozens of funds held overlapping positions sized on recent, low, measured volatility; somewhere a large player unwound; prices moved against the common positions; risk systems calibrated to the old sigma demanded reductions; the reductions moved prices further. The measured scatter of the strategies had been low precisely because so much capital was holding them steady — and the same crowding guaranteed that when the scatter came, it would come all at once. This is the model’s great blind spot: sigma is estimated as if the world were a roulette wheel with fixed odds, but in markets the players’ own behaviour sets the odds, and a recorded period of calm can be the very mechanism that loads the next convulsion. Viniar’s 25-sigma days were not astronomical bad luck; they were the bill for treating an endogenous, regime-switching quantity as a physical constant.

5 February 2018: the day the short-volatility trade ate itself. By late 2017, sigma itself had become a harvestable crop. With the VIX scraping record lows, exchange-traded products that sold volatility — collecting a steady premium so long as calm persisted — had gathered billions; the most popular, an exchange-traded note designed to deliver the daily inverse of short-term VIX futures, had returned several hundred per cent over the preceding years and become a retail favourite. On Monday 5 February 2018 the VIX rose from 17 to 37 — its largest one-day proportional jump on record — and the inverse products were contractually obliged to buy volatility futures into the spike to rebalance, amplifying the very move that was destroying them. The flagship note lost more than nine-tenths of its indicative value after hours, triggered the acceleration clause in its prospectus, and was terminated weeks later; the episode acquired the name “Volmageddon” (see the CFA Institute Financial Analysts Journal summary, “Volmageddon and the Failure of Short Volatility Products,” 2021). The instrument had exhibited, until that afternoon, a gorgeous return history with modest measured scatter. Every sigma computed from that history was true, and every one of them was useless: the danger lay in a clause and a feedback loop that had simply never yet executed. Low realised volatility was not evidence of safety; it was the lure.

Figure 2. The sigma of sigma: market volatility is itself violently volatile. The same index of expected scatter that closed near 9 in November 2017 closed near 83 in March 2020, and the two great failures of the yardstick, August 2007 and February 2018, both began in periods of recorded calm.

Application: three operating disciplines for the long-term investor

Discipline one: compute sigma for the business, and read price sigma as a toll schedule. The dispersion that determines a long-term owner’s outcome is the scatter of the business’s results — revenues, operating margins, owner earnings — across years and across conditions, because that is the scatter that compounds. The dispersion of the quotation is a different fact: it describes the moods of the auction, not the variability of the asset, and for an investor who is not leveraged and not selling, its principal meaning is the size of the paper drawdowns that must be endured en route. The discipline is to compute both and to refuse to let one impersonate the other. An enterprise with wildly scattered quotations may have placid economics; an enterprise with a placid chart may, like the 2018 volatility notes, carry a clause that has never yet executed. Before owning anything, write down two numbers: the worst plausible year for the business, and the worst plausible year for the quote. The first tells you whether to own it; the second, how it will feel.

Discipline two: use the algebra where it is valid, and stress-test where it is not. The variance arithmetic is one of the few places in investing where mathematics delivers something for nothing: correlations below one mean a portfolio’s scatter is less than the weighted average of its parts’, which is the entire intellectual content of diversification, and it is the reason holding eight to twenty businesses with different economics is not merely convention but theorem. Position sizing, likewise, is variance work: a holding’s weight should reflect how much of the portfolio’s total scatter it contributes, not merely how attractive it looks. But the same investor must mark the boundary of the algebra’s validity. Equity returns are fatter-tailed than the bell curve, sigma shifts regimes without notice, and correlations rise toward one exactly when diversification is needed most. The prudent rule, therefore, is to budget for history’s worst observed episodes rather than for two-sigma textbook years, and above all to hold no leverage that converts a survivable three-sigma drawdown into an unsurvivable margin call. Sigma misjudged is an inconvenience for the unleveraged and an obituary for the leveraged; the 1929–1933 contraction Schwert measured, and the 2007 and 2018 episodes above, were survivable precisely in proportion to the absence of borrowed money.
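Both halves of this discipline show up in one formula. The sketch below assumes a deliberately stylised portfolio: equal weights, every holding with the same sigma, and every pair of holdings with the same correlation. The 30 per cent sigma and the correlation values are hypothetical.

```python
import math


def equal_weight_sigma(n, sigma, rho):
    """Sigma of n equally weighted holdings, each with standard deviation `sigma`
    and a common pairwise correlation `rho`."""
    return sigma * math.sqrt(1 / n + (1 - 1 / n) * rho)


SIGMA = 30.0  # hypothetical sigma of a single holding, in percentage points

for rho in (0.3, 0.9, 1.0):
    row = ", ".join(f"n={n}: {equal_weight_sigma(n, SIGMA, rho):.1f}" for n in (1, 4, 8, 20, 100))
    print(f"correlation {rho:.1f} -> {row}")
```

At a correlation of 0.3, most of the achievable reduction has arrived by eight to twenty holdings, and no number of further names takes sigma below the floor set by the correlation itself. As the correlation approaches one, the reduction shrinks toward nothing, which is why the stress test matters more than the calm-weather estimate.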

Discipline three: treat volatility as the price of admission, and arrange never to be a forced seller. If quotation sigma is the toll the market charges for the equity premium, the long-term investor’s edge is the structural ability to pay it: capital that is not borrowed, not needed on a deadline, and not answerable to a redemption window. From that position, the market’s scatter changes sign: the two-sigma year stops being the risk and becomes the opportunity set, since the same dispersion that frightens the forced seller hands the unforced buyer the occasional price set well below value. The operating rules are mundane and powerful: write the buying and rebalancing rules in calm weather so the volatile day executes a plan rather than an emotion; keep the cash and the temperament to act in the fat left tail; and measure your own results over full cycles, not over the single noisy years that sigma tells you in advance will be unrepresentative.

Figure 3. The ledger of the yardstick: what sigma sees, and what it is blind to. The number is indispensable on the left side of the page and dangerous whenever it is mistaken for the right.

How the long-term tradition has used it

The great practitioners of long-term equity investing arrived, decades apart and by different routes, at the same amendment to the textbook: sigma is a real number measuring a real thing, but the thing it measures is not risk. Warren Buffett devoted a section of his fiftieth-anniversary letter to Berkshire Hathaway shareholders (2014) to the distinction. Over the half-century 1964–2014, he noted, the S&P 500 with dividends reinvested returned 11,196 per cent while the purchasing power of the dollar fell 87 per cent — so the asset class with the violent quotation history had been, for any owner with a horizon, dramatically safer than the “stable” currency-denominated alternatives, and formulas that equate volatility with risk, he wrote, lead students, investors and chief executives astray. Volatility was the price; the loss of purchasing power was the risk. Howard Marks made the same argument from the inside of the institutional world in his memo “Risk Revisited” (Oaktree Capital, September 2014) and at chapter length in The Most Important Thing (Columbia University Press, 2011): academia reached for volatility, he argues, largely because volatility was the one candidate that could be computed, whereas the thing investors actually fear — the probability of permanent capital loss — refuses to be put into a single number, and the substitution of the measurable for the meaningful is exactly the error Pearson’s tidy symbol invites. Seth Klarman had drawn the practitioner’s conclusion bluntly in Margin of Safety (1991): a security’s past price wiggles say little about the prospective loss in the asset, which lives in the gap between price and value, in leverage, and in the owner’s own constraints. None of the three discards the model. All three demote it — from the definition of risk to a description of weather.

Key takeaways

The model in one sentence. The standard deviation (Pearson, 1893–94) is the single number that says how far a typical observation sits from the average — the measure of how much an average can be trusted; the variance (Fisher, 1918) is its square, and the additivity of variances is the algebra on which diversification rests (Markowitz, 1952).
Equity sigma is large, unstable and clustered. Long-run United States equity volatility runs in the mid-teens annually, doubles or triples in panics (Schwert, 1989), and arrives in clusters (Mandelbrot, 1963) — so single years prove almost nothing, and yesterday’s sigma is not a constant of nature.
Sigma fails exactly when it matters. A “25-sigma day” should occur about once in 10¹³⁵ years (Dowd et al., 2008); when Goldman’s funds reported several in a row in 2007, and when the short-volatility notes died in a day in 2018, the lesson was the same — calm windows understate fat-tailed, regime-switching, crowd-driven scatter.
Measure the business’s scatter, not just the ticker’s. Quotation sigma is the toll schedule of the auction; business sigma is what compounds. The long-term investor computes both, budgets for history’s worst rather than two textbook sigmas, and holds no leverage that turns a drawdown into an ending.
Volatility is the price of returns, not their risk. Over 1964–2014 the S&P 500 with dividends reinvested returned 11,196 per cent while the dollar lost 87 per cent of its purchasing power (Buffett, 2014): the violently volatile asset multiplied its owner’s money more than a hundredfold while the stable-looking one lost most of its value. Risk is the probability of permanent loss (Marks, 2011, 2014), and the investor structured never to be a forced seller is the one for whom sigma changes from a danger into a discount schedule.

— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia

Important.
All content on this site and in this email is journalism and education for a general audience. Nothing here constitutes investment advice or a recommendation in respect of any specific financial instrument, nor an offer or solicitation to buy or sell any security. Readers should consult an authorised financial adviser regulated in their own jurisdiction before making any investment decision.
