The Normal Distribution: Abraham de Moivre's 1733 Bell Curve, Gauss's Law of Error, and Why the Long-Term Equity Investor Must Never Mistake the Middle for the Tail

Afternoon Edition — Mental Models · Essay No. 11 · 13 June 2026 · Tallinn

In November 1733, Abraham de Moivre — a Huguenot refugee who had fled France after the revocation of the Edict of Nantes, and who made his living in the coffee houses of London computing odds for gamblers and annuity prices for underwriters — privately printed seven pages of Latin and showed them to a handful of friends. The problem he had solved looked tedious. If a fair coin is tossed a great many times, what is the chance of obtaining a given number of heads? The binomial answer was exact but useless: a tower of factorials no one could evaluate by hand once the number of tosses grew large. De Moivre discovered that as the tosses multiply, the jagged binomial settles into a single smooth shape — symmetric, bell-formed, and fixed entirely by its centre and its spread. He had drawn, for the first time, what would become the most useful and the most over-trusted curve in the history of quantitative thought. Three centuries later it remains both.

1. The model

A quantity is normally distributed when its probabilities cluster symmetrically around a single mean and fall away on either side at a rate set by one parameter, the standard deviation. The curve’s height is proportional to e raised to minus one-half the squared distance from the centre, with that distance measured in standard deviations. De Moivre derived it as the limiting form of the binomial in his 1733 pamphlet, Approximatio ad Summam Terminorum Binomii (a+b)ⁿ in Seriem expansi, and folded the result into the second edition of The Doctrine of Chances (London, 1738), even identifying the scaling (that spread grows with the square root of the number of trials) that still governs every standard error computed today. But the curve carries another man’s name. Carl Friedrich Gauss, working on the orbits of asteroids in Theoria Motus Corporum Coelestium (Hamburg, 1809), showed that if the errors of astronomical observation followed this law, then the arithmetic mean was the most probable true value and the method of least squares was justified. The curve became the “law of error,” and the adjective “Gaussian” stuck.
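De Moivre’s approximation is easy to check with a machine he never had. The sketch below compares the exact binomial probability of a given number of heads with the bell-curve value, using a mean of n/2 and a spread of half the square root of n for a fair coin. The function names and the choice of test points (about one standard deviation above the centre) are illustrative assumptions, not anything taken from the 1733 pamphlet.

```python
import math

def binomial_pmf(n: int, k: int, p: float = 0.5) -> float:
    """Exact probability of k heads in n tosses of a coin with P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_approx(n: int, k: int, p: float = 0.5) -> float:
    """Bell-curve approximation: mean n*p, standard deviation sqrt(n*p*(1-p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - mean) / sd  # distance from the centre, in standard deviations
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Illustrative test point: roughly one standard deviation above the centre.
        k = n // 2 + round(math.sqrt(n) / 2)
        exact = binomial_pmf(n, k)
        approx = normal_approx(n, k)
        rel_err = abs(approx - exact) / exact
        print(f"n={n:5d}  k={k:4d}  exact={exact:.6f}  "
              f"normal={approx:.6f}  relative error={rel_err:.3%}")
```

The relative error shrinks as the number of tosses grows, which is the whole content of the limit: the factorials disappear and only the centre and the spread remain.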

Pierre-Simon Laplace supplied the deep reason the curve appears at all. In his 1810 memoir and his Théorie analytique des probabilités (Paris, 1812), he proved the first general version of what is now the central limit theorem — that the sum of many independent influences tends toward the normal curve regardless of the shape of the individual parts. From there the curve escaped the observatory. Adolphe Quetelet, in Sur l’homme et le développement de ses facultés (Paris, 1835), applied it to the chests of Scottish soldiers and the heights of French conscripts and announced l’homme moyen, the average man, at the centre of a bell of human variation. Francis Galton built a mechanical board of falling shot — the quincunx — to show the curve assembling itself, and in Natural Inheritance (Macmillan, 1889) praised the “Law of Frequency of Error” as a form of “cosmic order” reigning amidst the wildest confusion. It was Karl Pearson who fixed the name “normal” in the 1890s, and Pearson who later regretted it, noting that the word wrongly implies every other distribution is abnormal.

The one-sentence form: the normal distribution is the thin-tailed bell, fixed by its mean and its standard deviation alone, toward which sums of many small independent shocks converge — which makes it an excellent description of the calm middle of things and a dangerous description of their extremes.

Figure 1. The same middle, a different edge: two distributions can share a centre and a spread yet differ entirely in the tail, and it is the tail the bell calls impossible that decides the investor’s fate. (Image: two probability-density curves sharing the same centre and spread; the Gaussian has thin tails while the fat-tailed curve carries far more probability in the extremes.)

2. The mechanism

Why should one curve describe the errors of a telescope, the heights of conscripts, and the scatter of shot on Galton’s board? The answer is the central limit theorem, examined in its own right earlier in this series. When an outcome is the additive sum of many small, independent, individually unimportant causes — a hundred genetic and nutritional nudges to a man’s height, a thousand tiny imperfections in a measurement — the particular shapes of the causes wash out and only the bell survives. The distribution that results is fully described by two numbers, its mean and its variance, and it is the maximum-entropy distribution for those two numbers: among all the curves with a given centre and a given spread, the normal is the one that assumes the least beyond them. This is the source of both its honesty and its seduction. It is the most non-committal thing one can say about a quantity once its average and its wobble are known.
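The washing-out of individual shapes can be watched in a small simulation. The sketch below adds together independent, deliberately lopsided shocks (exponential draws, an arbitrary choice made only because they are visibly skewed) and measures how much lopsidedness survives in the total. The seed, the sample size, and the function names are assumptions for illustration.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: near 0 for a symmetric bell, positive for a long right tail."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

def simulate_sums(n_causes, n_outcomes=10_000, seed=1733):
    """Each outcome is the sum of n_causes independent, skewed shocks."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.expovariate(1.0) for _ in range(n_causes))
            for _ in range(n_outcomes)]

if __name__ == "__main__":
    for n_causes in (1, 5, 50):
        s = skewness(simulate_sums(n_causes))
        print(f"{n_causes:3d} causes added together: skewness = {s:.2f}")
```

A single exponential shock has a theoretical skewness of 2; for a sum of n of them it falls as 2 divided by the square root of n, so the asymmetry of the parts fades as more of them are added.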

The price of that economy is in the tails. Because the curve’s height falls as the exponential of the negative squared distance from the centre, the probability of a large deviation does not merely shrink; it collapses, faster and faster, as one moves out. A move of two standard deviations is an ordinary inconvenience; a fall of five is a once-in-fourteen-thousand-years event for a daily series; a fall of seven should arrive about once in three billion years; a fall of eight would not be expected once in the age of the universe. The Gaussian world is one in which nothing very surprising ever happens, because surprise has been defined out of existence by the shape of the curve. That is precisely correct for the systems the curve was built to describe, where causes are genuinely additive and bounded, and precisely wrong for systems where one cause can feed another.
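The collapse can be put in numbers with nothing more than the complementary error function. The sketch below converts a move of k standard deviations into a probability and then into an average waiting time. It assumes 252 trading days a year and counts moves in one direction only; both are conventions chosen here, and a two-sided count would halve every waiting time.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: a typical equity-market calendar

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z: a k-sigma move in one direction."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def years_between(k: float) -> float:
    """Average wait, in years of daily observations, between such moves."""
    return 1.0 / (upper_tail(k) * TRADING_DAYS_PER_YEAR)

if __name__ == "__main__":
    for k in range(2, 8):
        print(f"{k} sigma: probability {upper_tail(k):.3e} per day, "
              f"about one day in {years_between(k):,.1f} years")
```

Each extra standard deviation multiplies the waiting time by a larger factor than the one before, which is what it means for the tail to collapse rather than merely shrink.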

The discipline of this series is to state what a model does not claim, and the normal distribution does not claim to describe everything. Where outcomes are the aggregation of many small, independent, bounded influences, the Gaussian tools work exactly as advertised: the standard error of a sample mean, the dispersion of measurement noise, the smoothing of many uncorrelated idiosyncratic bets into a diversified whole. The mental model is therefore a sorting question to ask before any bell-curve tool is lifted from the drawer. Is this quantity built by addition, or by reinforcement? Heights are added; fortunes compound. Polling errors are added; market panics feed on themselves. The investor who asks the sorting question first will use the curve where it belongs — and will notice that the quantities that decide a long-term equity record almost all sit on the side of the line where it does not.

3. The empirical record

The application of the normal curve to financial prices has an exact birthday. In 1900, in a Paris doctoral thesis supervised by Henri Poincaré, Louis Bachelier modelled the fluctuations of French government bonds as the accumulation of independent random increments — Brownian motion, five years before Einstein used the same mathematics for pollen — and so made price changes Gaussian by construction (Théorie de la spéculation, Annales scientifiques de l’École Normale Supérieure, vol. 17, 1900, pp. 21–86). The assumption was beautiful, tractable, and wrong in exactly the place that matters. Benoit Mandelbrot, examining the day-to-day changes in cotton prices, found tails far too heavy for any bell and concluded that large moves were not the freak exceptions a Gaussian would require but a structural feature of the series (“The Variation of Certain Speculative Prices,” Journal of Business, vol. 36, 1963, pp. 394–419). Eugene Fama, then a young researcher, confirmed the leptokurtosis directly in American stocks and warned that the normal model understated the frequency of extremes (“The Behavior of Stock-Market Prices,” Journal of Business, vol. 38, 1965, pp. 34–105).

Six decades of data have only sharpened the verdict. The distribution of daily equity-index returns carries a kurtosis many times the Gaussian value of three, which is the technical way of saying that the rare, violent day arrives orders of magnitude more often than the bell predicts. The point was put most memorably not by a critic but by a practitioner defending his models. In August 2007, as Goldman Sachs’s quantitative funds bled, its chief financial officer David Viniar told the Financial Times that the firm had been seeing “25-standard-deviation moves, several days in a row.” Under a normal distribution a single twenty-five-sigma day has a probability so small that one would not expect to see it if markets had been trading since the beginning of the universe, many times over; to see several in a week is not bad luck but a confession that the curve generating the sigmas was the wrong one. The regulators eventually agreed in the only language they speak. After the bell-curve value-at-risk models of the banking system failed wholesale in 2008, the Basel Committee bolted on a “stressed VaR” charge in 2009 and a fundamental review of the trading book thereafter — an institutional admission that a single Gaussian number could not be trusted to measure the danger in the tail.
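Kurtosis is simple to compute, and a toy series shows how a few violent days move it. The sketch below is not market data: it draws returns from a hypothetical mix of two Gaussian regimes, a calm one with 1 per cent daily volatility and a rare stormy one with 4 per cent, and compares the result with a purely calm series. The regime probabilities, volatilities, and seed are all assumptions chosen for illustration.

```python
import random
import statistics

def kurtosis(xs):
    """Fourth standardised moment; equals 3 for a normal distribution."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    return statistics.fmean([(x - m) ** 4 for x in xs]) / var**2

def count_beyond(xs, k):
    """Number of observations more than k sample standard deviations from the mean."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(1 for x in xs if abs(x - m) > k * sd)

def mixed_returns(n, rng, p_storm=0.02, calm_sd=0.01, storm_sd=0.04):
    """Hypothetical regime mix: mostly calm days, occasionally a stormy one."""
    return [rng.gauss(0.0, storm_sd if rng.random() < p_storm else calm_sd)
            for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(1987)  # fixed seed so the run is repeatable
    calm = mixed_returns(50_000, rng, p_storm=0.0)  # pure Gaussian benchmark
    mixed = mixed_returns(50_000, rng)
    for name, xs in (("calm only", calm), ("calm + storms", mixed)):
        print(f"{name:14s} kurtosis = {kurtosis(xs):5.2f}, "
              f"days beyond 4 sigma = {count_beyond(xs, 4)}")
```

Two stormy days in a hundred are enough to push the kurtosis far above three and to multiply the count of four-sigma days many times over, even though every individual draw in the toy series is itself Gaussian.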

Figure 2. What the bell calls impossible: how often the normal curve says a one-day move of n standard deviations should occur, with October 1987 and a CFO’s 25-sigma week sitting far off the chart. (Image: bar chart on a log scale covering daily moves of three to seven standard deviations.)

4. Two episodes

Black Monday, 19 October 1987. On a single trading day, with no war, no default, and no recession announcement, the Dow Jones Industrial Average fell 22.6 per cent and the S&P 500 fell 20.5 per cent. Under the Gaussian assumptions then standard in portfolio theory — daily volatility in the region of one per cent — a twenty-standard-deviation day is not improbable; it is impossible, excluded by the shape of the curve. Jackwerth and Rubinstein later computed that, under the lognormal benchmark used to price options before the crash, the fall in two-month S&P 500 futures was a twenty-seven standard-deviation event with a probability of order ten to the power of minus one hundred and sixty (“Recovering Probability Distributions from Option Prices,” Journal of Finance, vol. 51, 1996, pp. 1611–1631). A number that small is not a probability; it is a verdict on the model that produced it. The crash was amplified by portfolio insurance, a strategy built on the assumption of continuous, roughly Gaussian markets that sold mechanically into the fall — the model’s failure and the model’s product arriving on the same afternoon.
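For moves this large the probability is best read as a power of ten. The sketch below turns a one-day fall and an assumed daily volatility into a sigma multiple, then reports the base-ten logarithm of the Gaussian tail probability. The one per cent daily volatility is the round figure used in the text; the function names are illustrative, and the calculation uses a plain normal tail rather than the lognormal option-pricing benchmark of Jackwerth and Rubinstein.

```python
import math

def sigma_multiple(move_pct: float, daily_vol_pct: float) -> float:
    """Size of a one-day move expressed in standard deviations."""
    return abs(move_pct) / daily_vol_pct

def log10_tail(k: float) -> float:
    """log10 of P(Z > k) for a standard normal Z (one direction only)."""
    return math.log10(0.5 * math.erfc(k / math.sqrt(2)))

if __name__ == "__main__":
    k_spx = sigma_multiple(-20.5, 1.0)  # S&P 500 fall, assumed 1% daily volatility
    print(f"S&P 500 on 19 October 1987: {k_spx:.1f} sigma, "
          f"probability about 10^{log10_tail(k_spx):.0f}")
    print(f"A 27-sigma event: probability about 10^{log10_tail(27):.0f}")
```

Under these assumptions the 27-sigma figure lands at the same order of magnitude the paper reports, ten to the power of minus one hundred and sixty.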

The Gaussian copula, 2000–2008. The second episode shows the curve migrating from the measurement of risk to its manufacture. In 2000 the actuary David X. Li published “On Default Correlation: A Copula Function Approach” (Journal of Fixed Income, vol. 9, no. 4, 2000, pp. 43–54), which used a Gaussian copula — a normal dependence structure — to model how the defaults of many borrowers move together. The formula let banks compress the messy, unknowable joint behaviour of thousands of mortgages into a single correlation number and so price collateralised debt obligations at industrial scale; the journalist Felix Salmon would later call it, in Wired, “the formula that killed Wall Street” (23 February 2009). Its Gaussian heart assumed that extreme joint defaults were vanishingly unlikely, exactly the thin-tail error of Section 2, and when American house prices fell together in a way the normal dependence structure deemed near-impossible, the triple-A tranches it had blessed defaulted in waves. The earlier collapse of Long-Term Capital Management in 1998, whose near-Gaussian risk system had treated correlated convergence trades as independent until Russia’s default proved otherwise, was the same mistake in miniature, a decade before, and unlearned (Roger Lowenstein, When Genius Failed, Random House, 2000).
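How much work that single correlation number does can be seen in a stripped-down simulation. The sketch below uses a one-factor Gaussian copula, a common textbook simplification and not Li’s survival-time formulation: every loan shares one market shock and adds its own independent noise, and a loan defaults when the combined value falls below a cutoff. The pool size, the 5 per cent default probability, the 20 per cent loss threshold, the trial count, and the seed are all hypothetical.

```python
import random
from statistics import NormalDist

def prob_mass_default(rho, p_default=0.05, n_loans=100, threshold=0.20,
                      n_trials=2_000, seed=2008):
    """Share of simulated pools in which more than `threshold` of loans default.

    Latent variable per loan: sqrt(rho) * M + sqrt(1 - rho) * e_i, where M is a
    shock shared by the whole pool and e_i is loan-specific noise, both
    standard normal. A loan defaults when its latent variable is below the
    p_default quantile of the standard normal.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cutoff = NormalDist().inv_cdf(p_default)
    shared_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5
    bad_pools = 0
    for _ in range(n_trials):
        market = rng.gauss(0.0, 1.0)
        defaults = sum(
            1 for _ in range(n_loans)
            if shared_weight * market + own_weight * rng.gauss(0.0, 1.0) < cutoff
        )
        if defaults > threshold * n_loans:
            bad_pools += 1
    return bad_pools / n_trials

if __name__ == "__main__":
    for rho in (0.0, 0.1, 0.3):
        print(f"correlation {rho:.1f}: share of pools with >20% defaults = "
              f"{prob_mass_default(rho):.4f}")
```

With independent loans a mass default essentially never appears in the simulation; with a moderate correlation it becomes a regular visitor. The safety of a senior tranche therefore rests almost entirely on a correlation estimate, and on the assumption that the dependence is Gaussian in the first place.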

Figure 3. The bell’s long career: from a coin-tossing approximation in 1733 to the Gaussian formula at the centre of the 2008 crisis. (Image: timeline from de Moivre 1733 and Gauss 1809 through Quetelet, Bachelier, Mandelbrot, Black Monday 1987 and the 2008 Gaussian copula.)

5. Application to long-term equity investing

The normal distribution enters the investor’s life as a hidden assumption inside almost every tool of modern finance — the beta on a brokerage screen, the volatility in an option price, the value-at-risk on a risk report — and three operating disciplines follow from knowing where the assumption breaks.

First: trust the curve for the middle and never for the tail. The bell describes the ordinary year tolerably well and the catastrophic week not at all, so a risk framework calibrated to recent standard deviations will, over a long investing life, meet several days it classified as impossible. The disciplines that survive such days are old-fashioned and non-statistical: leverage low enough that a one-day gap cannot force a sale, obligations matched so that nothing must be liquidated into the fall, and a margin of safety in the purchase price that does the work the volatility model only pretends to do. Position sizing should be set by what a fat-tailed world can deliver, not by the comfortable sigma the Gaussian reports.
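The leverage discipline reduces to one line of arithmetic. The sketch below assumes the simplest possible balance sheet, assets financed by equity plus debt, and asks how much equity survives a one-day gap. It ignores financing costs, margin rules, and intraday liquidation, and the function names are illustrative.

```python
def equity_after_gap(leverage: float, gap: float) -> float:
    """Fraction of starting equity left after assets fall by `gap` in one day.

    leverage = assets / equity (1.0 means no borrowing);
    gap = 0.205 means a 20.5% fall in asset value.
    """
    return 1.0 - leverage * gap

def max_leverage(gap: float, min_equity_left: float = 0.0) -> float:
    """Highest leverage at which equity stays at or above min_equity_left."""
    return (1.0 - min_equity_left) / gap

if __name__ == "__main__":
    gap = 0.205  # the S&P 500's fall on 19 October 1987
    for lev in (1.0, 2.0, 5.0):
        print(f"leverage {lev:.0f}x: equity left = {equity_after_gap(lev, gap):+.1%}")
    print(f"leverage that loses everything in that one day: {max_leverage(gap):.2f}x")
    print(f"leverage that still keeps half the equity:      {max_leverage(gap, 0.5):.2f}x")
```

The point of sizing for the fat tail is that the gap fed into this calculation should be the worst day history has actually delivered, not the few per cent a recent standard deviation would suggest.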

Second: refuse to equate volatility with risk. Standard deviation is a parameter of the normal curve — a measure of how much a price wobbles in the calm middle — and the long-term owner of a business is exposed to something the curve cannot see: the permanent loss of capital that lives in the tail and in the balance sheet, not in the daily quote. This letter has argued the point at length in its essay on variance and standard deviation, and it bears repeating because the confusion is institutionalised. A volatile holding that compounds is not risky for the patient owner; a placid one heading quietly toward insolvency is, and no Gaussian dispersion measure will warn of it in time.

Third: ask the sorting question before reaching for the bell. Use the normal curve where addition genuinely rules — in the diversification of many small, uncorrelated, idiosyncratic risks, where the central limit theorem is the investor’s friend and the law of large numbers does its work. Distrust it precisely where correlation and contagion turn independent risks into one risk: in leverage, in crowded trades, in the assumption that what has never moved together never will. The 1987 crash, the LTCM collapse, and the 2008 copula meltdown differ in every detail except the one that matters — in each, the fatal step was treating a reinforcing, fat-tailed system as if it were the additive, thin-tailed world the bell curve was built for.
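The line between addition and reinforcement shows up in a standard portfolio formula. For n equally weighted holdings, each with the same volatility and the same pairwise correlation, the portfolio’s variance is the single-holding variance times (1/n + (1 − 1/n) × correlation). The sketch below evaluates it; the 30 per cent volatility and the correlation values are hypothetical, and the equal-weight, equal-correlation setup is a simplification.

```python
import math

def portfolio_vol(n: int, sigma: float, rho: float) -> float:
    """Volatility of an equal-weight portfolio of n holdings.

    Assumes every holding has volatility sigma and every pair has correlation rho.
    """
    return sigma * math.sqrt(1.0 / n + (1.0 - 1.0 / n) * rho)

if __name__ == "__main__":
    sigma = 0.30  # hypothetical single-stock volatility
    for rho in (0.0, 0.25, 0.9):
        row = ", ".join(f"n={n}: {portfolio_vol(n, sigma, rho):.3f}"
                        for n in (1, 10, 100, 1000))
        print(f"correlation {rho:.2f} -> {row}")
```

With zero correlation the volatility keeps falling with the square root of n; with any positive correlation it stalls at sigma times the square root of the correlation, however many names are added. When a crisis pushes correlations toward one, the diversified portfolio behaves like a single position.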

6. How the long-term equity tradition has used it

The investors this letter studies have been the curve’s most consistent critics, precisely because they think in decades rather than in days. Warren Buffett devoted part of his 2008 letter to shareholders — written as the Gaussian risk models of the banking system were failing in real time — to the mathematicians who had built them, observing that “constructed by a nerdy-sounding priesthood using esoteric terms such as beta, gamma, sigma and the like,” history-based models look impressive until they are needed, and closing with the warning that has since become a proverb: “Beware of geeks bearing formulas” (Berkshire Hathaway shareholder letter, 2008). Fifteen years earlier, in his 1993 letter, Buffett had already rejected the bell curve’s central financial application — the use of beta and price volatility as the definition of risk — insisting that risk is the probability of permanent loss, a thing no standard deviation can measure.

Howard Marks built an entire investment philosophy on the same distinction. In The Most Important Thing (Columbia University Press, 2011) and across his Oaktree memos, Marks argues that risk is the likelihood of losing money permanently, not the volatility that academics adopted because it was the only thing they could measure; the most dangerous risks, he writes, are the ones that materialise rarely and lie outside any recent distribution. Nassim Taleb made the attack general in The Black Swan (Random House, 2007), naming the bell curve a “great intellectual fraud” when transplanted from the world of additive quantities — his Mediocristan, where the curve rules honestly — to the world of scalable, reinforcing ones, his Extremistan, where markets, wealth, and book sales actually live. And behind all of them stands Benjamin Graham, whose margin of safety was, decades before the mathematics was formalised, a purely practical defence against everything the model leaves out: buy at a price low enough that being wrong about the future, including a future the bell curve says cannot happen, need not be fatal.

7. Key takeaways

The normal distribution is the thin-tailed bell fixed by its mean and standard deviation alone. De Moivre drew it in 1733 as the limit of the binomial; Gauss made it the law of error in 1809; Laplace’s central limit theorem explained why it appears whenever many small independent causes are added together.
Its mechanism is also its limitation: because probability collapses exponentially in the tails, the curve describes the calm middle of a system superbly and its violent extremes not at all. It is the right tool only where causes are additive, independent, and bounded.
Markets are not such a system. From Mandelbrot in 1963 to the leptokurtosis in every modern data set, equity returns carry tails far heavier than any bell — which is why a twenty-five-sigma week is a verdict on the model, not a run of bad luck.
The danger is concealed in everyday tools: beta, option-implied volatility, and value-at-risk all inherit the Gaussian’s thin tails. Size positions and limit leverage for the world that actually delivers, and never mistake low volatility for low risk.
The long-term tradition — Buffett, Marks, Taleb, and Graham before them all — converges on one rule: trust the curve for the middle, distrust it utterly at the edge, and buy with a margin of safety wide enough to survive the day the model says will never come.

— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia

Important.
All content on this site and in this email is journalism and education for a general audience. Nothing here constitutes investment advice or a recommendation in respect of any specific financial instrument, nor an offer or solicitation to buy or sell any security. Readers should consult an authorised financial adviser regulated in their own jurisdiction before making any investment decision.
