Afternoon Edition · Mental Models · Essay No. 10 · 26 May 2026 · Tallinn
1. The model
In Ars Conjectandi, published posthumously in Basel in 1713 from a manuscript Jakob Bernoulli had worked on for at least twenty years, the Fourth Part contains what its author called the Theorema Aureum—the Golden Theorem—and what every modern textbook now calls the (weak) law of large numbers. Bernoulli proved that the relative frequency of a binary outcome, observed across an increasing number of independent trials, converges in probability to the true underlying probability of that outcome. In the language he himself used: if a bag contains a fixed but unknown proportion of white and black pebbles, then the more pebbles you draw with replacement, the closer the observed white-fraction will come to the true white-fraction, with arbitrarily high probability, as the number of draws becomes large.
The result was the first formal demonstration that observed frequencies tell you something reliable about the world that produced them. It was so important to Bernoulli that he refused to publish the rest of the work without it; his nephew Niklaus eventually edited the manuscript for the 1713 edition, eight years after Jakob’s death. Modern probability theory distinguishes a weak form (Khintchine, 1929—convergence in probability) and a strong form (Borel, 1909; Kolmogorov, 1930—almost-sure convergence). Both say the same thing in plain English: the sample mean of independent and identically distributed random variables with a finite expectation tends to that expectation as the sample size grows. The historian of statistics Stephen Stigler, in The History of Statistics (Harvard University Press, 1986, chapter 2), treats Bernoulli’s 1713 theorem as the foundation stone of the entire frequentist edifice.
For the long-term equity investor, the one-sentence form is this: the longer you sample a process, the closer your observed average will come to the process’s true expected value—but only if the process you are sampling is stationary, the draws are independent, and the expected value exists. All three of those conditions matter for what an investor can and cannot conclude from a track record. The model is more often misused than used. The investing literature is full of casual appeals to “the long run” that smuggle in unproven assumptions about stationarity. A central purpose of this essay is to separate the law itself from those misuses, and to show how Bernoulli’s theorem, applied with care, becomes one of the strongest pillars of a long-term equity discipline.
2. The mechanism
Why does the law work, and what makes it brittle? Consider a portfolio of n independent and identically distributed positions whose individual return X has finite expectation μ and finite variance σ². The arithmetic mean of the n positions is X̄ₙ = (X₁ + X₂ + … + Xₙ) / n. Two elementary facts about the distribution of X̄ₙ drive everything that follows. The expectation E[X̄ₙ] equals μ itself, irrespective of n: the expected value of the average is the true mean. The variance Var(X̄ₙ) equals σ² / n: the standard deviation of the average shrinks as 1/√n.
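Both facts follow in a line or two from linearity of expectation and from independence. The derivation below is a modern textbook route under the finite-variance assumption stated above, not Bernoulli's original argument; the final line uses Chebyshev's inequality to show why a shrinking variance implies convergence in probability.

```latex
\begin{aligned}
\mathbb{E}[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \frac{1}{n}\cdot n\mu = \mu,\\[4pt]
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^{2}}{n},
\qquad \operatorname{SD}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}},\\[4pt]
\Pr\bigl(|\bar{X}_n - \mu| \ge \varepsilon\bigr) &\le \frac{\sigma^{2}}{n\,\varepsilon^{2}} \;\longrightarrow\; 0
\quad \text{as } n \to \infty, \text{ for every fixed } \varepsilon > 0.
\end{aligned}
```

The second line is where independence is used: without it, covariance terms would appear in the variance of the sum and the clean σ²/n would no longer hold.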
The shrinkage in 1/√n is the mathematical engine. Doubling the sample size reduces the standard error by a factor of √2, not 2. Quadrupling it halves the standard error. This is why convergence is real but slow: getting from a standard error of ten per cent down to one per cent requires a hundred-fold increase in sample size, not a ten-fold one. It is also why the most common quantitative claim in investing, “this strategy has worked over a five-year backtest”, is in many cases a single noisy observation rather than statistical evidence.
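The 1/√n shrinkage is easy to check by simulation. The sketch below is an illustration only: it assumes normally distributed returns with a hypothetical mean of 8 per cent and standard deviation of 20 per cent (neither figure comes from the essay), draws many samples of size n, and compares the spread of their averages with σ/√n.

```python
import random
import statistics


def sd_of_sample_mean(n, trials=2000, mu=0.08, sigma=0.20, seed=0):
    """Empirical standard deviation of the mean of n i.i.d. normal draws.

    mu and sigma are hypothetical illustration values, not market estimates.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (25, 100, 400):
        empirical = sd_of_sample_mean(n)
        theory = 0.20 / n ** 0.5  # sigma / sqrt(n)
        print(f"n={n:4d}  empirical={empirical:.4f}  theory={theory:.4f}")
```

Each fourfold increase in n should roughly halve the empirical figure, which is the slow convergence the paragraph describes.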

The mechanism rests on four assumptions, each of which is fragile in real markets. The first is the independence of draws: cross-correlated positions (every Indian small-cap, every European bank stock, every US high-multiple software name) do not provide n independent observations; they provide some smaller effective sample size n_eff. The second is identical distribution: if the underlying process changes over the sampling window, whether through a regulatory regime change, a structural shift in interest rates, or the entry of a new disruptive technology, what looks like one long sample is actually two short samples glued together. The third is finite expectation: for a few important investment-relevant distributions, notably power-law-tailed return distributions in the spirit of Mandelbrot (1963) and Taleb (2020), the theoretical mean exists but the sample mean converges very slowly; for distributions without finite variance, the classical central limit theorem no longer applies. The fourth is a long enough horizon: convergence is asymptotic, and at any finite n the sample average remains a random variable scattered around the true mean.
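The essay does not give a formula for effective sample size, so the sketch below makes a simplifying assumption of its own: every pair of positions shares the same correlation rho and every position has the same variance. Under that assumption the variance of the equal-weighted average is σ²(1 + (n − 1)ρ)/n, which gives the effective n computed here. The function name and the example correlations are hypothetical.

```python
def effective_n(n: int, rho: float) -> float:
    """Effective number of independent bets among n equally weighted positions.

    Assumes a common pairwise correlation rho (0 <= rho <= 1) and a common
    variance. Then Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n, so
    n_eff = n / (1 + (n - 1) * rho).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # Example correlations are illustrative, not estimates for any market.
    for n, rho in [(40, 0.0), (40, 0.3), (40, 0.8), (400, 0.3)]:
        print(f"n={n:3d}  rho={rho:.1f}  n_eff={effective_n(n, rho):.2f}")
```

Under this equal-correlation assumption n_eff can never exceed 1/ρ, however many names are added: forty positions correlated at 0.8 behave like roughly 1.2 independent bets.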
Amos Tversky and Daniel Kahneman, in their 1971 paper “Belief in the Law of Small Numbers” (Psychological Bulletin, vol. 76, no. 2, pp. 105–110), documented that even statistically trained researchers systematically overestimate the reliability of small samples: they treat n = 20 as if it were the asymptotic case. The behavioural literature has replicated this finding many times since. For the investor, the takeaway is that the law of large numbers guarantees very little at the sample sizes most investors care about; what operates there is the “law of small numbers”, which is a bias in human judgement and not a theorem.
3. The empirical record
The most striking empirical record of the law of large numbers in equity markets concerns the wide dispersion of individual stock returns and the consequently large sample sizes required before broad-market averages stabilise. Hendrik Bessembinder, in “Do Stocks Outperform Treasury Bills?” (Journal of Financial Economics, vol. 129, 2018, pp. 440–457), computed lifetime returns for the universe of CRSP US common stocks from 1926 to 2016. Of roughly 26,000 individual stocks, about four in every seven (roughly 57 per cent) delivered lifetime returns below those of one-month Treasury bills. The aggregate equity premium over Treasury bills since 1926 was driven by a small minority: the top 4 per cent of stocks accounted for the entirety of the net dollar wealth creation; the median stock destroyed wealth relative to T-bills. In the global update (Bessembinder, Chen, Choi & Wei, “Long-Term Shareholder Returns: Evidence from 64,000 Global Stocks,” SSRN working paper, 2023), 60.9 per cent of 64,000 international stocks underperformed cash over their lives.
Bessembinder’s numbers are the law of large numbers in operation. The market-cap-weighted aggregate is well-behaved because n is enormous—tens of thousands of names over a century, with very high effective sample size. The index-level mean reliably reflects the equity-premium expectation. But the same arithmetic implies that a portfolio of ten names is not a meaningful sample of the equity-return distribution. The standard error of the average return of ten randomly-drawn stocks is roughly thirty per cent of the true σ; for one hundred names it is roughly ten per cent. This is the source of Meir Statman’s “diversification ratio” empirical result (“How Many Stocks Make a Diversified Portfolio?”, Journal of Financial and Quantitative Analysis, vol. 22, no. 3, 1987, pp. 353–363): something on the order of thirty stocks captures most diversifiable variance, but residual idiosyncratic risk remains material.
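A heavily right-skewed distribution is enough to produce the pattern above, in which most names fall short of the average. The sketch below assumes lifetime gross returns are lognormal, which is a convenient illustration and not Bessembinder's method; the dispersion values s are hypothetical. It also reproduces the 1/√n fractions quoted in the preceding paragraph.

```python
from math import exp, sqrt
from statistics import NormalDist


def share_below_mean(s: float) -> float:
    """P(R < E[R]) when log R ~ Normal(m, s^2); the result does not depend on m."""
    return NormalDist().cdf(s / 2.0)


def mean_over_median(s: float) -> float:
    """E[R] / median(R) = exp(s^2 / 2) for the same lognormal R."""
    return exp(s * s / 2.0)


def se_as_fraction_of_sigma(n: int) -> float:
    """Standard error of an n-name average, as a fraction of single-name sigma."""
    return 1.0 / sqrt(n)


if __name__ == "__main__":
    for s in (0.5, 1.0, 1.5):  # hypothetical dispersions of log lifetime return
        print(f"s={s:.1f}  share below mean={share_below_mean(s):.3f}  "
              f"mean/median={mean_over_median(s):.2f}")
    for n in (10, 30, 100):
        print(f"n={n:3d}  SE/sigma={se_as_fraction_of_sigma(n):.3f}")
```

The wider the dispersion, the further the mean sits above the median, so a randomly drawn small portfolio is more likely than not to miss the names that carry the average.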

The other empirical anchor is the S&P Indices Versus Active Funds (SPIVA) scorecard, published semi-annually since 2002. The SPIVA U.S. Mid-Year 2024 Scorecard reports that, over the fifteen-year window through June 2024, 89.9 per cent of large-cap actively-managed equity mutual funds underperformed the S&P 500. The single-year figure for the twelve months ended mid-2024 was around 57 per cent. The five-year figure was 77 per cent. As the horizon lengthens—as n grows—the dispersion in apparent skill compresses dramatically, and the share of funds that look skilful approaches the share that one would expect from pure noise net of the cost drag. Both data sources point to the same operating fact for the long-term equity investor: at short horizons, almost anything can happen in the sample mean; at long horizons, structural truths assert themselves. The discipline is not to confuse the two regimes.
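The compression of apparent skill with horizon can be sketched with a toy model of funds that have no skill at all. The assumptions are loud ones: each year's excess return over the index is an independent normal draw with mean equal to minus a cost drag, and both the 1 per cent cost and the 5 per cent tracking error are hypothetical round numbers, not figures fitted to SPIVA.

```python
from math import sqrt
from statistics import NormalDist


def share_beating_index(years: float, cost: float = 0.01,
                        tracking_error: float = 0.05) -> float:
    """Share of zero-skill funds whose average annual excess return is positive.

    Toy model: annual excess return ~ Normal(-cost, tracking_error^2), i.i.d.
    across years, so the T-year average is Normal(-cost, tracking_error^2 / T).
    cost and tracking_error are hypothetical, not SPIVA estimates.
    """
    if years <= 0 or tracking_error <= 0:
        raise ValueError("years and tracking_error must be positive")
    z = -cost * sqrt(years) / tracking_error
    return NormalDist().cdf(z)


if __name__ == "__main__":
    for years in (1, 5, 15, 30):
        p = share_beating_index(years)
        print(f"{years:2d} years: {p:.1%} ahead, {1 - p:.1%} behind")
```

In this model the cost drag is constant while the noise in the average shrinks as 1/√T, so the share of funds ahead of the index falls steadily as the window lengthens, even though no fund's skill has changed.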
4. Two historical episodes
4.1 The Nifty Fifty, 1968–1974
Through the late 1960s and into 1972, a set of roughly fifty US growth franchises, among them Polaroid, Eastman Kodak, Xerox, IBM, Avon, Coca-Cola, Johnson & Johnson, Procter & Gamble, McDonald’s and Disney, traded at elevated price-earnings multiples, reaching fifty to ninety for the most richly valued names, on the proposition that their durable growth justified essentially any starting multiple. The empirical evidence then cited was their immediate post-war record: roughly two decades of high and apparently stable earnings growth. The argument was framed as a long-run truth.
It was a short-run sample. The sample period chosen (1949–1969) was a unique structural episode: a US export franchise into a war-flattened world, the bedding-in of the post-war consumer economy, and a long disinflation. When the 1973–74 bear market began and the underlying stagflation revealed itself, the Nifty Fifty stocks fell forty to eighty per cent from peak; several—Polaroid, Avon, Eastman Kodak—never recovered their 1972 highs in real terms. Jeremy Siegel’s two retrospectives (“Valuing Growth Stocks: Revisiting the Nifty Fifty,” AAII Journal, October 1998, and “The Nifty Fifty Revisited,” Journal of Portfolio Management, vol. 21, 1995) showed that the basket as a whole did, eventually, justify its 1972 multiples over thirty years—but only as an aggregate, with extreme dispersion within the basket and decades of underwater holding for many individual names. The episode is the canonical example of treating a small, regime-specific sample as if it were the asymptotic case.
4.2 Long-Term Capital Management, 1994–1998
LTCM’s swap-spread and convergence trades were sized using volatility estimates from a roughly five-year sample of post-Maastricht European data, in which sovereign spreads had been gently grinding tighter. The bet was that the empirical volatility of that period was representative of the underlying process. Roger Lowenstein’s When Genius Failed (Random House, 2000) and Donald MacKenzie’s reconstruction in An Engine, Not a Camera (MIT Press, 2006, chapter 8) both document that LTCM’s leverage was calibrated to volatility numbers from a benign regime that excluded both the 1987 crash and the emerging-market crises of 1997–98. When Russia defaulted on its rouble-denominated debt in August 1998, the realised volatility was an order of magnitude above the modelled volatility; the convergence trades widened rather than converged; and the fund, with capital of about $4.7 billion at the start of 1998 and notional positions over $1.25 trillion, required a $3.6 billion recapitalisation by a consortium of its creditor banks, coordinated by the Federal Reserve Bank of New York, to wind down without forcing a systemic event.
LTCM is not a story about the law of large numbers failing. It is a story about the assumption of stationarity failing. The sample size was, mathematically, adequate for narrow inference; what was inadequate was the assumption that the next draw came from the same distribution as the prior draws. Both episodes—the Nifty Fifty and LTCM—teach the same operating lesson: it is not n that matters, it is whether the n draws come from a distribution that resembles the distribution that will generate the next draw.
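The distinction between an adequate sample and a representative one can be shown with a small simulation. Everything in it is hypothetical: the two volatility levels are chosen only to mirror the "order of magnitude" gap described above, and normal draws stand in for whatever the true process was.

```python
import random
import statistics

CALM_SIGMA = 0.5    # hypothetical daily spread move in the benign regime
STRESS_SIGMA = 5.0  # hypothetical: ten times larger in the stressed regime


def regime_demo(calm_days=1250, stress_days=60, seed=42):
    """Return (vol estimated from calm data, its approximate standard error,
    vol realised in the stressed regime)."""
    rng = random.Random(seed)
    calm = [rng.gauss(0.0, CALM_SIGMA) for _ in range(calm_days)]
    stress = [rng.gauss(0.0, STRESS_SIGMA) for _ in range(stress_days)]
    estimate = statistics.stdev(calm)
    # Large-sample approximation for normal data:
    # SE(sigma_hat) is about sigma_hat / sqrt(2 * (n - 1)).
    estimate_se = estimate / (2 * (calm_days - 1)) ** 0.5
    realised = statistics.stdev(stress)
    return estimate, estimate_se, realised


if __name__ == "__main__":
    est, se, realised = regime_demo()
    print(f"calm-regime estimate: {est:.3f} (standard error about {se:.3f})")
    print(f"stressed-regime realised vol: {realised:.3f}")
    print(f"realised / estimated: {realised / est:.1f}x")
```

Five years of daily data pin down the calm-regime volatility to within a few per cent, so the estimate is statistically tight. It is simply an estimate of the wrong distribution, and no increase in the calm sample would have narrowed the gap.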
5. Application to long-term equity investing
Three operating disciplines follow directly from the law of large numbers for any investor with a multi-decade horizon.
Discipline 1: Concentrate, but ensure enough independent bets to let convergence work. A one-stock portfolio has, by construction, an effective sample size of one. The standard error of its annual return is the standard error of a single name—for individual stocks, that has historically been roughly thirty to fifty per cent per year (Bessembinder, 2018). A thirty-stock portfolio of well-diversified independent exposures has an effective n closer to thirty, and a sample-mean standard error roughly five to six times smaller. The trade-off between conviction (concentrate) and convergence (diversify) is genuine, but the relevant variable is effective n, not nominal n. Forty correlated bank stocks are still one bet. The right test for any new candidate is whether its primary economic exposure is materially different from the exposures already in the book.
Discipline 2: Demand long horizons before judging skill. The SPIVA data implies that even five-year returns provide weak evidence of skill, because the noise dominates the signal. The relevant unit of sample in investment skill is not the trade or the quarter but the cycle. Michael Mauboussin’s The Success Equation (Harvard Business Review Press, 2012) shows that for activities where luck plays a substantial role, the required sample size to detect a one-percentage-point edge with reasonable confidence is in the dozens of cycles, not the dozens of months. The honest implication is that an investor must judge their own process more by the discipline of the inputs (research depth, position sizing, behavioural restraint) than by the trailing returns of the outputs over any short window.
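A back-of-envelope version of this sample-size argument is sketched below. It is not Mauboussin's calculation: it assumes annual excess returns are independent draws with a fixed true edge and a fixed tracking error, and asks how many years must pass before the edge stands z standard errors clear of zero. The function name and the example tracking errors are hypothetical.

```python
def years_to_detect(edge: float, tracking_error: float, z: float = 1.96) -> float:
    """Years of i.i.d. annual excess returns needed for a true edge to reach a z-score of z.

    z-score after T years = edge * sqrt(T) / tracking_error, so
    T = (z * tracking_error / edge) ** 2.
    """
    if edge <= 0 or tracking_error <= 0:
        raise ValueError("edge and tracking_error must be positive")
    return (z * tracking_error / edge) ** 2


if __name__ == "__main__":
    # One-percentage-point edge against several hypothetical tracking errors.
    for te in (0.03, 0.05, 0.10):
        years = years_to_detect(edge=0.01, tracking_error=te)
        print(f"tracking error {te:.0%}: about {years:.0f} years")
```

Because the required horizon scales with the square of tracking error divided by edge, halving the edge quadruples the wait. That is why judging the process by its inputs is often the only practical option.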

Discipline 3: Distinguish stationary from non-stationary processes before extrapolating. Most investment “rules”—sector beta, factor premia, sovereign spread relationships, currency mean-reversion—are stationarity assumptions wearing the costume of statistical inference. The questions to ask, before applying any historical relationship to capital, are: what regime produced this sample?, what would change the regime?, and would I notice the regime change in time? If the answers are unclear, the sample is short, and the prudent posture is humility about the inference. Warren Buffett’s 1996 owner’s manual injunction—that Berkshire avoids situations where it must “be precise about a number that we don’t really understand”—is, at its root, a statement about non-stationarity: when the data-generating process can shift in ways we cannot anticipate, no amount of historical data delivers asymptotic comfort.
These three disciplines do not produce a strategy. They produce a posture: the long-term equity investor is one who accepts that her edge is statistical, that statistical edges only manifest over many independent observations, and that the cost of forgetting this is the destruction of the very compounding she was trying to harvest.
6. How the long-term equity tradition has used it
Warren Buffett has invoked the law explicitly, if informally, throughout the Berkshire Hathaway chairman’s letters. In the 1991 chairman’s letter (Berkshire Hathaway Inc., 1991 Annual Report, dated 28 February 1992), Buffett described the insurance underwriting franchise as one whose results would, “with a long-enough horizon and a wide-enough underwriting book, revert to the underlying actuarial truth.” The thought is repeated, in different forms, in the 1996 owner’s manual and again in the 2014 letter marking Berkshire’s first fifty years: investment skill manifests across a sample of decades, not a sample of months. Berkshire’s own structure—permanent capital, no redemption pressure, a willingness to hold concentrated positions for thirty years—is engineered to let the law operate without interruption. The insurance float strategy in particular is a literal application of Bernoulli’s theorem: across a sufficiently large book of independent risks, the underwriting result converges to the underlying actuarial expectation, and the float earns a return in between.
Howard Marks has built much of his published thinking around the same statistical core. The Oaktree memo “Risk” (January 2006) frames investment risk as the distribution of possible outcomes around an expected value, and warns explicitly against treating a small realised sample as evidence about the distribution. In “How the Game Should Be Played” (Oaktree Capital, August 2017) and again in Mastering the Market Cycle (Houghton Mifflin Harcourt, 2018, chapter 1), Marks returns to the same point: a single year, a single trade, a single cycle is one draw from a wide distribution; the investor’s job is to think probabilistically about all the draws that could have happened, not just the one that did. Bernoulli’s theorem is the formal expression of why this discipline matters: the next draw is information, but it is not the truth.
Charles Ellis, in “The Loser’s Game” (Financial Analysts Journal, July–August 1975, pp. 19–26), made the same argument earlier and in stronger form. Ellis’s central observation was that the proliferation of professional investors and the falling cost of information had moved equity markets from a winner’s game (where skill systematically rewards itself in the short run) to a loser’s game (where the dominant variable is the cost of mistakes). The implication, framed in our terms: in a loser’s game, the long-run statistical result is determined by who can afford to wait for n to become large enough for the mean to assert itself, net of fees and frictions, and who has the temperament to resist acting on small-n signals. The rise of indexed and long-only patient capital in the five decades since is, in a sense, the institutional embodiment of Ellis’s reading of Bernoulli. The intellectual chain from Bernoulli to Ellis to Buffett to Marks is direct. It is not a chain about specific stock picks; it is a chain about what kind of evidence about investment skill it is rational to demand, and on what time scale.
7. Key takeaways
The law of large numbers is the formal justification for taking the long view, but it is silent on whether any particular sample is large enough. The honest investor decomposes “long-run” claims into a precise n and a precise assumption about stationarity.
Standard error shrinks as 1/√n, not 1/n: doubling a sample cuts the standard error by a factor of about 1.41, not 2, and most published track records are at sample sizes where most of the variation is still noise.
Independent observations are the input; correlated positions are not. Effective n in a portfolio is almost always materially below nominal n, and the first test of any new position is its marginal contribution to true independence.
The Nifty Fifty and Long-Term Capital Management are the same mistake in different clothing: both treated a short, regime-specific sample as a description of the underlying process.
The long-term equity tradition, from Bernoulli through Ellis to Buffett and Marks, has never been about predicting the next outcome; it has been about earning the right to wait for the law to operate.
— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia
