AFTERNOON EDITION — Mental Models
In the summer of 1885, Francis Galton stood before the Royal Anthropological Institute in London with a curious finding. He had measured the heights of 928 adult children and their 205 sets of parents, and noticed that the offspring of unusually tall parents tended to be tall, but less tall than the parents. The offspring of unusually short parents tended to be short, but less short. Each generation, in other words, drifted back toward the average of the population. He published the lecture the following year in the Institute’s Journal under a title he chose with care: “Regression towards Mediocrity in Hereditary Stature.”
For Galton, the word “mediocrity” was statistical, not pejorative. He had stumbled on the first mathematical description of a phenomenon that now sits under nearly every empirical exercise in finance, medicine, sports, education, and public policy: when a measurement is the sum of a stable component and a noisy one, an extreme observation in the first period will be followed, on average, by a less extreme observation in the second. The drift is not punishment for excellence. It is the arithmetic of noise.
1. The model — Galton 1886, in his own words
The canonical citation is Francis Galton, “Regression towards Mediocrity in Hereditary Stature,” Journal of the Anthropological Institute of Great Britain and Ireland, vol. 15 (1886), pp. 246–263. Galton’s own one-sentence form, set out on page 252, is the cleanest definition we have: the deviation of the offspring from the population average is, on average, two-thirds of the corresponding deviation of the parents. He called that coefficient — the two-thirds — the “ratio of regression.” When modern statisticians later rebadged the linear technique he had invented as “regression analysis,” they preserved his accidental terminology long after the population-genetics origin had faded.
A stricter contemporary form, due to Karl Pearson’s 1903 generalisation in Biometrika, runs as follows. If X and Y are correlated with coefficient r, both are standardised to zero mean and unit variance, and the relationship between them is linear (as it is for jointly normal variables), then the conditional expectation of Y given X is exactly r · X. Without the linearity assumption, r · X is still the best linear prediction of Y. Whenever the absolute value of r is less than one, which is to say whenever the two variables are not perfectly correlated, the predicted Y is closer to its mean than X was to its own. The mean is the gravitational centre toward which any less-than-perfectly-correlated system is pulled.
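The r · X rule can be checked numerically. What follows is a minimal sketch, assuming jointly normal variables and using Galton’s two-thirds as the correlation; the function names, sample size, and seed are illustrative choices, not anything taken from Galton or Pearson.

```python
import random
import statistics

def simulate_standardised_pairs(r, n, seed=0):
    """Draw n (x, y) pairs, each standard normal, with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # y keeps unit variance: r^2 from x plus (1 - r^2) from independent noise.
        y = r * x + (1.0 - r * r) ** 0.5 * noise
        pairs.append((x, y))
    return pairs

def mean_y_where_x_near(pairs, x0, width=0.1):
    """Average y over the pairs whose x lies within `width` of x0."""
    ys = [y for x, y in pairs if abs(x - x0) <= width]
    return statistics.fmean(ys)

R = 2.0 / 3.0  # assumed correlation, borrowed from Galton's ratio of regression
PAIRS = simulate_standardised_pairs(R, 100_000, seed=1)
AVG_Y_AT_2 = mean_y_where_x_near(PAIRS, 2.0)

# Parents two standard deviations above the mean: theory predicts
# offspring about R * 2.0 standard deviations above it.
print(f"simulated {AVG_Y_AT_2:.2f} vs predicted {R * 2.0:.2f}")
```

The simulated average should sit close to 1.33 rather than 2.0, which is the regression effect expressed in standard-deviation units.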
Galton’s discovery survives one essential restatement for the modern reader: the phenomenon does not require any causal mechanism. It is a property of measurement under uncertainty. Stephen Stigler, the historian of statistics, has written that this is the single most under-taught idea in quantitative reasoning, precisely because it produces effects that the naïve mind insistently re-narrates as cause and effect (Stigler, Statistics on the Table, Harvard University Press, 1999, chapter 9). The investor who internalises only one rule from this essay should internalise that one: when an extreme draws an explanation, the explanation may simply be the arithmetic.

2. The mechanism — why the drift is unavoidable
The cleanest way to see why regression must happen is to decompose any observed outcome into two parts: a persistent component, call it Skill, and a transient component, call it Luck. Suppose we observe the top decile of a population — top-decile mutual-fund managers, top-decile athletes, top-decile parental heights. By definition, these observations carry unusually favourable Luck draws on top of whatever Skill they possess. In the next period, Skill persists by assumption, but Luck — being the noisy component — is, by definition, drawn afresh from a distribution centred on zero. The expected new observation is therefore Skill plus zero, lower than Skill plus favourable Luck. The top decile must, on average, fall back toward the mean.
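The Skill-plus-Luck argument can be reproduced in a few lines. This is a sketch under stated assumptions: Skill and Luck are independent normal draws, Skill is perfectly persistent, and the population size, standard deviations, and seed are arbitrary illustrative values.

```python
import random
import statistics

def top_decile_drift(n, skill_sd, luck_sd, seed=0):
    """Return (period-1 mean, period-2 mean, skill mean) for the period-1 top decile."""
    rng = random.Random(seed)
    skill = [rng.gauss(0.0, skill_sd) for _ in range(n)]
    # Each period's outcome is the same Skill plus a fresh Luck draw.
    first = [s + rng.gauss(0.0, luck_sd) for s in skill]
    second = [s + rng.gauss(0.0, luck_sd) for s in skill]
    cutoff = sorted(first)[int(0.9 * n)]
    top = [i for i in range(n) if first[i] >= cutoff]
    first_mean = statistics.fmean([first[i] for i in top])
    second_mean = statistics.fmean([second[i] for i in top])
    skill_mean = statistics.fmean([skill[i] for i in top])
    return first_mean, second_mean, skill_mean

FIRST_MEAN, SECOND_MEAN, SKILL_MEAN = top_decile_drift(
    n=50_000, skill_sd=1.0, luck_sd=1.0, seed=7
)
print(f"top decile, period 1: {FIRST_MEAN:.2f}")
print(f"same group, period 2: {SECOND_MEAN:.2f}")
print(f"their average skill:  {SKILL_MEAN:.2f}")
```

The period-2 average of the selected group lands near its average Skill, well below its period-1 average, even though nothing about any individual changed between the two periods.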
The size of the drift is governed by one ratio: the share of total variance contributed by Skill. If a domain is mostly Skill — Olympic sprint times for elite athletes after years of selection — regression will be small. If it is mostly Luck — single-quarter mutual-fund returns — regression will be enormous. Michael Mauboussin’s 2012 book The Success Equation (Harvard Business Review Press) titled this the “skill-luck continuum” and made the point that the investor’s first job in nearly every empirical domain is to estimate that ratio before extrapolating a single number.
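The role of the Skill share can be made concrete with a one-line shrinkage rule. This is a sketch that assumes independent Skill and Luck and a linear relationship between periods; the function names and the numerical inputs are hypothetical.

```python
def skill_share(skill_var, luck_var):
    """Share of total outcome variance contributed by the persistent component."""
    return skill_var / (skill_var + luck_var)

def expected_next(observed, population_mean, share):
    """Expected next-period outcome: shrink the deviation by the skill share."""
    return population_mean + share * (observed - population_mean)

# A return of 20 against a population mean of 5, under two assumed regimes.
MOSTLY_SKILL = expected_next(20.0, 5.0, skill_share(9.0, 1.0))  # share 0.9
MOSTLY_LUCK = expected_next(20.0, 5.0, skill_share(1.0, 9.0))   # share 0.1
print(MOSTLY_SKILL, MOSTLY_LUCK)
```

With a 0.9 share the expectation keeps most of the original edge; with a 0.1 share it collapses almost to the population mean. The estimate of the share does nearly all the work, which is why it has to come before any extrapolation.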
It follows that regression is uneven in a useful way: the expected pull is proportional to the distance from the mean. Extreme outcomes regress most strongly. Performance near the mean barely regresses at all. The investor who learns to feel the pull harder at the tails, both the celebrated top and the punished bottom, is doing the work of the model. And it follows, too, that the drift applies to the conditional expectation: there is no guarantee that any specific top-decile observation will fall back, only that the average of all top-decile observations will. Applying a population property to one individual is a version of the ecological fallacy, and it is endemic in financial commentary.
3. The empirical record — three exhibits from active equity
The financial evidence on mean reversion is, on the whole, embarrassingly consistent. Three exhibits.
Mark Carhart’s 1997 paper “On Persistence in Mutual Fund Performance” (Journal of Finance 52(1): 57–82) examined the entire universe of US diversified equity funds from 1962 to 1993, sorted them into deciles each year on the basis of their previous-year return, and tracked the next-year performance. After adjusting for the market, size, value, and momentum factors he had assembled, the persistence of top-decile alpha was statistically indistinguishable from zero. The bottom decile, by contrast, persisted — bad funds stayed bad, largely because of expenses and turnover. The implication is the one Galton would have predicted: the favoured deciles are dominated by noise, and noise regresses; the disfavoured deciles include a structural deadweight (fees) that does not.
S&P Dow Jones Indices has run a quarterly persistence scorecard for two decades. In the U.S. Persistence Scorecard Year-End 2024, the share of top-quartile US large-cap funds that remained top-quartile across the next five calendar years was 0%. That is not a misprint. Reading the same scorecard for the mid-2025 update, only 29% of top-quartile large-cap funds maintained their position even over a subsequent two-year window. Counting the selection year as the first year of each window, pure chance with quartile rank independent from year to year would have predicted 25% over two years (0.25) and roughly 0.4% over five (0.25 to the fourth power). Active equity performance is now indistinguishable, in persistence terms, from random selection followed by regression. The mathematics permits a sharper statement: the noisier the signal you select on, the less of it survives the second draw.
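The chance baseline is simple arithmetic and worth checking. A minimal sketch, assuming quartile rank is independent from one year to the next and that the selection year counts as the first year of a window, so a two-year window has one further year and a five-year window has four; the function name is illustrative.

```python
def chance_persistence(top_fraction, further_periods):
    """Probability of staying in the top group for every further period by luck alone."""
    return top_fraction ** further_periods

TWO_YEAR_WINDOW = chance_persistence(0.25, 1)   # one further year
FIVE_YEAR_WINDOW = chance_persistence(0.25, 4)  # four further years
print(f"{TWO_YEAR_WINDOW:.2%}")   # 25.00%
print(f"{FIVE_YEAR_WINDOW:.2%}")  # 0.39%
```

If the window were instead read as five further years after selection, the baseline would fall to 0.25 to the fifth power, or about 0.1%.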

The corporate analogue is no less stark. Robert Wiggins and Timothy Ruefli, in “Sustained Competitive Advantage: Temporal Dynamics and the Incidence and Persistence of Superior Economic Performance” (Organization Science 13(1), 2002: 81–105), studied the return on assets of 6,772 firms across 40 industries between 1972 and 1997. Of the firms that achieved “superior” returns in their stratum, only about 5% sustained that position for 10 years or more, and just 0.5% for 20 years. The default destination of an above-average return-on-assets number is the industry mean, and the rate of decay can be calibrated. McKinsey’s repeat of the exercise in Valuation (Koller, Goedhart and Wessels, 7th edition, 2020, chapter 8) finds the same shape: the median high-ROIC firm gives back about half its excess return within seven to ten years.
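A decay rate of this kind can be turned into a working number. The sketch below assumes the excess return fades exponentially with a fixed half-life, which is a modelling convenience rather than anything the cited studies assert; the eight-year half-life is an illustrative value inside the seven-to-ten-year range quoted above.

```python
def excess_return_remaining(initial_excess, years, half_life):
    """Excess return left after `years`, assuming exponential decay with the given half-life."""
    return initial_excess * 0.5 ** (years / half_life)

# A firm earning 10 points above its industry mean, assumed half-life of 8 years.
AFTER_8 = excess_return_remaining(10.0, 8.0, 8.0)
AFTER_16 = excess_return_remaining(10.0, 16.0, 8.0)
print(AFTER_8, AFTER_16)  # 5.0 2.5
```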
4. Two historical episodes
Israeli flight school, 1969. Daniel Kahneman, then a young consulting psychologist for the Israeli Air Force, was lecturing senior flight instructors on the established behavioural finding that praise produces better learning outcomes than punishment. A grizzled instructor objected. With respect, sir, he said in effect, what you are saying is for the birds: I have many times praised flight cadets for the clean execution of some aerobatic manoeuvre, and the next time they tried it they did worse; I have often screamed into a cadet’s earphone for bad execution and on the next try he improved. Kahneman writes that the moment was the most important insight of his early career. The cadets’ performance was a noisy signal around a stable mean. A spectacularly clean manoeuvre was, by definition, mostly luck on top of skill; the next attempt would regress whether the instructor praised or screamed. The instructor had been the unwitting witness to thirty years of regression to the mean, mistaking statistical gravity for causation. The episode is recounted in Kahneman, Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011), chapter 17, and its formal version had appeared four decades earlier in Kahneman and Tversky, “On the Psychology of Prediction,” Psychological Review 80(4), 1973: 237–251.
Sports Illustrated cover jinx. From 1954 onward the legend within the magazine was that any athlete or team gracing the cover would subsequently underperform. In a 2002 internal review the editors counted 2,456 covers, of which 913 (about 37%) had been followed by some “decline.” Statisticians who examined the data, including Schaffer (2002) and Smith and Smith (2011), found no jinx at all, only the regression Galton had described 116 years earlier. Cover athletes were, almost by selection, drawn from the upper tail of recent performance; mean reversion meant that the next month would, on average, be less impressive than the month that had earned them the cover. The “jinx” was a narrative built around the arithmetic of selection.
Both episodes carry the same warning for the investor: when an observation is selected because it is extreme, the next observation will, on average, be less extreme. Any narrative that explains the change in causal terms — the cover cursed him, the praise spoiled her, the new CEO destroyed the franchise — is a narrative that may simply be re-describing regression. The mind reaches for a story; the spreadsheet would have suggested gravity.
5. Application to long-term equity investing
Three operating disciplines fall directly out of Galton’s mathematics.
Discipline one: never project the past five years of profit margins straight into the future. The single most reliable mean-reverting series in financial history is the corporate profit share of national income — what GMO’s Jeremy Grantham has called, only half-joking, the most mean-reverting series in finance. The mechanism is the one Adam Smith identified in The Wealth of Nations (1776, Book I, Chapter VII): high margins attract competition, low margins repel it. The empirical record in the United States, where post-1947 National Income and Product Accounts data permits a long view, shows after-tax corporate profit margins oscillating in a relatively narrow band around 6 to 8 percent of GDP, with each excursion to either extreme corrected within roughly a decade. A discounted-cash-flow model that capitalises peak margins as a terminal-year assumption will, in regression-to-the-mean terms, systematically overstate intrinsic value at cycle peaks and understate it at troughs. The corrective is mechanical: stress-test every long-duration model with a margin path that reverts to a sector mean within ten years, and require the investment thesis to survive that test.
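The stress test in discipline one can be sketched as a margin path that fades to the sector mean. This assumes a straight-line fade, which is only one of several reasonable shapes; the function name and the 20% and 10% margins are hypothetical inputs.

```python
def reverting_margin_path(start_margin, mean_margin, years_to_revert, horizon):
    """Yearly margins fading linearly from start to the mean, then held flat."""
    path = []
    for year in range(1, horizon + 1):
        progress = min(year / years_to_revert, 1.0)
        path.append(start_margin + progress * (mean_margin - start_margin))
    return path

# Peak margin of 20% reverting to an assumed sector mean of 10% within ten years.
MARGIN_PATH = reverting_margin_path(0.20, 0.10, years_to_revert=10, horizon=12)
for year, margin in enumerate(MARGIN_PATH, start=1):
    print(f"year {year:2d}: {margin:.1%}")
```

Feeding this path into a discounted-cash-flow model in place of a flat peak-margin assumption shows how much of the estimated value depends on margins that have not yet reverted.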
Discipline two: when selecting active managers — including selecting oneself as one’s own active manager — discount the persistence of recent outperformance to roughly nil after five years. The Carhart finding, replicated in every multi-year SPIVA scorecard, is that top-decile performance over one-, three-, and five-year windows is almost entirely a noise phenomenon, with one important exception: costs and structural disadvantages — high fees, poor execution, persistent leverage at the wrong points in the cycle — produce real persistence on the downside. The investor’s manager-selection model should therefore be asymmetric. Be sceptical of celebrated past returns; take negative persistence seriously as a structural signal rather than a temporary embarrassment.
Discipline three: at extremes of valuation, the price-multiple itself becomes the dominant mean-reverting variable. Robert Shiller’s cyclically adjusted price-earnings ratio (CAPE), constructed in Campbell and Shiller, “Stock Prices, Earnings, and Expected Dividends,” Journal of Finance 43 (1988), has the unhappy distinction of explaining roughly 40 percent of the variance in subsequent ten-year real US equity returns since 1881. The mechanism is again Galton’s: peak multiples, like peak margins, are by definition the joint product of fundamentals and noise, and the noisy component regresses. This is not a market-timing claim — short-horizon predictability is essentially zero — but a discipline against starting positions at extreme starting multiples without a compensating margin of safety. The investor who buys at the 95th percentile of CAPE and waits ten years must expect that the median outcome will be set largely by multiple compression, not by underlying earnings growth.

6. How the long-term equity tradition has used it
Howard Marks, in his July 2003 memo “The Most Important Thing” and the November 2001 memo “You Can’t Predict, You Can Prepare,” made regression to the mean the engine of his cycle theory. The Oaktree pendulum, Marks wrote, swings from euphoria to despair not because anyone wills it to, but because the very behaviours that produce extreme valuations contain the seeds of their reversal: high valuations attract supply of paper and erode prospective returns until the marginal buyer rebels; low valuations starve supply and improve prospective returns until the marginal seller capitulates. In his book of the same name, The Most Important Thing: Uncommon Sense for the Thoughtful Investor (Columbia Business School Publishing, 2011), Marks devotes an entire chapter (chapter 8, “Being Attentive to Cycles”) to the proposition that the investor who fails to internalise mean reversion will be most aggressive when prospective returns are lowest and most defensive when they are highest. His operating heuristic, articulated again in the September 2014 memo “Risk Revisited,” is to scale risk-taking inversely with prevailing valuations, precisely because of the Galton mechanism.
Jeremy Grantham at GMO has built the firm’s seven-year asset-class forecast on the same idea. In a sequence of quarterly letters from 1994 onward, and in the February 2012 letter “The Longest Quarterly Letter Ever,” Grantham observes that profit margins and price-earnings ratios are the two great mean-reverting variables in equity markets, and that GMO’s forecasts assume both will return to their long-run averages within seven years. His June 2017 piece, “This Time Seems Very, Very Different,” extended the framework with a candid admission: in the platform-monopoly era, the speed of regression in margins has slowed; he now models a fifteen-to-twenty-year half-life for profit-share reversion rather than the seven years that prevailed from 1900 to 1997. The model survives; only the time constant has changed. The GMO seven-year forecast remains the most public mean-reversion betting card in the industry, and its track record — broadly accurate on direction across decadal windows, frequently early on timing — is precisely what one would expect from a model that uses Galton’s gravity correctly but cannot pin the exact moment of return.
Warren Buffett, characteristically, has acknowledged the same gravity while resisting its full implications for the highest-quality franchises. In his 1989 Berkshire Hathaway letter (“Mistakes of the First 25 Years”) he wrote that he had repeatedly paid too little attention to the tendency of high returns on capital to attract competition, and too much attention to apparently cheap statistical bargains where the underlying economics were quietly regressing toward unprofitability. His later doctrine, pay a fair price for a wonderful business, is in part an acknowledgement that some businesses, by virtue of structural moats, regress more slowly than the average; it is not a claim that they do not regress at all. The 1999 Sun Valley speech, reprinted in Fortune on 22 November 1999 under the title “Mr. Buffett on the Stock Market,” is a sustained warning that aggregate US after-tax corporate profits have been mean-reverting against GDP for the entire post-war period and will not, contrary to the late-1990s consensus, settle permanently at a higher plateau.
The discipline these three practitioners share is not market timing but probability-weighting. Each builds his portfolio around the prior that extreme observations — of returns, of margins, of multiples — will, on average, fall back toward a mean that one can estimate from a long enough history. The variance of each individual outcome remains large; the directionality of the conditional expectation does not.
7. Key takeaways
Galton’s regression is a property of measurement under uncertainty, not a causal force; the mind insists on re-narrating it as one and the investor must resist that re-narration. The strength of the pull is set by the ratio of Skill variance to Luck variance, and that ratio must be estimated domain by domain before extrapolating any extreme observation. In equity markets, the two great mean-reverting variables are profit margins and valuation multiples, and every long-duration model should be stress-tested with a path that reverts both within ten to fifteen years. Manager-selection should be asymmetric: discount celebrated outperformance toward random, but take poor performance and high costs as the persistent signals they are. In cycle terms, the Oaktree pendulum and the GMO forecast both encode the same Galton insight; the investor who internalises it tends, over decades, to take risk when others will not and to take risk off when others crowd in. The model is 140 years old; its tax on the investor who ignores it is paid afresh in every cycle.
— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia












