The NorthPath Letter · Behavioural Finance · Afternoon Edition · 31 May 2026
In 1954 a clinical psychologist at the University of Minnesota gathered every study he could find that pitted a trained expert’s judgment against a simple statistical formula, and reported a result that the profession spent the next half-century trying, and failing, to overturn: the formula usually won. Paul Meehl’s finding has been replicated across medicine, law, finance and human resources more thoroughly than almost any other claim in the social sciences. Yet the part of his argument that matters most to a long-term investor is not the headline. It is the footnote he attached to it — the rare case in which a person should override the formula, the case he called the broken leg. This letter is about why that exception is real, why it is far rarer than we believe, and why the discipline of a serious investor consists mostly of refusing to use it.
The bias: when judgment loses to arithmetic
There are two ways to combine information into a prediction. The first is clinical: a knowledgeable person weighs the evidence in their head and arrives at a conclusion using experience, intuition and a feel for the particular case. The second is actuarial, or mechanical: the same inputs are fed into a fixed rule — a formula, a checklist, a regression equation — that produces the answer with no further human deliberation. The clinical method feels obviously superior. It can notice things the formula cannot; it can adapt; it can recognise the exception. The entire prestige of professional expertise rests on the assumption that the trained human, looking at the whole picture, beats the mechanical rule.
Meehl assembled the evidence and found the opposite. In Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence (University of Minnesota Press, 1954), he reviewed roughly twenty studies that had directly compared the two methods on the same data — predicting academic success, parole violation, recovery from treatment, criminal recidivism. In almost every one, the simple actuarial rule matched or beat the expert clinician. Meehl, himself a practising clinician, described the conclusion as one he found personally distasteful and professionally alarming, which is part of why it is so persuasive: he was reporting a result against his own interest.
Three years later he supplied the crucial qualification. In a 1957 paper with the disarming title “When Shall We Use Our Heads Instead of the Formula?” (Journal of Counseling Psychology 4(4), pp. 268–273), Meehl conceded that there are situations in which the human must override the rule. His illustration has become a permanent piece of decision-science vocabulary. Suppose an actuarial table says a particular professor has a 90 per cent probability of attending the cinema on a given night. You then learn that he has just broken his leg and is in a hip cast that will not fit a theatre seat. No sensible person stays with the 90 per cent figure. The broken leg is a rare, decisive, observable fact that sits entirely outside the variables the formula was built on, and it flips the prediction to near zero.
The trap is in the next sentence, the one most people forget. Meehl’s point was not that broken legs justify discretion; it was that broken legs are rare, that we have almost no reliable record of how often clinicians are right when they claim to have spotted one, and that the human tendency is to imagine broken legs everywhere. Each time we override the rule because this case feels special, we are betting that we have found a genuine exception. We almost never have. The override is where the expert’s edge is supposed to live, and it is precisely where the edge is destroyed.

The mechanism: why the model of the expert beats the expert
Why should a fixed rule outperform a thoughtful human who has access to the same facts and more? The answer lies in three well-documented properties of human judgment, none of which the rule shares.
The first is inconsistency. A formula given the same inputs returns the same output every time. A human does not. Presented with identical case files on different days — or the same file twice without realising it — experts reach materially different conclusions, swayed by mood, fatigue, the order in which they read the evidence, or what they judged just before. The psychologist Lewis Goldberg demonstrated in 1970 that you can build a simple linear “model of the judge” from a clinician’s own past ratings, and that this model then predicts better than the clinician it was copied from. The model captures the expert’s valid insight while discarding the random noise of a particular afternoon. The unsettling implication is that much of what separates a good judgment from a bad one is not wisdom but variance.
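Goldberg's result is easy to reproduce in miniature. The simulation below is a sketch, not his data or his exact procedure: the two cues, their weights and the noise levels are all assumed for illustration. It creates a "judge" whose ratings weight two cues sensibly but with case-to-case noise, fits a linear model to those ratings on half the cases, and then compares judge and model against the real outcome on the other half.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_two_cue_model(c1, c2, ratings):
    """Least-squares fit of ratings ~ a + b1*c1 + b2*c2 via the 2x2 normal equations."""
    m1, m2, mr = mean(c1), mean(c2), mean(ratings)
    d1 = [v - m1 for v in c1]
    d2 = [v - m2 for v in c2]
    dr = [v - mr for v in ratings]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1r = sum(a * r for a, r in zip(d1, dr))
    s2r = sum(b * r for b, r in zip(d2, dr))
    det = s11 * s22 - s12 * s12
    b1 = (s1r * s22 - s2r * s12) / det
    b2 = (s2r * s11 - s1r * s12) / det
    return mr - b1 * m1 - b2 * m2, b1, b2


def simulate(n=2000, seed=0):
    """Return (judge_r, model_r): correlation with the outcome on held-out cases."""
    rng = random.Random(seed)
    c1 = [rng.gauss(0, 1) for _ in range(n)]
    c2 = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed world: the outcome depends on both cues plus irreducible luck.
    outcome = [0.6 * a + 0.4 * b + rng.gauss(0, 1) for a, b in zip(c1, c2)]
    # Assumed judge: reasonable weights, but noisy from one case to the next.
    judge = [0.5 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(c1, c2)]

    half = n // 2
    a0, b1, b2 = fit_two_cue_model(c1[:half], c2[:half], judge[:half])
    # The "model of the judge": the judge's fitted weights, applied without noise.
    model = [a0 + b1 * a + b2 * b for a, b in zip(c1[half:], c2[half:])]

    judge_r = pearson(judge[half:], outcome[half:])
    model_r = pearson(model, outcome[half:])
    return judge_r, model_r


judge_r, model_r = simulate()
print(f"judge vs outcome:          r = {judge_r:.2f}")
print(f"model of judge vs outcome: r = {model_r:.2f}")
```

Under these assumptions the model should correlate with the outcome more strongly than the judge it was copied from, because the fit keeps the judge's weights and averages away the noise. Set the judge's noise to zero and the two converge.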
The second is improper weighting. Even when an expert knows which factors matter, they cannot combine them consistently. Robyn Dawes showed in “The Robust Beauty of Improper Linear Models in Decision Making” (American Psychologist 34(7), 1979, pp. 571–582) that a model which simply adds up the relevant variables with equal weights routinely outperforms expert judgment — and often matches a statistically optimised model. The hard part of prediction is knowing what to look at; once the variables are chosen, mechanical addition does the rest better than the brain that chose them. The human mind over-weights whatever is vivid, recent or emotionally charged, and under-weights the dull base rate that carries most of the predictive load.
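An equal-weight model of the kind Dawes described takes only a few lines. The sketch below assumes the variables have already been chosen and that the direction of each one is known; the applicants and their three variables are hypothetical. Each variable is standardised and the results are simply added.

```python
from statistics import mean, pstdev


def unit_weight_scores(cases, signs):
    """Score each case by summing its standardised variables with equal weight.

    cases: list of rows, one number per variable.
    signs: +1 if a higher value should raise the score, -1 if it should lower it.
    """
    columns = list(zip(*cases))
    # Standardise so that no variable dominates merely because of its units.
    # Assumes every variable actually varies (pstdev > 0).
    stats = [(mean(col), pstdev(col)) for col in columns]
    scores = []
    for row in cases:
        z = [s * (v - m) / sd for v, (m, sd), s in zip(row, stats, signs)]
        scores.append(sum(z))
    return scores


# Hypothetical applicants: (test score, years of experience, past defaults).
applicants = [
    [70, 2, 3],
    [85, 5, 1],
    [60, 8, 0],
    [90, 1, 2],
]
scores = unit_weight_scores(applicants, signs=[+1, +1, -1])
# Indices of applicants, best score first.
ranking = sorted(range(len(applicants)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

No weights are estimated at all. The only judgment in the procedure is the choice of variables and their signs, which is where Dawes located the expert's real contribution.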
The third is the seduction of the special case. Every override feels justified in the moment because the human mind manufactures a coherent story for it. The story is the broken leg. But a compelling narrative is not evidence that a true exception exists; it is evidence only that we are good at constructing narratives. Daniel Kahneman, who devoted a chapter of Thinking, Fast and Slow (2011) to this literature under the heading “Intuitions vs Formulas,” put the practical rule plainly: the broken-leg exception should be invoked only on the basis of information that is both rare and decisively relevant, and “you must resist the temptation to invoke it.” For the investor, every position that the screen rejects but that you cannot resist buying anyway is a claimed broken leg. The discipline is in counting how many of them turn out to be real.
The empirical record: sixty years, one direction
Meehl’s 1954 sample was small enough that a sceptic could dismiss it. The decades since have removed that escape. The most comprehensive test is the meta-analysis by William Grove and colleagues, “Clinical versus Mechanical Prediction” (Psychological Assessment 12(1), 2000, pp. 19–30), which pooled 136 studies across medicine, mental health, education, forensics and finance. Mechanical prediction was on average about ten per cent more accurate than clinical judgment. Sorted by outcome, roughly 47 per cent of studies clearly favoured the formula and about 48 per cent showed the two methods tied, which leaves only about six per cent in which the human was clearly better; depending on the threshold used, that last share ranged from about six to sixteen per cent. A decade earlier, Dawes, Faust and Meehl had laid out the same conclusion for a general scientific audience in “Clinical versus Actuarial Judgment” (Science 243, 1989, pp. 1668–1674). The verdict has proved remarkably stable: formulas win or draw far more often than they lose, and they do so across domains that have nothing in common except that a human is trying to predict an uncertain outcome.
Regulators have, in effect, encoded the same lesson into how institutions are now required to use models. In the United States, the Federal Reserve and the Office of the Comptroller of the Currency issued joint “Supervisory Guidance on Model Risk Management” (SR 11-7 / OCC Bulletin 2011-12) on 4 April 2011. Its organising principle is “effective challenge”: the requirement that models be subjected to critical scrutiny by independent, competent parties, and that their use, including any human override of their output, be governed by documented policy rather than left to discretion. In the European Union, the Artificial Intelligence Act (Regulation (EU) 2024/1689, adopted 13 June 2024) classifies systems that assess the creditworthiness of individuals as “high-risk” under Annex III. Article 14 requires such systems to be built for meaningful human oversight, so that a competent person can understand the model’s output and, where warranted, disregard, override or halt it; Article 26 obliges deployers to assign that oversight to people with the necessary competence, training and authority. The high-risk obligations begin to apply from 2 August 2026. Read together, both regimes make the same quiet admission. The model is the default; the human override is a controlled, accountable, exceptional act, not a standing licence to substitute judgment whenever the case feels special.
Markets supply the cleanest natural laboratory of all. Active stock selection is clinical prediction; owning the index is the actuarial alternative. S&P Dow Jones Indices’ SPIVA Scorecard for year-end 2024 found that 89.5 per cent of US large-cap active funds underperformed the S&P 500 over the fifteen years to December 2024 — and that across all twenty-two US equity categories measured, not one had a majority of active managers beating the benchmark over that horizon. The professionals are the clinicians. The formula — hold the whole market and stop predicting — beat roughly nine in ten of them.

Two episodes: the formula mocked, the formula vindicated
The first episode is the founding of mechanical investing itself. On 31 August 1976, John Bogle launched the First Index Investment Trust, the first index mutual fund available to the ordinary investor. The premise was pure actuarial humility: do not try to identify the winners, own them all, and keep costs near zero. Wall Street’s reaction was derision. Bogle had hoped the underwriting would raise as much as 150 million dollars; it raised a little over 11 million, an outcome he later called “an abject failure.” The bankers running the offering suggested he cancel the fund. Competitors called it “un-American” and circulated the phrase “Bogle’s Folly.” The objection was always the same: who would settle for the average when a skilled manager could surely do better? Half a century of SPIVA data has answered the question. The fund that began as a mocked refusal to exercise judgment became, by assets, the default home of long-term capital for tens of millions of people, precisely because the judgment it refused to exercise turned out to subtract value far more often than it added.
The second episode is the most explicit broken-leg experiment ever run in public markets, and it was run by a manager who believed in security analysis. Joel Greenblatt’s “Magic Formula” — described in The Little Book That Beats the Market (2005) — ranks stocks mechanically on earnings yield and return on capital and instructs the investor to hold the top of the list, rebalancing on a schedule, with no discretion. When Greenblatt’s firm offered the strategy through managed accounts, it created a controlled comparison. Some clients let the system run untouched. Others were given the same ranked list but allowed to choose which names to hold and when to trade — to exercise judgment. Over a two-year period the disciplined, mechanically managed accounts returned 84.1 per cent after costs. The self-managed accounts, drawing from the identical list, returned 59.4 per cent — trailing not only the formula but the S&P 500’s 62.7 per cent. The investors had taken a winning system and, in Greenblatt’s summary, used their own judgment to eliminate all of its outperformance and then some. They skipped the names that looked most frightening — which were often the biggest subsequent winners — sold after declines and bought after rallies. Every one of those decisions was a claimed broken leg. Almost none was real.
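The arithmetic behind "all of its outperformance and then some" is worth laying out. The sketch below uses only the three returns quoted above; the annualised figures additionally assume the period was exactly two years, which the account of the experiment does not spell out.

```python
# Two-year total returns as reported in the text, in per cent.
MANAGED = 84.1       # accounts that followed the formula mechanically
SELF_MANAGED = 59.4  # accounts where clients picked from the same list
INDEX = 62.7         # S&P 500 over the same period


def annualised(total_return_pct, years=2.0):
    """Convert a cumulative return into a compound annual rate, in per cent."""
    return ((1 + total_return_pct / 100) ** (1 / years) - 1) * 100


formula_edge = MANAGED - INDEX            # what the formula added over the index
discretion_cost = MANAGED - SELF_MANAGED  # what client judgment subtracted
net_vs_index = SELF_MANAGED - INDEX       # where the self-managed accounts ended up

print(f"formula edge over index:     {formula_edge:+.1f} points")
print(f"cost of client discretion:   {-discretion_cost:+.1f} points")
print(f"self-managed versus index:   {net_vs_index:+.1f} points")

# Assumes an exact two-year holding period.
for label, total in [("managed", MANAGED), ("self-managed", SELF_MANAGED), ("index", INDEX)]:
    print(f"{label:>12}: {annualised(total):.1f} per cent a year")
```

The formula's edge over the index was 21.4 points; client discretion removed 24.7. The difference, 3.3 points, is the margin by which the self-managed accounts finished below an index they could have owned without any effort.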
The counter-measure framework: three disciplines
The lesson is not that judgment is worthless. It is that judgment’s reliable contribution lies in designing the rule, not in overriding it case by case, and that the override must be treated as a rare and expensive act. Three disciplines operationalise this.
1. Write the rule before you see the case
Meehl’s formula did its work because it was specified in advance, in writing, independent of any particular subject. The investing analogue is to define the screen, the checklist or the allocation policy before a specific opportunity is in front of you, and — just as importantly — to write down in advance the narrow conditions under which an override would be justified. A rule authored after the case has appeared is not a rule; it is a rationalisation wearing a rule’s clothes. Pre-commitment removes the single largest source of error: the freedom to decide, in the heat of a compelling story, that this time the criteria do not apply.
2. Keep a broken-leg log
Grove’s meta-analysis exposed a gap that Meehl had flagged in 1957: we have almost no systematic record of how often human overrides actually improve on the formula, because overrides are rarely tracked. The remedy is to generate that record for yourself. Every time you depart from your own rule, whether by buying what the screen rejected or holding what it said to sell, record the decision, the specific reason, and the formula’s contrary recommendation, then revisit the outcome later. A genuine broken leg is rare, decisive, observable, and lies outside the inputs your rule already considers. A story is none of those things. Maintaining the log makes the override expensive in the only currency that disciplines behaviour: an honest, accumulating tally of whether your exceptions have helped or hurt. If the research above is any guide, most investors who keep such a record should expect to find that their overrides, in aggregate, cost them money.
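A broken-leg log needs very little machinery. The sketch below is one possible layout, not a prescribed format: every field name is illustrative, the sample entries are invented, and the tally treats each override equally rather than weighting by position size.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Override:
    """One departure from the written rule (all field names are illustrative)."""
    date: str
    security: str
    rule_said: str                       # e.g. "sell", "do not buy"
    i_did: str                           # e.g. "held", "bought"
    reason: str                          # the claimed broken leg, in one sentence
    my_return: Optional[float] = None    # filled in at review, per cent
    rule_return: Optional[float] = None  # what following the rule would have earned


def tally(log: List[Override]) -> dict:
    """Summarise reviewed overrides: how many helped, how many hurt, net effect."""
    reviewed = [o for o in log if o.my_return is not None and o.rule_return is not None]
    diffs = [o.my_return - o.rule_return for o in reviewed]
    return {
        "reviewed": len(reviewed),
        "helped": sum(1 for d in diffs if d > 0),
        "hurt": sum(1 for d in diffs if d < 0),
        "net_points": sum(diffs),  # unweighted sum of percentage-point differences
    }


# Invented entries, for illustration only.
log = [
    Override("2026-01-12", "AAA", "do not buy", "bought", "new management", -12.0, 0.0),
    Override("2026-02-03", "BBB", "sell", "held", "pending contract award", 8.0, 5.0),
    Override("2026-03-20", "CCC", "buy", "skipped", "looked too risky"),  # not yet reviewed
]
print(tally(log))
```

Writing the reason down at the moment of the override matters more than the arithmetic. It is what lets a later review ask whether the claimed fact was rare, decisive, observable and outside the rule's inputs.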
3. Default to the model; cap discretion’s budget
If overrides systematically destroy value, the structural response is to limit how much of the portfolio discretion is permitted to touch. Let the mechanical core — the index position, the rules-based screen, the predetermined rebalancing — run without interference, and ring-fence a small, explicit budget for the high-conviction exceptions you cannot resist. This converts “effective challenge,” in the regulators’ phrase, from a slogan into an architecture: doubt is institutionalised, the model is the presumption, and the burden of proof sits with the human who wishes to deviate. The point of the budget is not to forbid judgment but to ensure that when judgment is wrong — which, on the evidence, is most of the time it overrides — the damage is bounded.
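The budget can be enforced with a check as simple as the one below. It is a sketch under stated assumptions: the ten per cent cap is a placeholder rather than a recommendation, and the function names are hypothetical.

```python
def discretion_room(portfolio_value, discretionary_value, budget_fraction=0.10):
    """Money still available for overrides under the cap (never negative)."""
    if not 0 <= budget_fraction <= 1:
        raise ValueError("budget_fraction must be between 0 and 1")
    cap = portfolio_value * budget_fraction
    return max(0.0, cap - discretionary_value)


def allow_override(portfolio_value, discretionary_value, proposed_value,
                   budget_fraction=0.10):
    """True only if the proposed position fits inside what is left of the budget."""
    room = discretion_room(portfolio_value, discretionary_value, budget_fraction)
    return proposed_value <= room


# Example: a 100,000 portfolio with 7,000 already held in discretionary positions.
print(discretion_room(100_000, 7_000))
print(allow_override(100_000, 7_000, 2_500))
print(allow_override(100_000, 7_000, 4_000))
```

The design choice is that the check answers yes or no before the story is heard. A new exception that does not fit must displace an old one, which forces the comparison the override would otherwise skip.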

How long-term-equity practitioners addressed it
The investors who have written most clearly about compounding tend to arrive, independently, at Meehl’s conclusion. James O’Shaughnessy built his book What Works on Wall Street (1996) directly on the clinical-versus-actuarial literature, citing Meehl’s 1954 review by name and framing his entire project as the substitution of disciplined, back-tested mechanical strategies for the “unreliable experts.” His repeated theme is that the strategies are not the hard part; human nature is, because we are built to prefer a vivid story to a dull base rate, and that preference is exactly what the formula protects us from.
Joel Greenblatt, having watched his own clients dismantle a working system, distilled the lesson into a list of behaviours to avoid — not chasing what has recently risen, not abandoning the method after a bad stretch, not letting the scariest-looking names (often the most rewarding) be quietly skipped. His managed-account experiment stands as the field’s textbook demonstration that the edge frequently lives in not interfering.
The pattern reaches back further. Benjamin Graham, the founder of security analysis, spent his last years recanting much of its elaborateness. In a 1976 interview in the Financial Analysts Journal, he said he no longer believed detailed, case-by-case analysis was worth the effort for most investors and favoured instead simple, mechanical group criteria applied to baskets of cheap stocks. The man who had taught a generation to study companies one by one concluded that a rule, applied without exception, would serve most people better than their own discretion. Bogle reached the same destination by a different road: the index fund is, at bottom, a refusal to play the clinician at all. Across very different temperaments, the long-term compounders converged on a single discipline — build a sound rule, then defend it against your own cleverness.
Key takeaways
- The formula usually wins. Across sixty years and 136 pooled studies, simple mechanical rules have matched or beaten expert judgment in roughly 95 per cent of comparisons. Markets are no exception: nearly nine in ten US large-cap active funds trailed the index over fifteen years.
- The broken-leg exception is real but rare. Genuine overrides — rare, decisive, observable facts outside the model’s inputs — do exist. The investor’s problem is never too few overrides; it is far too many, each dressed in a convincing story.
- Write the rule before the case. Specify the screen, the checklist and the narrow conditions for an override in advance. A rule authored after the opportunity appears is a rationalisation.
- Keep a broken-leg log. Record every deviation, its reason and the rule’s contrary call, and audit the outcomes. The tally usually shows that discretion subtracted value.
- Default to the model; bound the discretion. Let the mechanical core run untouched and ring-fence a small budget for exceptions, so that when judgment is wrong — as it mostly is when it overrides — the cost is contained.
— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia
