Bayes’ Rule: Thomas Bayes (1763) and the Long-Term Investor

Afternoon Edition — Mental Models · Essay No. 7 · 25 May 2026 · Tallinn

1. The model: a posthumous paper that quietly reorganised how we should think

Thomas Bayes was a Presbyterian minister and amateur mathematician who lived in Tunbridge Wells and died in 1761. He published almost nothing in his lifetime. Two years after his death, his friend Richard Price submitted his notes to the Royal Society. The paper appeared in 1763 under the unassuming title An Essay towards solving a Problem in the Doctrine of Chances, in volume 53 of the Philosophical Transactions, pages 370 to 418. It was largely ignored for the next century and a half.

What Bayes had derived, and what Pierre-Simon Laplace independently restated in cleaner form in 1774 and 1812, was a rule for updating a belief in light of new evidence. In modern notation the rule is one line. The probability of a hypothesis H given new data D equals the probability of H before seeing D, multiplied by the probability that D would have occurred if H were true, divided by the unconditional probability of D itself. Or, in the form an investor will use most often: posterior is proportional to prior times likelihood.
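Written out in symbols, the verbal statement above takes the following form. The notation is the conventional modern one rather than anything found in the 1763 paper: H is the hypothesis, D the data, and ¬H the alternative that H is false.

```latex
P(H \mid D) \;=\; \frac{P(D \mid H)\,P(H)}{P(D)},
\qquad
P(D) \;=\; P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)

% The investor's shorthand: posterior is proportional to prior times likelihood
P(H \mid D) \;\propto\; P(H)\,P(D \mid H)
```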

That single equation does several things at once. It tells you that your starting belief, your prior, matters and must be made explicit. It tells you that the diagnostic value of a piece of evidence is not its loudness but the ratio of how likely it is under the hypothesis you favour versus under the alternative. It tells you that updating is a multiplicative, not additive, operation, which means very strong evidence can swamp a prior and very weak evidence almost cannot. And it tells you that two reasonable people with different priors and the same evidence will, with enough rounds of updating, eventually converge on the same posterior, provided neither prior assigns zero probability to the truth. That last property is why long-term investors who think in Bayesian terms tend, over decades, to converge on similar judgments about the same business even when they started in very different places.
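The multiplicative character of the update is easiest to see in odds form. This is a standard rearrangement of the rule for two rival hypotheses, offered as a reading aid; the symbol LR for the likelihood ratio is a label introduced here.

```latex
\underbrace{\frac{P(H \mid D)}{P(\neg H \mid D)}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{P(H)}{P(\neg H)}}_{\text{prior odds}}
\;\times\;
\underbrace{\frac{P(D \mid H)}{P(D \mid \neg H)}}_{\text{likelihood ratio } LR}
```

When LR is close to 1 the odds barely move, however dramatic the evidence looks; when LR is very large or very small it can overwhelm almost any prior.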

2. The mechanism: why it works, and where it breaks

The deeper claim of Bayes’ rule is that there is, up to a choice of prior, exactly one consistent way to revise probability assignments in the light of new information. Frank Ramsey proved a version of this in 1926, and Bruno de Finetti in 1937. Their argument is sometimes called the Dutch-book theorem: if your beliefs do not obey the laws of probability, a counterparty can in principle construct a sequence of bets you would accept that guarantees you a loss. To be coherent in the face of uncertainty is, definitionally, to update like a Bayesian.

The rule works because it forces three pieces of intellectual hygiene that human cognition naturally resists. First, you must state your prior before you see the new evidence. Most investors form an opinion about a company, then read its quarterly report, then claim the report confirmed an opinion they had pre-loaded. Bayes’ rule does not permit this. Second, you must specify what the data would look like under each rival hypothesis, not only under your favoured one. A flat results print can mean the business is dying, or that management is investing for the next cycle, or that a one-off accounting item has masked underlying strength. The Bayesian asks which of those worlds best explains what you see, weighted by how likely the data are in each. Third, you must update by the right magnitude. Strong likelihood ratios produce large updates; weak ones produce small updates; and a piece of evidence whose probability is roughly equal under all hypotheses produces no update at all, however dramatic it appears.
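The third point, updating by the right magnitude, can be illustrated with a minimal sketch. It assumes only two rival hypotheses, and the function name `update` and the sample likelihood ratios are illustrative choices, not figures from any real company.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability of H, given a prior and LR = P(D | H) / P(D | not H)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio  # the multiplicative step
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios, from uninformative to decisive.
for lr in (1.0, 1.2, 3.0, 20.0):
    print(f"LR = {lr:>4}: prior 0.60 -> posterior {update(0.60, lr):.3f}")
```

A likelihood ratio of 1 leaves the 60 per cent prior exactly where it was, which is the formal version of the claim that evidence equally probable under all hypotheses produces no update.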

Where the rule breaks is at the prior. Bayes himself was uneasy on this point; Price was uneasier; Laplace papered over it with his principle of insufficient reason. In practice the prior is the place where craft enters. Two investors looking at the same Indian cement company can have very different priors about the trajectory of its return on capital because one has lived through the 1995 to 2003 cycle and one has not. Neither prior is wrong; they are conditioned on different lifetimes of data. What Bayes’ rule guarantees is only that, if both update honestly on the next ten years of evidence, their posteriors will move toward each other.
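The convergence claim can be sketched under stated assumptions. Here each investor is given a Beta prior over the chance that a year's evidence is favourable, and both see the same stylised data: ten observations a year, six of them favourable. The Beta model, the parameter values and the data are all hypothetical simplifications chosen for illustration.

```python
def posterior_mean(alpha: float, beta: float, favourable: int, unfavourable: int) -> float:
    """Mean of the Beta posterior after observing the stated counts."""
    return (alpha + favourable) / (alpha + beta + favourable + unfavourable)


OPTIMIST = (8.0, 2.0)  # assumed Beta prior with mean 0.80
SCEPTIC = (2.0, 8.0)   # assumed Beta prior with mean 0.20

gaps = []
for year in range(11):
    # Assumed evidence: 10 observations per year, 6 of them favourable.
    favourable, unfavourable = 6 * year, 4 * year
    a = posterior_mean(*OPTIMIST, favourable, unfavourable)
    b = posterior_mean(*SCEPTIC, favourable, unfavourable)
    gaps.append(a - b)
    print(f"year {year:2d}: optimist {a:.3f}  sceptic {b:.3f}  gap {a - b:.3f}")
```

Under these assumptions the gap between the two posterior means shrinks every year, from 0.60 at the start to roughly 0.05 after ten years, with both estimates closing in on 0.60.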

3. The empirical record

For most of two centuries Bayes’ rule sat in the corner of probability theory while the Neyman-Pearson frequentist school dominated statistics. The revival began with three strands of empirical evidence that frequentist methods were leaving systematic value on the table.

The first was in medical diagnosis. David Eddy, in a 1982 chapter of the Kahneman, Slovic and Tversky volume Judgment under Uncertainty: Heuristics and Biases, presented a now-famous problem to a group of physicians. A woman has a positive mammogram. The base-rate prevalence of breast cancer in her age group is roughly 1 per cent. The mammogram has a sensitivity of 80 per cent and a false-positive rate of about 10 per cent. What is the probability she has cancer? The correct Bayesian answer is around 7.5 per cent. The median answer from the physicians was 75 per cent. Gerd Gigerenzer and Ulrich Hoffrage replicated the result in 1995 across a larger sample of clinicians: most professionals confronted with a screening problem ignore the base rate almost entirely and read the positive test result as if it were the posterior, not the likelihood. The cost of this error, scaled across an entire health system, is measurable in tens of billions of dollars and thousands of unnecessary procedures every year.

Figure 1. The Eddy (1982) mammogram problem decomposed as a decision tree. Of 10,000 women screened, 100 have cancer and 80 of them test positive; of the 9,900 without cancer, 990 test positive. Posterior = 80 / 1,070 ≈ 7.5%.
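The arithmetic of the mammogram problem can be checked in a few lines. The sketch below uses the rounded figures quoted in this essay (1 per cent prevalence, 80 per cent sensitivity, 10 per cent false-positive rate); the function name is an illustrative choice.

```python
def posterior_given_positive(prevalence: float, sensitivity: float,
                             false_positive_rate: float) -> float:
    """P(disease | positive test) by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


p = posterior_given_positive(prevalence=0.01, sensitivity=0.80, false_positive_rate=0.10)
print(f"P(cancer | positive mammogram) = {p:.1%}")  # prints 7.5%

# The same calculation in natural frequencies, per 10,000 women screened.
with_cancer = 10_000 * 0.01                       # 100 women
true_pos = with_cancer * 0.80                     # 80 positives
false_pos = (10_000 - with_cancer) * 0.10         # 990 positives
print(f"{true_pos:.0f} true positives out of {true_pos + false_pos:.0f} positive tests")
```

The frequency version makes the base rate hard to ignore: the false positives outnumber the true positives by more than twelve to one before any test result is interpreted.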

The second strand is the IARPA Good Judgment Project, which ran from 2011 to 2015 under Philip Tetlock and Barbara Mellers. The project recruited several thousand volunteer forecasters to predict geopolitical and economic events: would Greece leave the Eurozone in the next year, would the Chinese exchange rate move outside a band, would a particular regime survive a coup attempt. Forecasters were scored using the Brier score, a proper rule that rewards both correctness and calibration. The top-performing 2 per cent, whom Tetlock labelled superforecasters, were not domain experts; they were Bayesian updaters. They wrote down explicit numerical priors, defined the events under which they would update, and moved their probability estimates in small increments — often by single percentage points — as new evidence arrived. Over four years they beat the average forecaster by 30 per cent and the average intelligence-community analyst by a margin that was politically embarrassing to publish.
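For readers unfamiliar with the Brier score, here is a minimal sketch of its common binary form: the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not), where lower is better. The tournament's exact scoring convention may differ in detail, and the forecasts below are invented for illustration.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty forecast and outcome lists")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1, 1, 0]                   # hypothetical resolved questions
fence_sitter = [0.5, 0.5, 0.5, 0.5, 0.5]     # never commits
calibrated = [0.8, 0.2, 0.7, 0.9, 0.1]       # confident and mostly right
overconfident = [1.0, 1.0, 0.0, 1.0, 0.0]    # certain, wrong twice

for name, forecast in [("fence-sitter", fence_sitter),
                       ("calibrated", calibrated),
                       ("overconfident", overconfident)]:
    print(f"{name:>13}: {brier_score(forecast, outcomes):.3f}")
```

Because the penalty is squared, being certain and wrong costs far more than being modestly unsure, which is why the rule rewards calibration as well as correctness.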

The third strand is the academic re-examination of professional security analysts. Werner De Bondt and Richard Thaler, in the American Economic Review in 1990, documented that sell-side analysts systematically over-react to recent earnings news (they update too far on weak evidence) and under-react to long-running shifts in fundamentals (they update too little on strong evidence). Subsequent work by Easterwood and Nutt in 1999 confirmed the pattern across multiple decades and markets. The error is not random; it is the exact opposite of what Bayes’ rule prescribes. Likelihood ratios that should produce a small movement produce a large one, and likelihood ratios that should be decisive produce almost no change.

4. Two historical episodes

The first is Bletchley Park, 1941 to 1945. Alan Turing arrived at the British codebreaking centre in September 1939 and within two years had built, with the help of the statistician I. J. Good, a Bayesian apparatus for breaking the daily key of the German naval Enigma. Their method, which Good later described in his 1979 Biometrika paper in the series Studies in the History of Probability and Statistics, was to maintain a running posterior on each candidate wheel setting and to update it message by message using the log-likelihood ratio between the candidate setting and a random one. The units of evidence they used were the ban, which is simply the base-ten logarithm of a likelihood ratio, and the deciban, one tenth of a ban. Turing chose decibans because he had calibrated, through his own experience, that the human mind could meaningfully distinguish posterior odds at roughly that resolution. The system worked. From 1942 onward the British were reading German U-boat traffic in close to real time, and the Battle of the Atlantic turned. Sharon Bertsch McGrayne, in The Theory That Would Not Die (Yale 2011), estimates that the codebreaking shortened the war by two to four years and that the entire enterprise rested on Bayes’ rule applied with discipline.
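The appeal of the deciban is that evidence accumulates by addition rather than multiplication. The sketch below shows the bookkeeping; the likelihood ratios are invented for illustration and are not reconstructed from any Bletchley Park material.

```python
import math


def decibans(likelihood_ratio: float) -> float:
    """Weight of evidence in decibans: ten times log base ten of the likelihood ratio."""
    return 10.0 * math.log10(likelihood_ratio)


def posterior_odds(prior_odds: float, total_decibans: float) -> float:
    """Convert an accumulated deciban score back into posterior odds."""
    return prior_odds * 10.0 ** (total_decibans / 10.0)


# Hypothetical likelihood ratios from four successive messages.
ratios = [2.0, 1.5, 0.8, 3.0]
running_total = 0.0
for i, lr in enumerate(ratios, start=1):
    running_total += decibans(lr)  # evidence simply adds up
    print(f"message {i}: {decibans(lr):+.2f} db, running total {running_total:+.2f} db")

print(f"odds multiplier: {posterior_odds(1.0, running_total):.2f}")
```

The four ratios multiply to 7.2, and the running total of about 8.6 decibans converts back to exactly that multiplier; a message with a ratio below 1 subtracts from the score instead of adding to it.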

The second is the search for the lost American hydrogen bomb off Palomares, Spain, which began in January 1966. A B-52 had collided with a refuelling tanker; four bombs fell, three on land, one into the Mediterranean. The US Navy needed to find the fourth before the Soviets did. Dr John Craven, the Navy’s chief scientist on the deep-submergence program, assembled a panel of submarine commanders, weapons experts and oceanographers and asked each to construct a probability map for where the bomb had landed, conditional on what they knew about the aircraft’s trajectory, currents and impact dynamics. He then combined these individual maps into a single prior map and directed the search ships accordingly. As each grid square was searched and came up empty, he applied Bayes’ rule to update the map, redistributing probability into the unsearched cells. The bomb was recovered eighty days after the accident, from a square that the consensus prior would have ranked low but that the Bayesian update had elevated to high posterior after several other squares came up clean. Craven repeated the method in 1968 to locate the lost submarine USS Scorpion, this time with even less data and even greater success. Both episodes are documented in Craven’s memoir The Silent War (Simon & Schuster, 2001) and in the McGrayne history cited above. They are the clearest demonstrations on record of how a properly applied Bayesian framework outperforms expert intuition on problems where the prior is uncertain and the evidence trickles in.
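The search-map update described above can be sketched as follows. The three-cell grid, the prior probabilities and the 80 per cent detection probability are all hypothetical; they are chosen to show the mechanics and are not Craven's actual figures.

```python
def update_after_miss(prior: dict[str, float], searched: str,
                      p_detect: float) -> dict[str, float]:
    """Posterior map after searching one cell and finding nothing.

    p_detect is the assumed chance of spotting the object if it is in the
    searched cell. An empty search shrinks that cell's probability and,
    after renormalising, raises every other cell's.
    """
    unnormalised = {
        cell: p * (1.0 - p_detect) if cell == searched else p
        for cell, p in prior.items()
    }
    total = sum(unnormalised.values())
    return {cell: p / total for cell, p in unnormalised.items()}


prior_map = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical consensus prior
posterior_map = update_after_miss(prior_map, searched="A", p_detect=0.8)
for cell, p in posterior_map.items():
    print(f"cell {cell}: {prior_map[cell]:.2f} -> {p:.3f}")
```

After one empty search of the favourite cell, cell B overtakes it: A falls from 0.50 to about 0.17 while B rises from 0.30 to 0.50. That is how a square ranked low at the outset can end up at the top of the map.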

Figure 2. Two analysts, two priors, ten years of disclosure. Starting from priors of 80 per cent and 20 per cent, honest Bayesian updating drags both posteriors toward the underlying truth of 60 per cent.

5. Application to long-term equity investing — three concrete disciplines

The first discipline is the written prior. Before reading a company’s annual report, the Bayesian investor writes down a numerical estimate of the probability that the business will earn a stated minimum return on capital over the next five years. The estimate is conditioned on what is already known: the industry’s long-run economics, the company’s reinvestment history, the calibre of its capital allocator, the regulatory regime. The number is not an idle guess; it is the prior against which every new disclosure will be weighed. If the prior is 60 per cent and the half-year results would have been roughly equally likely whether or not the business is on track to clear that hurdle, the rational update is small. If the results contain a piece of information that is far more likely in the world where the hurdle is met than in the world where it is missed (a structural margin expansion, say, that no competitor has matched), the update is large. Without the written prior there is nothing to update from, and the investor falls back on the recency-weighted heuristics that the De Bondt-Thaler studies show to be biased.

The second discipline is the likelihood-ratio table. For each significant operating metric the investor maintains a small table: what would the metric look like in a world where the business is genuinely improving; what would it look like in a world where management is engineering an appearance of improvement; what would it look like in a world where the underlying franchise is decaying. The same printed number — say, a 200 basis-point uptick in operating margin — has very different implications in each world. The investor’s job is not to debate whether the number is good or bad but to ask which of the three worlds best explains it, and to update accordingly. Michael Mauboussin, in More Than You Know (Columbia 2006), calls this thinking in expected value across scenarios rather than around a single point estimate; the structure is identical to Bayes’ rule applied scenario by scenario.

Figure 3. The three-by-three likelihood-ratio table. Each row is a rival hypothesis about the business; the columns show what the quarterly print would look like under that hypothesis.
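The three-world comparison can be sketched as a single update across rival hypotheses. The priors and likelihoods below are invented placeholders for the 200 basis-point margin example, and the world labels are shorthand introduced here.

```python
def update_hypotheses(priors: dict[str, float],
                      likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior over rival hypotheses: prior times likelihood, renormalised."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalised.values())  # P(D), the unconditional probability
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical priors over the three worlds.
priors = {"improving": 0.5, "engineered": 0.2, "decaying": 0.3}
# Hypothetical P(200bp margin uptick | world).
likelihoods = {"improving": 0.6, "engineered": 0.5, "decaying": 0.1}

posterior = update_hypotheses(priors, likelihoods)
for world, p in posterior.items():
    print(f"{world:>10}: {priors[world]:.2f} -> {p:.3f}")
```

With these placeholder numbers the uptick all but eliminates the decaying-franchise world (0.30 to about 0.07) yet barely separates genuine improvement from engineered appearance, because the print is almost as likely under one as under the other. Telling those two apart needs a different metric.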

The third discipline is small steps. The Tetlock superforecasters did not change their estimates from 60 per cent to 20 per cent in a single move. They moved from 60 to 57 to 55 to 52, taking each piece of evidence at its true informational weight. The same applies in equity investing. A long-term holding worth holding at a 60 per cent posterior of meeting one’s hurdle is rarely worth selling outright on a single quarterly disappointment; it is worth re-marking the posterior downward by a few points and re-examining whether that change crosses any decision threshold. The opposite error — wholesale conviction reversal on a single data point — is exactly what Easterwood and Nutt found analysts doing, and exactly what the long-tail of equity returns demonstrated by Hendrik Bessembinder punishes most severely. The handful of stocks that produce the bulk of decade-long returns rarely advertise themselves with clean linear progress; they are noisy on the way to greatness, and a Bayesian who updates in small steps survives the noise.
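One way to make the small-steps discipline concrete is to ask how strong the evidence would have to be to justify a given move. The sketch below backs out the implied likelihood ratio from a prior and a posterior, assuming two rival hypotheses; the function names are illustrative.

```python
import math


def odds(p: float) -> float:
    """Convert a probability into odds in favour."""
    return p / (1.0 - p)


def implied_likelihood_ratio(prior: float, posterior: float) -> float:
    """The likelihood ratio that would carry a Bayesian from prior to posterior."""
    return odds(posterior) / odds(prior)


small_steps = [0.60, 0.57, 0.55, 0.52]
step_ratios = [implied_likelihood_ratio(a, b)
               for a, b in zip(small_steps, small_steps[1:])]
for (a, b), lr in zip(zip(small_steps, small_steps[1:]), step_ratios):
    print(f"{a:.2f} -> {b:.2f} implies LR = {lr:.3f}")

one_jump = implied_likelihood_ratio(0.60, 0.20)
print(f"0.60 -> 0.20 implies LR = {one_jump:.3f}")
```

Each small step corresponds to evidence only about 8 to 12 per cent less likely under the favoured hypothesis than under its rival. The single jump from 60 to 20 per cent requires evidence six times more likely under the rival, a bar that one quarterly disappointment rarely clears.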

6. How the long-term equity tradition has used it

Charles T. Munger, in his 1995 Harvard Law lecture The Psychology of Human Misjudgment, lists “the absence of an elementary probability calculation” as among the leading causes of investing failure. A year earlier, in his 1994 talk at the USC business school, A Lesson on Elementary, Worldly Wisdom, he had been more specific: “If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest. You’re giving a huge advantage to everybody else.” The probability calculation he had in mind was, in essence, Bayes’ rule: the requirement to combine a prior with a likelihood instead of reading new evidence as if it were itself the posterior.

Howard Marks has built much of his writing at Oaktree around the same idea, without always using the Bayesian vocabulary. In his memo of April 2014, Dare to Be Great II, he writes that the second-level investor is the one who asks not what the news means but how the news will change the consensus probability assignment to a range of outcomes, and how that change should differ from his own update. The arbitrage opportunity, in Marks’ framework, is the gap between the market’s likelihood ratio and one’s own better-calibrated one. The discipline that holds the framework together is the requirement to do the calculation explicitly. In The Most Important Thing (Columbia 2011) he devotes a chapter to the difference between knowing the range of outcomes, knowing their probabilities, and knowing how to update both as the world unfolds. That sequence of range, probability and update is the Bayesian sequence stated in plain English.

Among practising portfolio managers the explicit case is Bill Miller, who ran the Legg Mason Value Trust from 1990 to 2012. Miller used a Bayesian decision framework — he had brought Mauboussin into the firm to build it — and the framework’s mathematics is what allowed him to hold positions in Amazon and Dell through drawdowns that conventional analysts treated as decisive evidence of impairment. Miller’s view was that the drawdowns were exactly the noisy evidence Bayes’ rule prescribes a small update for, not the catastrophic news that warranted abandonment. The framework eventually broke in 2008, not because Bayes’ rule failed, but because Miller’s prior on US financial-sector capital adequacy was anchored on a regime that had ended. The lesson there is the one McGrayne emphasises: Bayes’ rule is only as good as the honesty with which the prior is constructed, and a prior that does not update its own structural assumptions when the structure itself changes is no defence.

7. Key takeaways

Bayes’ rule is the algebra of changing one’s mind. Five operational consequences follow for the long-term equity investor.

One. Write the prior down before the data arrive. An unwritten prior is, by the time the data are in, indistinguishable from a rationalisation.

Two. For every important metric, ask what the data would look like under each rival hypothesis, not only under the favoured one. The diagnostic value of evidence is the ratio of those probabilities, not the loudness of the number.

Three. Update in small steps. A coherent Bayesian rarely moves the posterior by more than ten percentage points on a single quarter’s print unless the likelihood ratio behind it is genuinely strong. Investors who routinely make larger moves are advertising that they had no prior to begin with.

Four. Convergence is a feature, not a bug. Two analysts with different priors who both update honestly will, given enough rounds of disclosure, end up close to each other. If your view is moving away from informed others’ views over time, the prior is probably anchored on a fact pattern that no longer holds.

Five. The prior is where the craft lives. The arithmetic of updating is mechanical; the choice of prior is judgment. Most of what experienced investors learn over decades is encoded in better priors, not in better updates. The error that ends most careers is a stale prior that refuses to be re-examined when the structural facts change underneath it.

Bayes’ rule will not tell anyone which company to own. It will, applied honestly, prevent the investor from being so easily moved by the latest piece of evidence that they are still being moved by it when the next, contradictory, piece arrives. In a profession whose hardest task is to do less in response to noise, that is a service whose value is hard to overstate.

— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia

Important.
All content on this site and in this email is journalism and education for a general audience. Nothing here constitutes investment advice or a recommendation in respect of any specific financial instrument, nor an offer or solicitation to buy or sell any security. Readers should consult an authorised financial adviser regulated in their own jurisdiction before making any investment decision.