The Power Law: Vilfredo Pareto's 1896 Curve of the Vital Few, and Why a Handful of Decisions Will Decide the Long-Term Investor's Record

Afternoon Edition — Mental Models · Essay No. 10 · 12 June 2026 · Tallinn

In 1896, in the second volume of his Cours d’économie politique, published at Lausanne, Vilfredo Pareto set out the results of a tedious exercise. He had collected income-tax tables from England, from several German states, from Paris, and from a scatter of Italian cities, and he had counted, for each income level, how many taxpayers earned more than that amount. When he plotted the logarithm of the count against the logarithm of the income, the points fell, again and again, on something close to a straight line. The implication was disquieting then and remains disquieting now: the number of people earning more than a given income falls as a fixed power of that income. There is no typical rich man. However far up the scale one looks, the same proportional thinning continues, and the top of the distribution holds a share of the total that no bell-curve intuition would ever grant it.

1. The model

A quantity follows a power law when the probability of exceeding a size x falls in proportion to x raised to a fixed negative exponent. Pareto wrote the relation for incomes as log N = log A − α log x, where N is the number of incomes above x, and he found the exponent α clustering around 1.5 across every country and century he examined (Pareto, Cours d’économie politique, vol. 2, F. Rouge, Lausanne, 1896). The exponent is the entire story. It measures how fast the tail thins, and therefore how much of the whole the largest few observations command. An exponent near 1.16 produces the famous result that the top 20 per cent hold 80 per cent of the total — the “80/20 rule,” a phrase that belongs not to Pareto but to the quality engineer Joseph Juran, who generalised the curve into a management principle, called its two regions “the vital few and the trivial many” (Quality Control Handbook, McGraw-Hill, 1951), and later confessed that he had named it after the wrong man (“The Non-Pareto Principle: Mea Culpa,” Quality Progress, 1975).

It is worth pausing on how directly the exponent translates into concentration, because the arithmetic is the working content of the model. For a Pareto tail with exponent α, the share of the total held by the top fraction p of observations is p raised to the power (α − 1)/α. At Pareto’s own α of 1.5, the top 20 per cent of earners take the cube root of 0.2 — about 58 per cent of all income. At α of 1.16, the same calculation yields 80 per cent, which is where Juran’s slogan comes from. And as α falls toward 1, the share of the top fraction rises toward everything: concentration is not an accident layered on top of the distribution but a direct reading of its slope. One number, estimated from a straight line on double-log paper, tells you how unequal the world it describes must be.
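The share formula follows in a few lines from Pareto’s count law. What follows is a minimal derivation under two assumptions. First, a pure Pareto tail starts at a minimum size x_min, a symbol introduced here rather than taken from Pareto. Second, α > 1, so that the total is finite. Pareto’s N(x) = A x^(−α), divided by the count at x_min, is the survival function S(x). The steps are then: find the size x_p that marks the top fraction p, and divide the total above x_p by the grand total.

```latex
\begin{aligned}
S(x) &= \Pr(X > x) = \left(\frac{x_{\min}}{x}\right)^{\alpha},
  \qquad f(x) = \alpha\, x_{\min}^{\alpha}\, x^{-\alpha-1}, \qquad x \ge x_{\min} \\
x_p &= x_{\min}\, p^{-1/\alpha} \qquad \text{(from } S(x_p) = p\text{)} \\
\text{share}(p) &= \frac{\int_{x_p}^{\infty} x\, f(x)\, dx}{\int_{x_{\min}}^{\infty} x\, f(x)\, dx}
  = \frac{\alpha\, x_{\min}^{\alpha}\, x_p^{1-\alpha} / (\alpha - 1)}{\alpha\, x_{\min} / (\alpha - 1)}
  = \left(\frac{x_{\min}}{x_p}\right)^{\alpha-1} = p^{(\alpha-1)/\alpha} \\
\alpha &= \frac{\ln 5}{\ln 4} \approx 1.161 \qquad \text{(from } 0.2^{(\alpha-1)/\alpha} = 0.8\text{)}
\end{aligned}
```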

Figure 1. The exponent is the inequality: the share of the total captured by the top fraction of holders, for Pareto’s α of 1.5 and the 80/20 exponent of 1.16.
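The numbers behind Figure 1 can be reproduced in a few lines. This is a minimal sketch that assumes a pure Pareto tail with α > 1. The function name `top_share` is hypothetical. It applies the rule that the top fraction p holds p raised to the power (α − 1)/α. It also recovers the exact exponent behind the 80/20 slogan.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a pure Pareto tail."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if alpha <= 1:
        raise ValueError("share undefined: the mean is infinite for alpha <= 1")
    return p ** ((alpha - 1) / alpha)

# The exponent at which the top 20 per cent hold exactly 80 per cent.
alpha_80_20 = math.log(5) / math.log(4)

for alpha in (1.01, 1.16, 1.5, 2.0, 3.0):
    shares = ", ".join(f"top {p:.0%}: {top_share(p, alpha):.0%}" for p in (0.01, 0.2))
    print(f"alpha = {alpha}: {shares}")
print(f"exact 80/20 exponent: {alpha_80_20:.3f}")
```

Running the loop shows the direction the text describes. As α falls toward 1, the top 20 per cent's share climbs toward the whole.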

The model turned out to be far larger than incomes. George Kingsley Zipf showed that word frequencies in any language, and the populations of cities ranked by size, obey the same law with an exponent near one (Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949). Benoit Mandelbrot found it in the changes of cotton prices, and drew the conclusion that finance would spend the next half-century resisting: that the variance of speculative price changes may be effectively unbounded (“The Variation of Certain Speculative Prices,” Journal of Business, vol. 36, 1963, pp. 394–419). The modern survey literature — M. E. J. Newman’s review in Contemporary Physics (vol. 46, 2005, pp. 323–351) is the standard reference — lists earthquake magnitudes, war casualties, citation counts, firm sizes, and moon-crater diameters among the law’s confirmed habitats.

The one-sentence form: in a power-law world the tail thins so slowly that the largest few observations dominate every sum — so averages mislead, “typical” does not exist, and almost everything turns on almost nothing.

2. The mechanism

Why should so many unrelated systems produce the same curve? The deep property is scale invariance. A power law is the only distribution that looks the same at every magnification: the ratio of those above 10 million to those above 1 million equals the ratio of those above 100 million to those above 10 million. Nothing in the system defines a natural unit, so no natural unit appears in the data. The bell curve, by contrast, is anchored to a characteristic scale — its mean — and punishes deviation from it at a ferocious, accelerating rate. The two distributions describe two different kinds of world: one in which observations are the sum of many small independent influences, and one in which observations feed on their own size.
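Scale invariance can be checked directly. The sketch below assumes a Pareto tail with α = 1.5 and a minimum of 1, and the helper names are hypothetical. Each ten-fold step up the scale thins the population by the same factor, 10 raised to the power −1.5. The standard normal behaves differently: each doubling of the threshold thins it by a faster and faster factor, because its tail is anchored to a characteristic scale.

```python
import math

def pareto_survival(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto tail with exponent alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha

def gaussian_survival(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

alpha = 1.5
# Ratio of those above 10x to those above x, at three different scales.
pareto_ratios = [pareto_survival(10 * x, alpha) / pareto_survival(x, alpha)
                 for x in (1e6, 1e7, 1e8)]
# Ratio of those above 2z to those above z, at three different scales.
gauss_ratios = [gaussian_survival(2 * z) / gaussian_survival(z) for z in (1, 2, 4)]

print("Pareto, ten-fold steps:", [f"{r:.4f}" for r in pareto_ratios])
print("Gaussian, doubling steps:", [f"{r:.1e}" for r in gauss_ratios])
```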

That feeding is the generative mechanism. Robert Gibrat observed in 1931 that if firms grow each year by a random percentage of their current size — proportional growth — inequality compounds multiplicatively. Herbert Simon showed that when new entrants attach themselves to incumbents with probability proportional to incumbent size — the rich-get-richer process — the resulting distribution acquires a Pareto tail (“On a Class of Skew Distribution Functions,” Biometrika, vol. 42, 1955, pp. 425–440). Barabási and Albert rediscovered the same mathematics in the link structure of the web (“Emergence of Scaling in Random Networks,” Science, vol. 286, 1999, pp. 509–512). Wherever advantage compounds — capital earning returns, brands attracting customers, platforms attracting users — the machinery for a power law is present. Xavier Gabaix’s synthesis shows that proportional random growth, plus some friction such as entry and exit, generates Pareto tails almost inevitably (“Power Laws in Economics: An Introduction,” Journal of Economic Perspectives, vol. 30, no. 1, 2016, pp. 185–206).
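Simon’s rich-get-richer process can be simulated compactly. This is a minimal sketch, not Simon’s own formulation. The function name, the entry probability `p_new` and the seed are all illustrative assumptions. At each step one unit arrives. With probability `p_new` it founds a new firm. Otherwise it joins an existing firm with probability proportional to that firm’s size. In Simon’s analysis the resulting tail exponent is roughly 1/(1 − p_new), so p_new = 0.1 lands near Zipf’s value of one.

```python
import random
from collections import Counter

def simon_process(steps, p_new=0.1, seed=1):
    """Minimal sketch of Simon's preferential-attachment process.

    Each step adds one unit. With probability p_new the unit founds a new
    firm; otherwise it joins an existing firm chosen in proportion to size.
    Returns firm sizes sorted from largest to smallest.
    """
    rng = random.Random(seed)
    owners = [0]  # one entry per unit; the value is the id of the owning firm
    n_firms = 1
    for _ in range(steps):
        if rng.random() < p_new:
            owners.append(n_firms)  # entry: a new firm of size one
            n_firms += 1
        else:
            # Picking a random existing unit selects its firm in proportion to size.
            owners.append(rng.choice(owners))
    return sorted(Counter(owners).values(), reverse=True)

sizes = simon_process(20_000)
top = max(1, len(sizes) // 100)
print(f"{len(sizes)} firms; top 1% hold {sum(sizes[:top]) / sum(sizes):.0%} of all units")
```

The list-of-owners trick keeps each step cheap. Choosing a unit uniformly at random is the same as choosing a firm in proportion to its size.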

Readers of the previous essays in this series will notice the kinship. Multiplicative growth without friction produces the log-normal distribution that Galton and McAlister derived in 1879 and that this letter examined last week. The power law is the log-normal’s wilder cousin: the same compounding engine, run with reinforcement or run long enough against a barrier, stretches the tail from heavy to scale-free. And the consequences for the statistician are severe. When the exponent α is 2 or less (Pareto’s incomes, at 1.5, qualify), the variance of the distribution is infinite: the sample variance never converges, however much data arrives. When α is 1 or less, even the mean is infinite. The law of large numbers, the central limit theorem, and the standard deviation (the apparatus this series has spent five essays assembling) all weaken or fail outright in the regime where the power law rules. The model is not a curiosity. It is a boundary marker showing where ordinary statistics stops working.
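The thresholds in that paragraph follow from a single closed form. The sketch assumes a pure Pareto tail with minimum x_min = 1, and the helper names are hypothetical. Under that assumption the k-th moment E[X^k] equals α·x_min^k/(α − k) when k < α, and is infinite otherwise. The mean therefore needs α > 1 and the variance needs α > 2. At the inverse cubic law’s α = 3, the variance is finite but the fourth moment is not.

```python
import math

def pareto_moment(k, alpha, x_min=1.0):
    """E[X**k] for a Pareto tail with exponent alpha; infinite once k >= alpha."""
    if k >= alpha:
        return math.inf
    return alpha * x_min ** k / (alpha - k)

def pareto_variance(alpha, x_min=1.0):
    """Variance = E[X^2] - E[X]^2, infinite whenever the second moment is."""
    second = pareto_moment(2, alpha, x_min)
    if math.isinf(second):
        return math.inf
    return second - pareto_moment(1, alpha, x_min) ** 2

for alpha in (0.9, 1.5, 2.5, 3.0):
    print(f"alpha = {alpha}: mean = {pareto_moment(1, alpha):.3g}, "
          f"variance = {pareto_variance(alpha):.3g}, "
          f"fourth moment = {pareto_moment(4, alpha):.3g}")
```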

A boundary marker, it should be said, and not a wrecking ball. The discipline of this series is to state what a model does not claim, and the power law does not claim the bell curve is useless. Where outcomes are genuinely additive — the aggregation of many small, independent, bounded influences — the central limit theorem holds and Gaussian tools work as advertised: measurement errors, sampling noise, the diversified average of many uncorrelated small bets. The mental model is therefore a sorting question to ask before any statistical tool is lifted from the drawer: is this quantity built by addition, or by compounding and reinforcement? Heights are added; fortunes compound. Polling errors are added; market panics reinforce. The investor who asks the sorting question first will use each statistic where it belongs — and will notice that almost everything that matters to a long-term equity record sits on the compounding side of the line.

3. The empirical record

The cleanest single demonstration in economics is Robert Axtell’s census of American business. Using Internal Revenue Service data covering essentially the entire population of United States firms — about 5.5 million of them in 1997 — Axtell showed that firm sizes follow the Zipf form of the power law with an exponent of 1.059, a fit that holds over eight orders of magnitude, from the corner shop to the largest corporations (“Zipf Distribution of U.S. Firm Sizes,” Science, vol. 293, 2001, pp. 1818–1820). An exponent that close to one sits at the edge of the regime in which the mean itself diverges: the American corporate landscape is about as concentrated as a distribution can be while still possessing a finite average.

Security prices supply the record most relevant to this letter. A research programme in statistical physics, analysing tens of millions of price observations across markets and decades, established what is now called the inverse cubic law: the probability of a daily return larger than x falls off as x to the power of approximately minus three, a regularity stable across stocks, indices, countries, and time periods (Gopikrishnan, Plerou, Amaral, Meyer and Stanley, “Scaling of the Distribution of Fluctuations of Financial Market Indices,” Physical Review E, vol. 60, 1999, pp. 5305–5316). An exponent of three is fat enough to matter enormously. It implies that returns possess a finite variance but an infinite fourth moment, so that the sample kurtosis never settles however long the record. It also implies that extreme days arrive orders of magnitude more often than any Gaussian calibration predicts.

Figure 2. Two beliefs about extreme days: a Gaussian calibrated to daily volatility runs out of probability before seven sigma; the inverse cubic law keeps the tail alive, and 19 October 1987 happened.
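The gap between the two curves in Figure 2 comes down to scale invariance. The sketch below asks how much rarer a move twice as large is under each belief. It assumes a pure inverse cubic tail with exponent exactly 3, and the helper names are hypothetical. In the data, the power law describes only moves beyond a few standard deviations, so the comparison is meaningful only in that region.

```python
import math

def gaussian_tail(k):
    """P(Z > k) for a standard normal variable (k in standard deviations)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def cubic_tail_ratio(k_big, k_small, exponent=3.0):
    """P(move > k_big) / P(move > k_small) under a pure power-law tail."""
    return (k_small / k_big) ** exponent

for small, big in [(2, 4), (5, 10), (10, 20)]:
    g = gaussian_tail(big) / gaussian_tail(small)
    c = cubic_tail_ratio(big, small)
    print(f"{big}-sigma vs {small}-sigma: Gaussian {g:.1e}, cubic law {c:.3f}")
```

Under the cubic law, doubling the size of the move always makes it eight times rarer. Under the Gaussian, the penalty for doubling grows without limit.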

Honesty requires one caveat. Clauset, Shalizi and Newman, in the paper that brought statistical rigour to the field, re-tested two dozen data sets widely cited as power laws and found that many are better described as log-normals or as power laws with cut-offs (“Power-Law Distributions in Empirical Data,” SIAM Review, vol. 51, 2009, pp. 661–703). For the investor the dispute is largely academic. Whether the far tail of returns is exactly Pareto or merely log-normal-with-a-heavy-tail, every serious estimate agrees on the operative fact: the Gaussian benchmark understates the frequency of extremes not by percentages but by powers of ten.
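Clauset, Shalizi and Newman fit exponents by maximum likelihood rather than by drawing a line on double-log paper. The sketch below uses the continuous maximum-likelihood estimator for the tail exponent, which has the same form as the Hill estimator. It assumes the lower cut-off x_min is already known. Their method also chooses x_min by minimising a Kolmogorov–Smirnov distance, and this sketch omits that step. It is tested on synthetic data from Python’s `random.paretovariate`, not on any real series. Their paper reports the density exponent, which is this survival exponent plus one. The function name is hypothetical.

```python
import math
import random

def hill_tail_exponent(data, x_min):
    """Maximum-likelihood estimate of the survival exponent alpha above x_min."""
    tail = [x for x in data if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    return len(tail) / log_sum

rng = random.Random(42)
sample = [rng.paretovariate(1.5) for _ in range(20_000)]  # Pareto, x_min = 1, alpha = 1.5
alpha_hat = hill_tail_exponent(sample, x_min=1.0)
print(f"estimated survival exponent: {alpha_hat:.3f}")
print(f"equivalent density exponent: {alpha_hat + 1:.3f}")
```

With 20,000 draws the standard error is roughly α divided by the square root of n, about 0.01. The estimate should therefore sit close to the true 1.5.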

4. Two episodes

Black Monday, 19 October 1987. On a single trading day, with no war, no default, and no recession announcement, the Dow Jones Industrial Average fell 22.6 per cent and the S&P 500 fell 20.5 per cent. Under the Gaussian assumptions then standard in portfolio theory — daily volatility in the region of one per cent — a twenty-standard-deviation day is not “unlikely”; it is excluded. Jackwerth and Rubinstein later computed that, under the lognormal benchmark used to price options before the crash, the fall in two-month S&P 500 futures was a twenty-seven standard-deviation event, with probability of order 10 to the power of minus 160 (“Recovering Probability Distributions from Option Prices,” Journal of Finance, vol. 51, 1996, pp. 1611–1631). A number that small is not a probability; it is a verdict on the model that produced it. The Brady Commission’s post-mortem identified the amplifier: portfolio insurance programmes sold mechanically into a falling market, size feeding on size — the preferential-attachment machinery of Section 2 operating in reverse (Report of the Presidential Task Force on Market Mechanisms, January 1988). Markets did not violate statistics that day. They violated the wrong distribution.
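To see why a number like 10 to the power of minus 160 is a verdict on the model, the sketch below does two things. It evaluates the one-sided standard-normal tail at 20 and 27 standard deviations. It then converts each probability into an expected waiting time, assuming 252 trading days a year; that is a conventional figure, not one from the text. The sketch prices a single standard-normal tail, not Jackwerth and Rubinstein’s full option-implied calculation, but it lands on the same order of magnitude. Both waits dwarf the age of the universe, which is roughly 10 to the power of 10 years.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: conventional trading-day count

def log10_gaussian_tail(k):
    """log10 of P(Z > k): the chance of a fall of k standard deviations or more."""
    return math.log10(0.5 * math.erfc(k / math.sqrt(2)))

for k in (20, 27):
    lp = log10_gaussian_tail(k)
    # Expected wait in years is 1 / (p * days per year); work in log10 to avoid overflow.
    log10_years = -lp - math.log10(TRADING_DAYS_PER_YEAR)
    print(f"{k}-sigma fall: probability ~ 10^{lp:.1f}; expected wait ~ 10^{log10_years:.0f} years")
```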

Long-Term Capital Management, 1998. LTCM began 1998 with about 4.7 billion dollars of capital, a balance sheet of roughly 125 billion dollars, and a risk system built on recent correlations and near-Gaussian tails. Its convergence trades were, in distributional terms, a single position: short the tail. When Russia defaulted in August 1998, spreads that the models treated as independent widened together; the fund lost 553 million dollars on 21 August alone, and by late September its capital had fallen to about 400 million dollars, at which point the Federal Reserve Bank of New York coordinated a 3.6 billion dollar recapitalisation by a consortium of fourteen institutions to prevent a disorderly liquidation (President’s Working Group on Financial Markets, Hedge Funds, Leverage, and the Lessons of Long-Term Capital Management, April 1999; Roger Lowenstein, When Genius Failed, Random House, 2000). The partners described the August moves as multi-sigma events arriving day after day. The better description is simpler: the sigmas were computed from the wrong curve. In a fat-tailed regime, leverage does not merely amplify outcomes — it converts the tail that will eventually arrive into ruin that cannot be survived.
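The leverage arithmetic behind that last sentence is short. This is a deliberately crude sketch that treats LTCM’s whole balance sheet as one position losing value uniformly. That is exactly the situation its models ruled out, and it is the one that arrived when spreads widened together. The capital and balance-sheet figures come from the paragraph above. The function name is hypothetical.

```python
def wipeout_fall(capital, assets):
    """Uniform fall in asset value that erases all equity (book treated as one position)."""
    return capital / assets

capital, assets = 4.7e9, 125e9  # LTCM at the start of 1998, as given in the text
leverage = assets / capital
print(f"leverage about {leverage:.0f}:1; a {wipeout_fall(capital, assets):.1%} "
      "common fall in asset values erases the capital")
```

At roughly 27 to 1, a common move of under 4 per cent is enough. In a fat-tailed world, such a move is a question of when rather than whether.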

Figure 3. The power law’s long discovery, from Lausanne tax tables to rigorous tail statistics, with the market’s two tuition payments in between (timeline: Pareto 1896, Zipf 1949, Mandelbrot 1963, Black Monday 1987, LTCM 1998, Axtell 2001, Clauset, Shalizi and Newman 2009).

5. Application to long-term equity investing

The power law enters the investor’s life from two directions at once: the distribution of outcomes across the market, and the distribution of outcomes inside the portfolio. Three operating disciplines follow.

First: size every commitment for the tail, not for the average. The inverse cubic law is a planning parameter, not trivia. A risk framework calibrated to recent standard deviations will, over a long investing life, meet several days it classified as impossible. The disciplines that survive such days are old-fashioned: leverage low enough that a 20 per cent gap cannot force liquidation, obligations matched so that nothing must be sold into the fall, and a margin of safety in the purchase price that does the work volatility models pretend to do. The 1987 and 1998 episodes differ in every detail except the one that matters — in both, the fatal step was sizing positions as if the tail were Gaussian.
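The first discipline can be turned into a simple bound. The sketch below gives the highest assets-to-equity ratio that survives a sudden gap in prices while still meeting a minimum equity-to-assets ratio afterwards. It follows from requiring (1/L − gap)/(1 − gap) to be at least the maintenance ratio. The function name is hypothetical, and the 25 per cent maintenance ratio in the usage line is an illustrative assumption, not a figure from the text.

```python
def max_leverage(gap, maintenance=0.0):
    """Highest assets-to-equity ratio L that keeps equity/assets >= maintenance
    after a sudden fall of `gap` in asset values.

    Solves (1/L - gap) / (1 - gap) >= maintenance for L.
    """
    if not 0 < gap < 1 or not 0 <= maintenance < 1:
        raise ValueError("gap must lie in (0, 1) and maintenance in [0, 1)")
    return 1.0 / (gap + maintenance * (1 - gap))

print(f"{max_leverage(0.20):.2f}")        # 5.00: survive a 20% gap with some equity left
print(f"{max_leverage(0.20, 0.25):.2f}")  # 2.50: and still meet a hypothetical 25% maintenance ratio
```

Any requirement to keep equity after the gap cuts the permissible leverage well below the naive 5 to 1.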

Second: never truncate the right tail. If a handful of holdings will generate most of a lifetime’s return — and the evidence of this series’ earlier essay on Hendrik Bessembinder’s findings says exactly that about equity markets at large — then the costliest error available to a disciplined, sensible, profit-taking investor is the early sale of the eventual outlier. Selling a holding after it doubles feels like prudence; in a power-law world it is the systematic amputation of the only outcomes that pay for everything else. The arithmetic is unforgiving: the investor who repeatedly banks 30 per cent gains has placed a ceiling precisely where the distribution places its engine. The same logic complicates mechanical rebalancing. Trimming every position back to a fixed weight is, in distributional terms, a standing order to sell the right tail in instalments; whatever its merits for risk control, the long-term investor should at least see the trade clearly — it exchanges the source of power-law upside for a smoother ride — and should ask whether the business itself, rather than the share price, has given any reason to shrink the commitment.
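The cost of a ceiling can be put in numbers, but only under a deliberately stylised assumption. Suppose the end-multiples of the winning holdings follow a Pareto law with minimum 1 and exponent α. This means every holding at least returns its cost, so losers are ignored entirely. Suppose also that selling at a cap c replaces each outcome X with the smaller of X and c. The expected value given up is the mean excess above c. As a share of the mean, this works out to (x_min/c) raised to the power (α − 1), divided by α. The helper name and parameters are hypothetical, and the sketch ignores what the proceeds are reinvested in.

```python
def share_above_cap(cap, alpha, x_min=1.0):
    """Fraction of a Pareto variable's mean that lies above `cap`.

    E[(X - cap)+] / E[X] = (x_min / cap) ** (alpha - 1) / alpha,
    valid for cap >= x_min and alpha > 1.
    """
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    if cap < x_min:
        raise ValueError("cap must be at least x_min")
    return (x_min / cap) ** (alpha - 1) / alpha

for alpha in (1.5, 1.16):
    for cap in (1.3, 2.0):
        print(f"alpha = {alpha}, sell at {cap}x: "
              f"{share_above_cap(cap, alpha):.0%} of expected value forfeited")
```

Banking 30 per cent gains forfeits about 58 per cent of the expected multiple at α = 1.5 and about 83 per cent at α = 1.16. The fatter the tail, the more the ceiling costs.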

Third: allocate effort by the vital few. Juran’s rule applies to the research process itself. Most annual reports read, most meetings taken, most news consumed will prove to be the trivial many; a small number of decisions will prove to be the vital few. The practical consequences are concentration of attention on businesses that own a compounding mechanism — scale economies, network effects, reinvestment at high incremental returns, the preferential-attachment engines of Section 2 — and a deliberate scarcity of action. Warren Buffett’s old teaching device, recounted in Charlie Munger’s 1994 University of Southern California talk, of a punch card with twenty lifetime slots, is power-law reasoning compressed into an image: if only a few decisions will matter, the rational response is to make few decisions, and to make them matter.

6. How the long-term equity tradition has used it

The investors this letter studies discovered the power law empirically, in their own brokerage records, long before reading the statistics. Warren Buffett devoted a section of his 2022 letter to shareholders — he titled it “The Secret Sauce” — to the anatomy of Berkshire Hathaway’s 58-year record, writing that the results were “the product of about a dozen truly good decisions … about one every five years,” and that over time “the weeds wither away in significance as the flowers bloom” (Berkshire Hathaway shareholder letter, 2022). A dozen decisions across six decades, carrying a compounded record against thousands of candidates considered: that is an exponent speaking.

Charlie Munger made the same count at USC in 1994: “If you took our top fifteen decisions out, we’d have a pretty average record” (“A Lesson on Elementary, Worldly Wisdom,” University of Southern California, 1994; reprinted in Poor Charlie’s Almanack, ed. Peter Kaufman, 2005). The remark is usually quoted as a curiosity. It is better read as a portfolio-construction theorem: when returns are power-law distributed, patience and inactivity are not temperamental virtues but mathematically indicated strategies.

Nicholas Sleep closed the Nomad Investment Partnership in 2014 having drawn the full practical conclusion. His letters to partners return repeatedly to the observation that a tiny number of businesses — those that share scale economies with customers and reinvest relentlessly — would do the heavy lifting of the partnership’s record, and that the gravest error open to the partners was selling them on valuation grounds along the way (Nomad Investment Partnership Letters to Partners, 2001–2014, IGY Foundation compilation, 2021). Nomad’s terminal act — concentrating into a few compounding machines and stopping — is the purest expression in the modern record of taking the tail seriously.

7. Key takeaways

A power law has no typical member: the probability of exceeding a size falls as a fixed power of that size, so the largest few observations dominate every total. Pareto found it in incomes in 1896; it has since been confirmed in firm sizes, city populations, word frequencies, and market returns.
The mechanism is compounding with reinforcement. Wherever advantage feeds on itself — proportional growth, preferential attachment, reinvested returns — the vital few will eventually carry the sum, and bell-curve statistics will eventually fail.
Equity returns have tails near exponent three: finite variance, but extremes arriving orders of magnitude more often than Gaussian models predict. October 1987 and LTCM are not anomalies of history; they are samples from the actual distribution.
For the portfolio, the law cuts both ways: size and leverage must survive the left tail, and selling discipline must never truncate the right one. The early sale of the eventual outlier is the most expensive mistake a careful investor can make.
For the process, Juran’s vital few govern: a lifetime record will rest on a dozen decisions. Make few, make them large in attention, and let the flowers bloom.

— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia

Important.
All content on this site and in this email is journalism and education for a general audience. Nothing here constitutes investment advice or a recommendation in respect of any specific financial instrument, nor an offer or solicitation to buy or sell any security. Readers should consult an authorised financial adviser regulated in their own jurisdiction before making any investment decision.
