The Second Marshmallow: Walter Mischel’s 1968 Test of Self-Control, and Why the Market Pays Its Largest Rewards to the Long-Term Investor Who Can Wait

The NorthPath Letter · Behavioural Finance · Afternoon Edition

In 1968, in a converted bungalow on the Stanford University campus that housed the Bing Nursery School, a psychologist named Walter Mischel began offering four-year-olds a small and, to them, agonising bargain. A child was seated at a bare table and shown a single marshmallow. The experimenter explained that he had to step out of the room for a while. The child could ring a bell at any moment and the experimenter would return and the child could eat the one marshmallow at once. Or the child could wait, without ringing, until the experimenter came back of his own accord, in which case the reward would be two marshmallows instead of one. Then the adult left, and a hidden camera recorded what is now among the most reproduced images in the whole of psychology: a small person squirming, covering their eyes, talking to themselves, turning their back on the plate, and very often, after a brave interval, surrendering and ringing the bell.

The experiment looked like a study of children. It was, in fact, a study of the most expensive decision an investor ever makes. The marshmallow on the table is the certain, immediate, vivid reward, the realised gain you can take today by selling, the dopamine of action. The second marshmallow, promised but unseen, is the larger reward that exists only for those who can sit still through discomfort. Mischel had built, in miniature, the exact apparatus that the equity market runs on every trading day, and the children, like the market’s participants, divided into those who could wait and those who could not.

The bias: a single self at war with its own future

The formal name for what Mischel was measuring is delay of gratification: the capacity to forgo a smaller, sooner reward in favour of a larger, later one. The deficit it exposes is not stupidity. The children who rang the bell understood perfectly well that two marshmallows were better than one. The problem was that the value of the immediate reward, present and tangible in front of them, swamped the value of the larger reward that existed only as a promise. This is the cognitive signature of the bias: a systematic, predictable preference reversal in which the nearer prize looms larger than its true worth simply because it is near.

The canonical statement of the finding is Walter Mischel, Yuichi Shoda and Monica Rodriguez, “Delay of Gratification in Children,” published in Science in 1989 (volume 244, pages 933 to 938). The paradigm itself had been developed two decades earlier in work such as Mischel and Ebbesen’s 1970 study of attention in delay of gratification, but the 1989 paper made the claim that turned a nursery game into a cultural touchstone. Following the original Stanford preschoolers into adolescence, the authors reported that the seconds a four-year-old had been able to wait predicted, years later, measurable differences in adolescent competence: better scholastic performance, including higher college-entrance test scores in the companion work by Shoda, Mischel and Peake, and a greater ability to cope with frustration and stress. A trait observable before a child could read appeared to forecast outcomes a decade and a half away.

For the investor the translation is direct and uncomfortable. The market does not reward intelligence in proportion to its supply, nor does it reward effort, of which there is always a glut. It reserves its largest payments for the capacity to leave a sound decision undisturbed while it compounds, which is delay of gratification wearing a pinstripe suit. The discount rate the impatient investor applies to his own future is not the one printed in the textbook. It is steeper, it is unstable, and it bends sharply upward whenever a reward comes within reach.

The mechanism: the hot system and the cool system

Why should a promised doubling of the reward fail to hold a four-year-old, or a tripling fail to hold a grown investor with a spreadsheet? Mischel’s own answer, developed over decades and crystallised with Janet Metcalfe in their 1999 Psychological Review paper, was that the mind runs two systems that compete for control of behaviour. There is a “hot” system, emotional, reflexive, fast, organised around immediate stimuli, and a “cool” system, reflective, slow, cognitive, capable of representing the future. The marshmallow on the table activates the hot system directly; its sweetness, its proximity, its sheer presence generate an impulse to consume that the cool system must override. The decisive variable, Mischel found, was not willpower in the sense of grim endurance. It was attention. Children who succeeded did not stare down the marshmallow and tough it out. They looked away. They turned around. They reimagined the marshmallow as a cloud or a picture rather than a food. They converted a hot stimulus into a cool representation, and in doing so they removed the trigger rather than resisting it.

The economic profession reached the same destination by a different road. The hot-cool conflict is the psychological flesh on the bones of what economists call present bias, formalised in Robert Strotz’s 1956 work on dynamic inconsistency and David Laibson’s 1997 model of hyperbolic discounting: the tendency to discount the very near future far more steeply than the merely distant future, so that preferences expressed calmly at a distance reverse violently up close. The investor who resolves on Sunday to hold for ten years, and sells on Wednesday because the screen is red, has not changed his analysis. He has simply moved from the cool system to the hot one, from distance to proximity, and his discount rate has lurched with him.
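The reversal can be made concrete with a few lines of arithmetic. What follows is a minimal sketch, not a calibrated model: it uses the beta-delta (quasi-hyperbolic) form commonly associated with Laibson's paper, and the parameter values, the reward sizes and the function names are all assumptions chosen for illustration.

```python
def discounted_value(reward, delay_days, beta=0.7, delta=0.999):
    """Quasi-hyperbolic (beta-delta) value of a reward `delay_days` away.

    An immediate reward keeps its full value. Every delayed reward is
    scaled by a one-off present-bias factor `beta` and by an ordinary
    per-day discount factor `delta`. Both defaults are illustrative
    assumptions, not estimates from any study.
    """
    if delay_days == 0:
        return reward
    return beta * (delta ** delay_days) * reward


def preferred(days_until_sooner, sooner=100.0, later=120.0, gap_days=1):
    """Say which reward looks larger from the current vantage point."""
    v_sooner = discounted_value(sooner, days_until_sooner)
    v_later = discounted_value(later, days_until_sooner + gap_days)
    return "larger-later" if v_later > v_sooner else "smaller-sooner"


# The same pair of rewards, judged from a distance and then up close.
for days in (30, 0):
    print(f"{days:>2} days before the sooner reward: {preferred(days)}")
```

Judged thirty days out, both rewards carry the same present-bias penalty and the larger one wins. Judged on the day itself, only the later reward is penalised, and the choice flips. Nothing about the rewards has changed; only the distance has.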

The investing environment is engineered, with increasing sophistication, to keep the hot system in charge. A brokerage application that flashes a price in real time, colours losses in red, buzzes the phone on a move and offers commission-free execution at the speed of a thought has, whether by design or by accident, recreated Mischel’s table and placed the marshmallow permanently in front of the investor’s eyes. The cool system never gets a quiet room in which to work.

The empirical record: a celebrated finding, honestly qualified

An evidence-based letter owes its readers the awkward part of the story, because the marshmallow test is also a case study in how a striking result can be oversold. In 2018 Tyler Watts, Greg Duncan and Haonan Quan published a conceptual replication in Psychological Science (volume 29, pages 1159 to 1177) using a far larger and more representative sample than Mischel’s original few dozen Stanford children. Their verdict was a careful deflation rather than a refutation. The association between early delay ability and later achievement survived, but it was roughly half the size reported in the original studies, and it shrank by about two-thirds once the researchers controlled for family background, the child’s early cognitive ability and the home environment. An extra minute of waiting at age four was associated with only about a tenth of a standard deviation of additional achievement at fifteen, and even that faded under controls. A direct comparison by Falk, Kosse and Pinger, also in Psychological Science in 2020, reached a similar nuanced reading.

The honest lesson is not that self-control is a myth. It is that the capacity to wait is heavily entangled with circumstance. A child from a stable, resourced home learns to wait partly because, in that child’s experience, promises are kept and later rewards actually arrive. A child for whom the future is unreliable is making a rational bet by taking the sure marshmallow now. Transplanted to markets, this is the deepest point in the whole literature: patience is not a free-floating virtue but a learned expectation that the future will honour its commitments. The investor’s task is therefore twofold: to cultivate the trait, and to build a portfolio whose “second marshmallow” is reliable enough to be worth waiting for. Waiting patiently for a business that is quietly deteriorating is not discipline; it is delay of gratification misapplied to a broken promise.

Bar chart: original marshmallow-test effect versus the smaller 2018 replication effect
Figure 1. The original 1989-90 result halved in the 2018 replication and shrank by about two-thirds once family background and early cognitive ability were controlled for (Watts, Duncan & Quan, Psychological Science 29, 2018).

That the trait, properly measured, matters for money is visible in the aggregate behaviour of investors. The annual Quantitative Analysis of Investor Behavior produced by the research firm DALBAR has documented for three decades a persistent gap between the returns the market delivers and the returns the average fund investor actually earns, a gap created almost entirely by buying and selling at the wrong moments. In its 2024 study the average equity fund investor earned several percentage points less than the index, and the firm has repeatedly noted that the typical investor does not stay invested long enough, holding a given fund for only a few years, to receive what patience would have paid. The shortfall is the marshmallow test scored in basis points: the cost, summed across millions of bell-ringers, of taking the sooner reward.

Two historical episodes

The first episode is the day-trading mania that accompanied the dot-com bubble of 1999 and 2000. Discount brokerages and the first generation of fast internet execution did to a whole population of investors what Mischel’s table did to one child at a time: they put the marshmallow within instant reach and removed every barrier to grabbing it. Trading that had once required a phone call and a salaried broker now took a click. Average holding periods, which had been measured in years for much of the twentieth century, collapsed toward months and, for the most active accounts, toward days. The systematic study of the period, Brad Barber and Terrance Odean’s work on the records of a large discount broker, reached a conclusion that reads like the adult sequel to the Stanford films: the investors who traded the most earned the least, underperforming the patient by a wide margin, with their returns consumed by the costs and the mistakes of constant action. When the index peaked in March 2000 and then fell by roughly half over the following two years, it was the impatient, fully invested and frequently trading, who were positioned to be hurt the most.

The second episode is its twenty-first-century reprise. In 2020 and 2021 a new cohort of investors arrived through commission-free, gamified mobile applications whose entire design philosophy was the suppression of delay: confetti animations on execution, instant fractional purchases, push notifications timed to provoke. In January 2021 the shares of a struggling video-game retailer rose roughly twentyfold in weeks on a wave of coordinated enthusiasm, and the same names that soared on the way up devastated the accounts of those who bought the vivid, sooner reward at its peak and could not let go. The instrument had changed from a trading terminal to a smartphone; the marshmallow on the table had not. Both episodes make the same point: when an environment engineers away the friction that protects the cool system, the population’s discount rate rises, holding periods shorten, and wealth transfers from those who cannot wait to those who can.

The counter-measure framework: three disciplines of waiting

Mischel’s most useful discovery, for the investor, was that the successful children did not win by force of will. They won by structuring the situation so that less will was required. That is the template for every practical defence against this bias. The point is not to become a person of heroic self-denial but to arrange one’s affairs so that heroism is unnecessary.

The first discipline is to remove the stimulus rather than resist it. The Stanford children who covered their eyes were practising what an investor practises by checking prices quarterly instead of by the minute, by turning off the push notifications, by writing the investment thesis down at the moment of purchase so that the later decision can be made by the cool system that wrote it rather than the hot system that is panicking. The aim is to keep the marshmallow out of sight, because the research is unambiguous that attention, not endurance, is the operative lever. An investor who cannot see the flashing price cannot be provoked by it.

Diagram contrasting the impulsive hot system with the reflective cool system
Figure 2. Mischel and Metcalfe’s hot/cool framework: the children who waited longest did not resist the marshmallow, they redirected attention away from it, converting a hot stimulus into a cool representation.

The second discipline is to pre-commit while cool in order to bind oneself while hot. The reason cooling-off periods exist in law is precisely that regulators understand preference reversal as a structural feature of human decision-making, not a personal failing. Under the United Kingdom’s Financial Conduct Authority rules, the Conduct of Business Sourcebook grants retail buyers of many investment products a cancellation right of fourteen, and for some life and pension contracts thirty, calendar days, a mandated pause inserted between the hot decision and its irreversible consequence. The European Union enshrines the same principle: the Distance Marketing of Financial Services regime, from Directive 2002/65/EC through its 2023 successor, gives consumers a fourteen-day right of withdrawal from financial contracts concluded at a distance, with an online “withdrawal function” mandated across the bloc from June 2026. Two regulators in two different legal systems have independently concluded that the most reliable protection against a hot decision is a structurally enforced delay. The investor can build the same architecture privately: a personal rule that no position is sold within a fixed number of days of the impulse to sell it, a written sell discipline agreed in advance, an automatic monthly purchase that takes the timing decision out of the hot system’s hands entirely.
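A private cooling-off rule of this kind is simple enough to write down as a check. The sketch below is one hypothetical way to do it: the fourteen-day figure merely echoes the statutory periods described above, and the function names and the example dates are assumptions, not anything a regulator or broker prescribes.

```python
from datetime import date, timedelta

# Hypothetical personal rule; fourteen days echoes the statutory periods.
COOLING_OFF_DAYS = 14


def earliest_sale_date(impulse_date, cooling_off_days=COOLING_OFF_DAYS):
    """First date on which the rule allows the sale to be executed."""
    return impulse_date + timedelta(days=cooling_off_days)


def may_sell(impulse_date, today, cooling_off_days=COOLING_OFF_DAYS):
    """True only once the full cooling-off period has elapsed."""
    return today >= earliest_sale_date(impulse_date, cooling_off_days)


# Example: the urge to sell is logged on 1 March; the question is asked
# again nine days later, and again after the period has run out.
impulse = date(2024, 3, 1)
for check in (date(2024, 3, 10), date(2024, 3, 15)):
    print(check.isoformat(), "sale permitted:", may_sell(impulse, check))
```

The design choice that matters is that the date of the impulse is recorded when it occurs. The decision is then made by whoever reads the log a fortnight later, which is to say by the cool system.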

The third discipline is to make the second marshmallow vivid. The children who waited longest were those who could hold the larger, later reward in mind. The investor’s analogue is to keep the destination concrete: the arithmetic of compounding written out in actual currency, the historical record of how a sound business multiplied over a decade for those who did nothing, the explicit cost of the behaviour gap. When the future reward is as psychologically present as the immediate temptation, the cool system has something to fight with. Patience is far easier to sustain when one can see, in numbers, exactly what one is being patient for.
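Writing the arithmetic out takes only a few lines. The figures below are hypothetical throughout: a starting sum of 10,000, an assumed 8 per cent annual market return and an assumed three-point behaviour gap, chosen to illustrate the shape of the result rather than to reproduce any published estimate.

```python
def future_value(principal, annual_return, years):
    """Value of a lump sum compounded once a year at a constant rate."""
    return principal * (1 + annual_return) ** years


# All inputs are illustrative assumptions, not forecasts or study results.
PRINCIPAL = 10_000
YEARS = 20
MARKET_RETURN = 0.08   # assumed return earned by the investor who waits
BEHAVIOUR_GAP = 0.03   # assumed annual cost of mistimed buying and selling

patient = future_value(PRINCIPAL, MARKET_RETURN, YEARS)
impatient = future_value(PRINCIPAL, MARKET_RETURN - BEHAVIOUR_GAP, YEARS)

print(f"Patient investor:   {patient:>10,.0f}")
print(f"Impatient investor: {impatient:>10,.0f}")
print(f"Cost of impatience: {patient - impatient:>10,.0f}")
```

Under these assumptions a gap of three points a year, compounded over two decades, consumes a little over two-fifths of the patient investor's final sum. That last line of output is the second marshmallow, stated in currency.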

How long-term-equity practitioners addressed it

The discipline that academic psychology measured with marshmallows, the great long-term investors arrived at by temperament and codified into method. Warren Buffett built an entire investment philosophy on the refusal to ring the bell. “Our favorite holding period is forever,” he wrote to Berkshire Hathaway’s shareholders in 1988, and in the 1990 letter he made the temperament explicit: “Lethargy bordering on sloth remains the cornerstone of our investment style.” He has described the stock market as a mechanism for transferring money from the impatient to the patient, which is the marshmallow test stated as a wealth-transfer identity. Buffett’s genius was never primarily analytical horsepower, of which many of his rivals had as much; it was the constitutional capacity to sit, for years, on a decision already made, to let the second marshmallow grow while others ate the first.

His partner Charlie Munger reduced the same insight to a sentence that belongs on every trading screen: “The big money is not in the buying and the selling, but in the waiting.” Munger’s framing matters because it relocates the source of returns. The market’s culture glorifies the buy and the sell, the moments of action, the decisions that feel like skill. Munger insisted that the decisive contribution, the part that actually creates wealth, happens in the long inert stretches when the disciplined investor does nothing at all, the very behaviour the impatient experience as unbearable. A generation earlier Thomas Phelps had argued the same case in his 1972 book 100 to 1 in the Stock Market, marshalling examples of ordinary businesses that multiplied a hundredfold for owners who simply held on, and observing that the investor’s worst enemy was the urge to do something. Three practitioners, across three generations, converged on a single instruction that Mischel’s preschoolers would have recognised: the reward goes to the one who can leave the marshmallow alone.

Two statistics: the average-investor behaviour gap and the collapse in holding periods
Figure 3. The behaviour gap (the average equity-fund investor trailed the index by several points in 2024, per DALBAR) and the long collapse in holding periods are the marshmallow test scored in basis points.

Key takeaways

  • The market pays for patience more reliably than for intelligence. Delay of gratification, the capacity to forgo a smaller-sooner reward for a larger-later one, is the single trait Mischel’s 1989 study linked to long-run success, and it is the trait the equity market compensates most consistently.
  • The failure is attentional, not moral. Impatience is a hot, reflexive system overriding a cool, reflective one. The cure is to remove the stimulus, not to summon willpower, exactly as the children who waited longest looked away from the marshmallow rather than staring it down.
  • Honour the honest caveat. The 2018 replication by Watts, Duncan and Quan halved the original effect and shrank it further under controls. Patience pays only when the second marshmallow is real; waiting on a deteriorating business is not discipline but error.
  • Engineer delay before you need it. Regulators mandate cooling-off periods, the FCA’s fourteen-to-thirty-day cancellation right and the EU’s distance-marketing right of withdrawal, precisely because structured pauses beat in-the-moment resolve. Build the same architecture into your own process.
  • The masters did nothing, on purpose. Buffett’s “lethargy bordering on sloth” and Munger’s insistence that the big money is “in the waiting” describe the same edge from the inside: most of the return is earned in the long stretches when the disciplined investor leaves a sound decision alone.

— Manish Goel, FCA / NorthPath Advisory OÜ / Tallinn, Estonia

Important.
All content on this site and in this email is journalism and education for a general audience. Nothing here constitutes investment advice or a recommendation in respect of any specific financial instrument, nor an offer or solicitation to buy or sell any security. Readers should consult an authorised financial adviser regulated in their own jurisdiction before making any investment decision.