Test-Time Compute: The Discovery That Models Can Think Longer, Not Just Bigger

OpenAI’s o1 model, released in late 2024, introduced an entirely new scaling dimension. Rather than investing more compute in training, the model could invest more compute at inference time — “thinking” through problems step by step.


The Revolution

This was revolutionary because it decoupled capability from model size for the first time. The same model could produce quick, cheap answers for simple questions and expensive, thorough answers for complex ones.

The “thinking time” knob created a new scaling law: capability as a function of test-time compute, independent of parameter count.

The Reasoning Architecture

The model doesn’t just generate answers — it generates reasoning processes:

  • Chain-of-thought decomposition — breaking problems into intermediate steps
  • Self-verification — checking its own work before committing
  • Backtracking — recognizing when an approach isn’t working
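The three behaviors above can be sketched as a single loop. The code below is a toy illustration, not o1's actual architecture (which OpenAI has not published): it decomposes a problem into candidate steps, verifies each candidate before committing, and backtracks to try another approach when verification fails. The factoring task, the `verify` check, and the `budget` parameter are all invented for illustration; `budget` stands in for the "thinking time" knob.

```python
import random

def verify(candidate, target):
    # Self-verification: check the candidate's work before committing.
    a, b = candidate
    return a * b == target and a > 1 and b > 1

def think(target, budget, seed=0):
    """Toy test-time compute loop: propose a decomposition, verify it,
    and backtrack (discard and retry) on failure. A larger budget
    means more "thinking" and a higher chance of success."""
    rng = random.Random(seed)
    for _ in range(budget):
        a = rng.randrange(2, target)
        if target % a == 0:                  # candidate decomposition step
            candidate = (a, target // a)
            if verify(candidate, target):
                return candidate             # commit only after verification
        # verification failed: backtrack and try a different approach
    return None

# The same code answers cheaply or thoroughly depending on budget:
print(think(91, budget=2))      # little thinking: may return None
print(think(91, budget=1000))   # more thinking: finds a valid factoring
```

The point of the sketch is that capability scales with `budget`, the per-query compute, while the model (here, the `think` function) stays fixed.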

The Economics Inversion

In Phases 1–3, the big expense was training. In Phase 4, inference cost dominates, because every hard query generates 10–100x more tokens as the model thinks. This inverts the economics: the winner is no longer whoever trains the biggest model, but whoever reasons most efficiently per dollar spent.
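A back-of-the-envelope calculation makes the inversion concrete. All figures below are assumed round numbers for illustration, not actual vendor prices or token counts:

```python
# Illustrative only: the price and token counts are assumptions.
price_per_million_tokens = 10.0   # assumed inference price, USD
plain_answer_tokens = 500         # a direct, non-reasoning answer
reasoning_multiplier = 50         # mid-range of the 10-100x token blowup

def query_cost(tokens):
    """Cost of one query at the assumed per-token price."""
    return tokens / 1_000_000 * price_per_million_tokens

plain = query_cost(plain_answer_tokens)
reasoning = query_cost(plain_answer_tokens * reasoning_multiplier)
print(f"plain: ${plain:.4f}  reasoning: ${reasoning:.4f}  "
      f"ratio: {reasoning / plain:.0f}x")
# plain: $0.0050  reasoning: $0.2500  ratio: 50x
```

At scale, that 50x per-query multiplier is what pushes inference spending past training spending, so efficiency per reasoning token, not model size, becomes the competitive variable.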

Phase 4 gave models the ability to reason. Phase 5 asks: if they can think through problems, can they also act on the solutions?

