OpenAI’s o1 model, released in late 2024, introduced an entirely new scaling dimension. Rather than investing more compute at training time, the model could invest more compute at inference time — “thinking” through problems step by step.
The Revolution
This was revolutionary because it decoupled capability from model size for the first time. The same model could produce quick, cheap answers for simple questions and expensive, thorough answers for complex ones.
The “thinking time” knob created a new scaling law: capability as a function of test-time compute, independent of parameter count.
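A minimal sketch of what such a curve could look like. The log-linear form and the coefficients `a` and `b` here are illustrative assumptions, not published o1 numbers — the point is only that accuracy rises with the thinking budget while the model itself stays fixed:

```python
import math

def accuracy(test_time_tokens, a=0.40, b=0.08):
    """Hypothetical log-linear test-time scaling curve.

    a, b and the functional form are assumptions for illustration;
    they are not measured coefficients for any real model.
    """
    return min(1.0, a + b * math.log10(test_time_tokens))

# Same model, different "thinking" budgets:
for tokens in (100, 1_000, 10_000, 100_000):
    print(tokens, round(accuracy(tokens), 2))
```

Turning the budget up by 10x buys a roughly constant accuracy increment under this assumed form — the shape of a scaling law, with test-time tokens on the x-axis instead of parameters.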
The Reasoning Architecture
The model doesn’t just generate answers — it generates reasoning processes:
- Chain-of-thought decomposition — breaking problems into intermediate steps
- Self-verification — checking its own work before committing
- Backtracking — recognizing when an approach isn’t working
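The three behaviors above can be sketched as a single control loop. This is a hypothetical illustration, not OpenAI’s actual implementation — `propose` and `verify` stand in for the model’s own generation and self-checking:

```python
def solve_with_reasoning(problem, propose, verify, max_attempts=5):
    """Hypothetical test-time reasoning loop (illustrative only).

    propose(problem, failed): returns a candidate chain of steps,
        avoiding approaches already in `failed`.
    verify(steps): self-check; True if the chain holds up.
    """
    failed = []
    for _ in range(max_attempts):
        steps = propose(problem, failed)   # chain-of-thought decomposition
        if steps is None:
            break
        if verify(steps):                  # self-verification
            return steps
        failed.append(steps)               # backtracking: abandon this path
    return None

# Toy usage: search for x with x*x == 49, backtracking past bad guesses.
def toy_propose(problem, failed):
    for x in range(10):
        if [x] not in failed:
            return [x]
    return None

print(solve_with_reasoning(49, toy_propose, lambda s: s[0] ** 2 == 49, max_attempts=10))
```

More attempts mean more tokens and more cost — which is exactly the knob the previous section described.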
The Economics Inversion
In Phases 1–3, training was the dominant expense. In Phase 4, inference cost dominates — because every hard query generates 10–100x more tokens as the model thinks. This inverts the economics: the winner isn’t whoever trains the biggest model, but whoever reasons most efficiently per dollar spent.
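A back-of-envelope version of that inversion. The per-token price here is an assumption for illustration, not an actual OpenAI rate; only the 10–100x token multiplier comes from the text above:

```python
# Illustrative inference economics (price is an assumed placeholder).
PRICE_PER_1K_TOKENS = 0.01   # assumed $/1K output tokens, not a real rate
answer_tokens = 500          # a direct answer
thinking_multiplier = 50     # mid-range of the 10-100x from the text

direct_cost = answer_tokens / 1000 * PRICE_PER_1K_TOKENS
reasoning_cost = answer_tokens * thinking_multiplier / 1000 * PRICE_PER_1K_TOKENS

print(f"direct: ${direct_cost:.3f}")      # half a cent
print(f"reasoning: ${reasoning_cost:.2f}")  # 50x that
```

At 50x the tokens, a query that cost half a cent now costs a quarter — and at scale, shaving wasted thinking tokens matters more than shaving parameters.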
Phase 4 gave models the ability to reason. Phase 5 asks: if they can think through problems, can they also act on the solutions?









