
Jensen’s opening framework — “AI Scales Beyond LLMs” — identified three distinct scaling laws, each requiring enormous compute. There is no plateau in sight.
Scaling Law #1: Pre-Training (2015-2022)
The original scaling law that drove the initial AI revolution.
BERT, the Transformer architecture, and the GPT series taught models to understand language through massive data exposure. More data, more parameters, and more compute yield more capability.
This is the foundation: raw knowledge acquisition at scale.
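As a point of reference, the empirical form these pre-training scaling laws take in the published literature (for example, the Chinchilla study by Hoffmann et al.) is roughly a power law in model size and training data. The symbols below come from that literature and are illustrative; they are not figures from Jensen's keynote.

```latex
% Illustrative Chinchilla-style pre-training scaling law:
% loss falls as a power law in parameter count N and training tokens D.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% E                    : irreducible loss of the data distribution
% N, D                 : model parameters and training tokens
% A, B, \alpha, \beta  : empirically fitted constants
%                        (both exponents roughly 0.3 in published fits)
```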
Scaling Law #2: Post-Training (2022-2024)
Reinforcement learning from human feedback (RLHF) and the skill acquisition that enabled ChatGPT.
Transform raw language capability into useful, aligned behavior through human feedback. This enabled commercial AI products — models that could be helpful, harmless, and honest.
Post-training turned capability into usability.
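For readers who want the mechanics, the standard RLHF objective behind this step (the textbook form popularized by InstructGPT, not a detail from the keynote) optimizes a learned reward while penalizing drift from the pre-trained model:

```latex
% Generic RLHF objective: maximize the reward model's score while a
% KL penalty keeps the policy close to the pre-trained reference model.
\max_{\pi_{\theta}} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
\left[ r_{\phi}(x, y) \right]
\; - \; \beta \, D_{\mathrm{KL}}\!\left( \pi_{\theta}(\cdot \mid x) \,\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
% r_phi  : reward model trained on human preference comparisons
% pi_ref : frozen pre-trained (or supervised fine-tuned) model
% beta   : weight of the penalty that keeps outputs on-distribution
```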
Scaling Law #3: Test-Time (2024+)
The breakthrough Jensen highlighted as “revolutionary.”
OpenAI’s o1 model introduced reasoning at inference — models that think before responding. They use more compute at runtime to solve harder problems. DeepSeek R1 proved this capability can be open-sourced.
This is System 2 thinking: deliberate reasoning, not just pattern matching.
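One published way to buy accuracy with extra inference compute is self-consistency: sample several independent reasoning chains and majority-vote the final answer. The sketch below illustrates only that general idea; it is not OpenAI's or DeepSeek's actual method, and `sample_chain` is a hypothetical stand-in for a call to any reasoning-capable model.

```python
from collections import Counter
from typing import Callable


def answer_with_test_time_compute(
    question: str,
    sample_chain: Callable[[str], tuple[str, str]],
    n_samples: int = 8,
) -> str:
    """Spend more inference compute by sampling several reasoning chains
    and majority-voting the final answers (self-consistency).

    `sample_chain` is a hypothetical stand-in: it should prompt a model
    for chain-of-thought and return (reasoning_text, final_answer).
    """
    answers = []
    for _ in range(n_samples):
        _reasoning, final_answer = sample_chain(question)
        answers.append(final_answer.strip())
    # Pick the answer the independent reasoning chains agree on most.
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer


if __name__ == "__main__":
    import random

    # Toy stand-in model: a noisy "solver" that is right ~70% of the time.
    def toy_sampler(question: str) -> tuple[str, str]:
        correct = random.random() < 0.7
        return ("...reasoning steps...", "42" if correct else str(random.randint(0, 9)))

    print(answer_with_test_time_compute("What is 6 x 7?", toy_sampler, n_samples=16))
```

Accuracy typically improves as `n_samples` grows, but runtime cost grows linearly with it, which is exactly the inference-side demand discussed below.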
The Curves Keep Climbing
Each scaling law requires enormous compute, and the curves continue climbing simultaneously:
- Pre-training scales knowledge
- Post-training scales alignment
- Test-time scales reasoning
Strategic Implication
Test-time compute is the current frontier. Models that “think longer” perform better — but consume more inference compute. This creates new infrastructure demand beyond training: inference clusters that support extended reasoning chains.
The industry is moving from training-dominant to inference-dominant compute requirements.
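A rough back-of-envelope shows why. A common approximation is that a dense decoder spends about 2 x (parameter count) FLOPs per generated token, so a model that emits a long hidden reasoning chain multiplies per-query cost roughly in proportion to the extra tokens. The model size and token counts below are illustrative placeholders, not keynote figures.

```python
def inference_flops(params: float, tokens_generated: float) -> float:
    """Rough decoder-only estimate: ~2 * parameters FLOPs per generated token.
    Ignores attention/KV-cache overhead; good enough for order of magnitude."""
    return 2 * params * tokens_generated


# Hypothetical 70B-parameter model, illustrative token counts.
PARAMS = 70e9
direct_answer = inference_flops(PARAMS, tokens_generated=500)       # short reply
long_reasoning = inference_flops(PARAMS, tokens_generated=10_000)   # extended chain

print(f"Direct answer:  {direct_answer:.1e} FLOPs per query")
print(f"With reasoning: {long_reasoning:.1e} FLOPs per query "
      f"({long_reasoning / direct_answer:.0f}x more)")
```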
This is part of a comprehensive analysis; read the full version on The Business Engineer.