The Kaplan Era: When "Just Make It Bigger" Launched the AI Revolution
In 2020, OpenAI published the original scaling laws ("Scaling Laws for Neural Language Models," Kaplan et al.) alongside GPT-3, establishing the first quantitative framework for AI capability growth. The thesis was straightforward: performance scales as a power law with model size.
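In rough form, the paper's headline result: held-out loss L falls as a power law in non-embedding parameter count N, and separately in dataset size D. The constants below are the approximate fitted values reported by Kaplan et al.; treat them as illustrative rather than exact.

```latex
% Kaplan et al. (2020): loss scales as a power law in model size N and data size D.
% Approximate fitted constants from the paper (N in non-embedding parameters, D in tokens).
\begin{align*}
L(N) &\approx \left(\frac{N_c}{N}\right)^{\alpha_N}, & \alpha_N &\approx 0.076, & N_c &\approx 8.8 \times 10^{13} \\
L(D) &\approx \left(\frac{D_c}{D}\right)^{\alpha_D}, & \alpha_D &\approx 0.095, & D_c &\approx 5.4 \times 10^{13}
\end{align*}
```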
The Numbers
GPT-3 used 175 billion parameters trained on 300 billion tokens — a ratio of roughly 1.7 tokens per parameter. The assumption: model size mattered more than data volume.
The industry responded accordingly. The race was to build the biggest model possible within a given compute budget.
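To make that allocation concrete, here is a minimal sketch, assuming the standard C ≈ 6·N·D FLOPs approximation and the papers' approximate fitted exponents (model size growing roughly as C^0.73 under Kaplan, C^0.5 under Chinchilla). It shows where each recipe sends a 10x compute increase:

```python
# Sketch, not the papers' exact fits: under C ~ 6*N*D, a rule N_opt ∝ C^a
# implies D_opt ∝ C^(1-a). Kaplan et al. fit roughly a ≈ 0.73; Chinchilla
# later fit roughly a ≈ 0.5. Scaling compute 10x shows the difference.

C_GROWTH = 10.0  # multiply the compute budget by 10x

for name, a in [("Kaplan (2020)", 0.73), ("Chinchilla (2022)", 0.50)]:
    n_growth = C_GROWTH ** a        # how much the model should grow
    d_growth = C_GROWTH ** (1 - a)  # how much the dataset should grow
    print(f"{name}: params x{n_growth:.1f}, tokens x{d_growth:.1f}")

# Kaplan (2020): params x5.4, tokens x1.9    -> spend compute on model size
# Chinchilla (2022): params x3.2, tokens x3.2 -> split it evenly
```

Under the Kaplan fit, nearly all of a bigger budget goes into parameters, which is exactly the behavior the industry raced to follow.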
What It Got Right
The Kaplan paper proved that capability scales predictably with compute. GPT-3’s few-shot abilities genuinely surprised researchers and launched the generative AI wave. Scaling wasn’t a guess anymore — it was an infrastructure blueprint.
The Critical Blind Spot
The scaling law dramatically undervalued data relative to parameters. Models were large but undertrained. It would take DeepMind’s Chinchilla paper two years later to reveal just how wrong the allocation was — GPT-3 should have been trained on 11x more data or been 20x smaller.
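As a sanity check on those figures, a few lines of arithmetic recover them, assuming the widely cited Chinchilla heuristic of roughly 20 training tokens per parameter (an approximation, not an exact law):

```python
# Back-of-envelope check of the article's numbers.

params = 175e9   # GPT-3 parameters
tokens = 300e9   # GPT-3 training tokens

ratio = tokens / params
print(f"tokens per parameter: {ratio:.2f}")          # ~1.71

chinchilla_ratio = 20.0  # compute-optimal tokens/parameter (approx.)
data_gap = chinchilla_ratio / ratio
print(f"data shortfall at this size: ~{data_gap:.1f}x")  # ~11.7x, the "11x" above

optimal_tokens = params * chinchilla_ratio
print(f"tokens a 175B model 'wanted': {optimal_tokens:.1e}")  # ~3.5e12
```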
The Kaplan era wasn’t just a research finding — it was the largest capital deployment in computing history. Every GPU cluster built, every training run funded, followed this blueprint.