The Kaplan Era: When “Just Make It Bigger” Launched the AI Revolution

In early 2020, OpenAI published the original scaling-laws paper (Kaplan et al.), followed a few months later by GPT-3, and established the first quantitative framework for AI capability growth. The thesis was straightforward: performance scales as a power law with model size.
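
As a rough illustration of that power-law claim, here is a minimal sketch using constants in the ballpark of those reported in Kaplan et al. (an exponent near 0.076 and a reference scale near 8.8e13 non-embedding parameters). These values are assumptions taken from the paper's published fit and are meant only to show the shape of the curve, not to reproduce it.

```python
# Minimal sketch of the Kaplan-style power law L(N) = (N_c / N) ** alpha_N,
# relating test loss to non-embedding parameter count N.
# Constants are approximate values from Kaplan et al. (2020), used here
# purely for illustration.

ALPHA_N = 0.076   # fitted exponent (approximate)
N_C = 8.8e13      # reference scale in parameters (approximate)

def predicted_loss(n_params: float) -> float:
    """Predicted test loss for a model with n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e9, 1e10, 1.75e11):  # 1B, 10B, and GPT-3-scale 175B parameters
    print(f"{n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f} nats/token")
```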

[Embedded video: The Five Scaling Phases of AI (Animated Explainer)]

The Numbers

GPT-3 used 175 billion parameters trained on 300 billion tokens — a ratio of roughly 1.7 tokens per parameter. The assumption: model size mattered more than data volume.
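
As a quick back-of-the-envelope check on those figures, the sketch below recomputes the tokens-per-parameter ratio and approximates GPT-3's training compute with the widely used C ≈ 6·N·D rule of thumb (an approximation for illustration, not OpenAI's own accounting).

```python
# Back-of-the-envelope check on the GPT-3 figures quoted above.
# Uses the common approximation C ~= 6 * N * D for training FLOPs.

N = 175e9   # parameters
D = 300e9   # training tokens

tokens_per_param = D / N    # ~1.71 tokens per parameter
train_flops = 6 * N * D     # ~3.15e23 FLOPs

print(f"tokens per parameter: {tokens_per_param:.2f}")
print(f"approx. training compute: {train_flops:.2e} FLOPs")
```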

The industry responded accordingly. The race was to build the biggest model possible within a given compute budget.

What It Got Right

The Kaplan paper demonstrated that capability scales predictably with compute. GPT-3’s few-shot abilities genuinely surprised researchers and launched the generative AI wave. Scaling was no longer a guess; it was an infrastructure blueprint.

The Critical Blind Spot

The scaling law dramatically undervalued data relative to parameters. Models were large but undertrained. It would take DeepMind’s Chinchilla paper two years later to reveal just how wrong the allocation was: at 175 billion parameters, GPT-3 should have been trained on roughly 11x more data, or been roughly 11x smaller for the data it actually saw.
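
The sketch below shows where multipliers like these come from, assuming the commonly quoted ~20-tokens-per-parameter rule of thumb from the Chinchilla paper (Hoffmann et al., 2022). The exact figures vary with which fit is used, so treat the output as approximate.

```python
# Rough Chinchilla-style reallocation of GPT-3's training budget, assuming the
# ~20 tokens-per-parameter rule of thumb from Hoffmann et al. (2022).
# All numbers are approximations for illustration only.

N_GPT3 = 175e9            # GPT-3 parameters
D_GPT3 = 300e9            # GPT-3 training tokens
C = 6 * N_GPT3 * D_GPT3   # approximate training compute (~3.15e23 FLOPs)

TOKENS_PER_PARAM = 20     # Chinchilla rule of thumb

# Option A: keep 175B parameters and train on compute-optimal data.
d_needed = TOKENS_PER_PARAM * N_GPT3
print(f"data needed at 175B params: {d_needed:.1e} tokens "
      f"(~{d_needed / D_GPT3:.1f}x what GPT-3 saw)")

# Option B: spend the same compute C on a compute-optimal model.
# With D = 20 * N, C = 6 * N * (20 * N) = 120 * N**2, so N = sqrt(C / 120).
n_optimal = (C / (6 * TOKENS_PER_PARAM)) ** 0.5
d_optimal = TOKENS_PER_PARAM * n_optimal
print(f"same-compute optimal model: ~{n_optimal / 1e9:.0f}B params "
      f"on ~{d_optimal / 1e12:.1f}T tokens")
```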

The Kaplan era wasn’t just a research finding; it drove the largest capital deployment in computing history. Every GPU cluster built and every training run funded in that period followed this blueprint.

