
The Old Paradigm vs. The New Paradigm
Old Paradigm: More Compute = Better Models
The traditional approach to AI improvement:
- More GPUs
- More Data
- More Parameters (θₙ₊₁)
The problem: Diminishing returns. Pre-training scaling is hitting limits.
New Paradigm: Compute + Signal = Better Models
The emerging approach requires both:
- Compute: Raw processing power
- Signal: Quality training data from RL environments
The Dual Bottleneck
AI progress is now constrained by both compute for training AND quality environments to train on.
Key insight from Epoch AI: “Without diverse, high-quality environments, throwing more compute at RL risks wasting much of it.”
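To make the bottleneck concrete, here is a minimal toy sketch in Python (my own illustration, not a model from the article or Epoch AI): if compute and quality environments are treated as complementary inputs, any compute beyond what the environments can absorb is assumed to be wasted.

```python
# Toy sketch of the "dual bottleneck" (illustrative assumption, not a real scaling law):
# usable training signal is capped by whichever input is scarcer.
def effective_training_signal(compute_budget: float, quality_env_capacity: float) -> float:
    """Return the usable signal; compute beyond what the environments
    can absorb is treated as wasted."""
    return min(compute_budget, quality_env_capacity)

# Doubling compute alone does nothing once environments are the constraint.
print(effective_training_signal(compute_budget=100.0, quality_env_capacity=40.0))  # 40.0
print(effective_training_signal(compute_budget=200.0, quality_env_capacity=40.0))  # still 40.0
```

Under this toy framing, the only way to benefit from the second compute budget is to raise environment capacity alongside it, which is exactly the Epoch AI point quoted above.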
The Numbers That Matter
- ~$2.4K – Compute cost per RL task (compute spent on low-quality tasks is largely wasted)
- $1B+ – RL environment spending reportedly discussed at Anthropic
- $19B – OpenAI's projected R&D compute spend for 2026 (see the back-of-the-envelope sketch below)
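To relate these figures to one another, here is a quick back-of-the-envelope sketch (my arithmetic, not the article's; it assumes a flat ~$2.4K per task, which real spending would not follow):

```python
# Back-of-the-envelope only: a flat per-task cost is an illustrative assumption.
COST_PER_RL_TASK = 2_400           # ~$2.4K of compute per RL task
ANTHROPIC_ENV_BUDGET = 1e9         # $1B+ reportedly discussed for RL environments
OPENAI_RD_COMPUTE_2026 = 19e9      # $19B projected R&D compute (2026)

print(f"$1B covers roughly {ANTHROPIC_ENV_BUDGET / COST_PER_RL_TASK:,.0f} task-runs")
print(f"$19B is roughly {OPENAI_RD_COMPUTE_2026 / COST_PER_RL_TASK:,.0f} task-equivalents")
```

The point of the sketch is scale, not precision: budgets of this size imply hundreds of thousands to millions of task-runs, so the quality of each environment matters enormously.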
The Strategic Implication
Quality training signals are now as important as raw compute.
The companies that can produce robust, diverse, reward-hack-resistant environments at scale may become as strategically important as chip suppliers.
This is part of a comprehensive analysis; read the full version on The Business Engineer.
