
AI’s Newest Bottleneck and Biggest Opportunity
Reinforcement learning environments have emerged as AI’s newest bottleneck and biggest opportunity.
With Anthropic discussing $1B+ annual spending on RL environments and OpenAI projecting $19B in R&D compute for 2026, a new market layer is crystallizing between raw compute and model capabilities.
The Key Numbers
- $1B+ – Anthropic RL environment spend (annual, under discussion)
- $19B – OpenAI R&D compute (projected 2026)
- $10B – Mercor valuation (Oct 2025, up 5x in 8 months)
- $1.2B – Surge AI revenue (2024, bootstrapped)
The Paradigm Shift
The AI industry has reached an inflection point. After years of pre-training scaling, where progress meant more data, more parameters, and more compute, frontier labs are discovering that ever-larger training runs yield diminishing returns.
The solution? A fundamental shift from pre-training scaling to post-training scaling—specifically, reinforcement learning.
The Dual Bottleneck
This isn’t merely a technical pivot. It represents a restructuring of AI’s economic architecture. Where compute was once the sole constraint, we now face a dual bottleneck:
- Compute for running training
- High-quality environments and tasks to train on
Without diverse, robust training signals, additional compute is wasted rather than converted into capability.
The Key Insight
As Andrej Karpathy noted: by training LLMs on verifiable tasks across different environments, “the LLMs spontaneously develop strategies that look like ‘reasoning’ to humans.”
Reasoning emerges from structured practice rather than from exposure to raw data.
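To make "verifiable task" concrete, here is a minimal Python sketch of what one such environment could look like. It is a toy illustration, not any lab's actual setup: the class name `ArithmeticEnv` and its `reset`/`step` methods are hypothetical, chosen only to show a task whose answer can be checked automatically.

```python
import random
from dataclasses import dataclass


@dataclass
class ArithmeticEnv:
    """Hypothetical toy environment: each episode is one addition problem."""

    seed: int = 0

    def __post_init__(self):
        # Private RNG so episodes are reproducible for a given seed.
        self._rng = random.Random(self.seed)
        self._answer = None

    def reset(self) -> str:
        """Start a new episode and return the task prompt."""
        a, b = self._rng.randint(0, 99), self._rng.randint(0, 99)
        self._answer = a + b
        return f"What is {a} + {b}?"

    def step(self, response: str) -> float:
        """Score a response: 1.0 if it is exactly correct, otherwise 0.0."""
        try:
            return 1.0 if int(response.strip()) == self._answer else 0.0
        except ValueError:
            # Responses that are not integers earn no reward.
            return 0.0
```

The reward comes from a programmatic check rather than human judgment, which is what makes the task verifiable. A real environment would pose far richer tasks (code, math, tool use), but the loop is the same: issue a task, collect a response, score it automatically.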
This is part of a comprehensive analysis. Read the full analysis on The Business Engineer.









