
The Trajectory of RL Environment Evolution
The field has evolved through three distinct phases, each requiring different capabilities and creating different market opportunities.
Phase 1: Math & Code
Status: Mature, declining emphasis
Characteristics:
- Verifiable answers without complex environments
- Easy to produce at scale
- Limited transfer to other capabilities
Math tasks are easy to create but may be declining in emphasis because they don’t transfer as well to other capabilities. Coding, by contrast, remains a major focus and is evolving beyond SWE-bench-style tasks toward productionized workflows.
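To make "verifiable answers without complex environments" concrete, here is a minimal sketch of what a Phase 1 reward check could look like. The function names `parse_number` and `math_reward` are hypothetical and are not taken from any lab's pipeline. The sketch assumes answers are plain numbers (integers, decimals, or simple fractions).

```python
from fractions import Fraction
from typing import Optional


def parse_number(text: str) -> Optional[Fraction]:
    """Parse an integer, decimal, or a/b fraction; return None if malformed."""
    try:
        # Drop thousands separators so "1,000" and "1000" compare equal.
        return Fraction(text.strip().replace(",", ""))
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 when the model's answer equals the reference exactly, else 0.0.

    Hypothetical reward function: no simulator, browser, or application
    state is needed, only a comparison against a known answer.
    """
    predicted = parse_number(model_answer)
    expected = parse_number(reference)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if predicted == expected else 0.0


if __name__ == "__main__":
    print(math_reward("0.5", "1/2"))
    print(math_reward("forty-two", "42"))
```

Because the check is a pure function over two strings, tasks like this are cheap to generate and grade in bulk, which fits the "easy to produce at scale" characteristic listed above.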
Phase 2: Enterprise Workflows
Status: Major growth area (2025-2026)
Characteristics:
- Valuable and quantifiable
- Enterprise software complexity
- High ROI for training investment
Examples: Filing expense reports, creating pivot tables, generating slides, navigating CRMs.
“Labs index heavily on what’s valuable and quantifiable, and enterprise workflows are perfect for that.”
Strategic implication: Enterprise = golden opportunity for environment creators.
Phase 3: Long-Horizon Tasks
Status: Emerging frontier
Characteristics:
- Multi-step, multi-goal objectives
- Multiple tabs and browser contexts
- Extended task completion timeframes
Emerging capabilities:
- Multi-turn user interactions
- Environments optimizing for multiple goals
- Tooling for researchers to inspect and modify trajectories
Strategic implication: Long-horizon = future moat for early movers in autonomous agent training.
Market Implications
| Phase | Status | Opportunity |
|---|---|---|
| Math & Code | Commoditizing | Limited |
| Enterprise | Golden Opportunity | High Growth |
| Long-Horizon | Future Moat | Early Mover Advantage |
This is part of a comprehensive analysis. Read the full analysis on The Business Engineer.









