Domain Evolution: The Three Phases of RL Environments

The Trajectory of RL Environment Evolution

The field has evolved through three distinct phases, each requiring different capabilities and creating different market opportunities.

Phase 1: Math & Code

Status: Mature, declining emphasis

Characteristics:

  • Verifiable answers without complex environments
  • Easy to produce at scale
  • Limited transfer to other capabilities

Math tasks are easy to create, but their emphasis may be declining because gains on them don't transfer as well to other capabilities. Coding remains a major focus, evolving beyond SWE-bench-style tasks toward productionized workflows.

Phase 2: Enterprise Workflows

Status: Major growth area (2025-2026)

Characteristics:

  • Valuable and quantifiable
  • Enterprise software complexity
  • High ROI for training investment

Examples: Filing expense reports, creating pivot tables, generating slides, navigating CRMs.

“Labs index heavily on what’s valuable and quantifiable, and enterprise workflows are perfect for that.”

Strategic implication: enterprise workflows are a golden opportunity for environment creators.

Phase 3: Long-Horizon Tasks

Status: Emerging frontier

Characteristics:

  • Multi-step, multi-goal objectives
  • Multiple tabs and browser contexts
  • Extended task completion timeframes

Emerging capabilities:

  • Multi-turn user interactions
  • Environments optimizing for multiple goals
  • Tooling for researchers to inspect and modify trajectories

Strategic implication: long-horizon tasks are a future moat for early movers in autonomous agent training.

Market Implications

| Phase        | Status             | Opportunity           |
|--------------|--------------------|-----------------------|
| Math & Code  | Commoditizing      | Limited               |
| Enterprise   | Golden opportunity | High growth           |
| Long-Horizon | Future moat        | Early-mover advantage |

This is part of a comprehensive analysis. Read the full analysis on The Business Engineer.


FourWeekMBA