
- The old scaling law — parameters, data, compute — plateaued by 2023, forcing a fundamental redefinition of how AI performance scales.
- Phase 4 introduces two new dimensions: memory and context, shifting the locus of progress from size to coherence.
- The strategic race moves from “who can afford the most compute?” to “who can design the most coherent architecture?”
Why did the traditional scaling formula break down?
Between 2018 and 2024, AI performance scaled in a predictable way:
Performance = f(parameters, data, compute)
This era was simple:
double parameters → consistent capability gains.
Model sizes exploded from 7B to 175B to over a trillion parameters.
But by 2023, the formula began hitting diminishing returns:
- doubling compute yielded less than 1% performance gain
- training costs grew exponentially
- data quality (especially from the open web) became noisy and finite
- the largest models started converging in raw ability
The old formula was reaching its ceiling.
A new performance driver was needed.
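To make the diminishing-returns point concrete, it helps to look at the power-law form fitted in the scaling-law literature (Kaplan et al., 2020; Hoffmann et al., 2022). This is the standard empirical form, not the article's own formula, and the constants below are placeholders for empirically fitted values:

```latex
% Pretraining loss L as a function of parameter count N and training tokens D
% (the functional form fitted by Hoffmann et al., 2022).
% E, A, B, \alpha, \beta are constants fitted to experiments.
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Because the fitted exponents are small (roughly 0.3 on the parameter term in published fits), doubling N shrinks that term by only a factor of about 2^-0.3 ≈ 0.81: each doubling buys less improvement than the last, which is the flattening curve described above.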
What does the emerging formula add?
Phase 4 introduces a major evolution:
Performance = f(parameters, data, compute, memory, context)
Two new dimensions — memory and context — become the differentiators that unlock:
- emergent agency
- multi-session continuity
- long-horizon reasoning
- stable identity
- relationship formation
- complex multi-document synthesis
This is the first formula that captures the architecture of persistent intelligence.
What do the five dimensions of scale represent?
1. Parameters — The Old Benchmark
Size still matters, but no longer dominates.
Beyond a certain threshold, doubling parameters gives diminishing marginal returns.
Models do not become meaningfully more coherent just by getting bigger.
2. Data — The Fuel Reservoir
Training corpus quality and diversity remain essential, but the open web is finite and noisy.
Curated datasets and human feedback (RLHF) continue to matter, but the opportunity space is narrowing.
Data alone cannot unlock continuity or agency.
3. Compute — The Power Constraint
Training and inference demand massive GPU hours.
Energy consumption and cooling are becoming structural bottlenecks.
We are compute-bound, not capability-bound.
These first three dimensions define the world before Phase 4.
What new capability comes from the fourth dimension: Memory?
Memory introduces a persistence layer — the first time models can retain information across interactions.
This includes:
- cross-session retention
- episodic memory (events, interactions)
- semantic memory (facts, rules)
- learning from interactions
Memory is the bedrock of:
- relationship building
- task continuity
- durable preferences
- long-term planning
- stable internal world models
Memory converts assistants into autonomous agents.
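To make that persistence layer concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any product's actual memory API: the JSON file path, class name, and naive keyword recall are all hypothetical stand-ins.

```python
import json
import time
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical storage location


class MemoryStore:
    """Cross-session persistence: episodic events plus semantic facts."""

    def __init__(self, path: Path = MEMORY_FILE):
        self.path = path
        if path.exists():
            data = json.loads(path.read_text())
        else:
            data = {"episodic": [], "semantic": {}}
        self.episodic = data["episodic"]  # timestamped interaction events
        self.semantic = data["semantic"]  # durable facts, rules, preferences

    def record_event(self, summary: str) -> None:
        """Episodic memory: what happened, and when."""
        self.episodic.append({"t": time.time(), "summary": summary})
        self._save()

    def remember_fact(self, key: str, value: str) -> None:
        """Semantic memory: stable facts and preferences."""
        self.semantic[key] = value
        self._save()

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Naive keyword recall; a production system would use embeddings."""
        hits = [e["summary"] for e in self.episodic
                if query.lower() in e["summary"].lower()]
        return hits[-k:]

    def _save(self) -> None:
        self.path.write_text(
            json.dumps({"episodic": self.episodic, "semantic": self.semantic})
        )


# State written here outlives the process: a later session that constructs
# MemoryStore() again reloads it, which is what cross-session retention means.
memory = MemoryStore()
memory.remember_fact("preferred_format", "bullet summaries")
memory.record_event("User asked for a Q3 revenue breakdown")
print(memory.recall("revenue"))  # -> ['User asked for a Q3 revenue breakdown']
```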
What new capability comes from the fifth dimension: Context?
Context governs the model’s working memory — the information it can actively reason over at any moment.
New capabilities include:
- 200K+ token windows
- multi-document synthesis
- progressive accumulation of information as a task unfolds
- long-chain inference
- cross-source integration
Context window scale determines:
- how much the agent can hold in mind
- how deeply it can reason
- how long its inference chains can run
- how complex a workflow it can manage
Context is the engine of deep thinking.
Memory is the engine of continuity.
Together they power Phase 4.
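A companion sketch of the context side, under the same caveats: the four-characters-per-token estimate stands in for a real tokenizer, and the budget is an arbitrary illustrative number. It packs a working window from the task, recalled memories (for example, the output of the MemoryStore.recall sketch above), and source documents.

```python
def estimate_tokens(text: str) -> int:
    """Crude 4-characters-per-token heuristic; a real system would use the
    model's tokenizer."""
    return max(1, len(text) // 4)


def build_context(task: str, recalled_facts: list[str], documents: list[str],
                  budget_tokens: int = 8000) -> str:
    """Pack the working window: the task, then recalled memory, then source
    documents, stopping once the (illustrative) token budget is spent."""
    parts = [f"TASK: {task}"]
    used = estimate_tokens(parts[0])

    # Continuity: long-term memory pulled into the working window.
    for fact in recalled_facts:
        cost = estimate_tokens(fact)
        if used + cost > budget_tokens:
            break
        parts.append(f"MEMORY: {fact}")
        used += cost

    # Reasoning material: documents added until the window is full.
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget_tokens:
            break
        parts.append(f"DOCUMENT: {doc}")
        used += cost

    return "\n\n".join(parts)


window = build_context(
    task="Summarize Q3 revenue trends across these reports",
    recalled_facts=["User asked for a Q3 revenue breakdown"],  # e.g. MemoryStore.recall()
    documents=["Q3 report text...", "Analyst notes..."],
)
```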
How does the strategic race change with this new formula?
The Old Race
Who can afford the most compute?
More GPUs, more data, bigger models.
This was a capital-intensive race dominated by hyperscalers.
By Phase 3, it delivered diminishing returns.
The New Race
Who can design coherence?
The winners are those who can architect:
- memory structures
- long-term continuity
- attention across massive context
- stable reasoning cycles
- cross-session intelligence
- identity and preference models
- multi-step agentic behavior
This transition mirrors the evolution from CPU-based scaling to full-stack system design in classical computing.
Coherence is the new competitive frontier.
Why does architectural coherence matter more than size?
Because intelligence is not merely the accumulation of parameters.
It is the organization of cognition.
Architectural coherence determines:
- how well memory integrates with context
- how reasoning carries across time
- how stable the agent’s identity is
- how well the system avoids drift and hallucinations
- how efficiently it filters, stores, and retrieves information
- how reliable it is across multi-hour or multi-day sequences
In other words, coherence transforms LLMs from burst-based predictors into persistent, evolving entities.
How do memory and context create emergent intelligence?
Memory enables accumulation.
Context enables reasoning.
Coherence integrates the two.
The result is emergent behavior:
- autonomous workflow execution
- long-term project management
- domain-specific learning
- relationship and trust-building
- proactive assistance
- identity continuity
- multi-step, multi-session reasoning
No amount of parameters alone can produce these outcomes.
They arise only when memory and context interact across time.
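As a final illustration of that interaction, here is a toy session loop built from the two sketches above; call_model is a placeholder for whatever model API is in use, not a real endpoint.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real model call; returns a dummy string here."""
    return f"[response generated from {len(prompt)} characters of context]"


def run_session(task: str, documents: list[str]) -> str:
    memory = MemoryStore()                      # continuity: reload prior state
    recalled = memory.recall(task)              # bring relevant history forward
    window = build_context(task, recalled, documents)
    answer = call_model(window)                 # reasoning over the working window
    memory.record_event(f"{task} -> {answer}")  # persist the outcome for next time
    return answer
```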
Final Synthesis
The new scaling formula marks the most important conceptual shift in AI since the Transformer. Performance is no longer defined by sheer size, but by the coherence of memory and context. Parameters, data, and compute still matter — but they are no longer the frontier.
The new frontier is architectural.
Emergent intelligence arises from continuity, integration, and long-horizon coherence — not scale.
Source: https://businessengineer.ai/p/the-four-ai-scaling-phases