
- The old scaling law — parameters, data, compute — plateaued by 2023, forcing a fundamental redefinition of how AI performance scales.
- Phase 4 introduces two new dimensions: memory and context, shifting the locus of progress from size to coherence.
- The strategic race moves from “who can afford the most compute?” to “who can design the most coherent architecture?”
Why did the traditional scaling formula break down?
Between 2018 and 2024, AI performance scaled in a predictable way:
Performance = f(parameters, data, compute)
This era was simple:
double parameters → consistent capability gains.
Model sizes exploded from 7B to 175B to over a trillion parameters.
But by 2023, the formula began hitting diminishing returns:
- doubling compute yielded less than 1% performance gain
- training costs grew exponentially
- data quality (especially from the open web) became noisy and finite
- the largest models started converging in raw ability
The old formula was reaching its ceiling.
A new performance driver was needed.
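To make the diminishing-returns point concrete, it helps to look at the power-law form fitted in the scaling-law literature (Kaplan et al., 2020; Hoffmann et al., 2022). This is the standard empirical form, not the article's own formula, and the constants below are placeholders for empirically fitted values:

```latex
% Pretraining loss L as a function of parameter count N and training tokens D
% (the functional form fitted by Hoffmann et al., 2022).
% E, A, B, \alpha, \beta are constants fitted to experiments.
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Because the fitted exponents are small (roughly 0.3 on the parameter term in published fits), doubling N shrinks that term by only a factor of about 2^-0.3 ≈ 0.81: each doubling buys less improvement than the last, which is the flattening curve described above.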
What does the emerging formula add?
Phase 4 introduces a major evolution:
Performance = f(parameters, data, compute, memory, context)
Two new dimensions — memory and context — become the differentiators that unlock:
- emergent agency
- multi-session continuity
- long-horizon reasoning
- stable identity
- relationship formation
- complex multi-document synthesis
This is the first formula that captures the architecture of persistent intelligence.
What do the five dimensions of scale represent?
1. Parameters — The Old Benchmark
Size still matters, but no longer dominates.
Beyond a certain threshold, doubling parameters gives diminishing marginal returns.
Models do not become meaningfully more coherent just by getting bigger.
2. Data — The Fuel Reservoir
Training corpus quality and diversity remain essential, but the open web is finite and noisy.
Curated datasets and human feedback (RLHF) continue to matter, but the opportunity space is narrowing.
Data alone cannot unlock continuity or agency.
3. Compute — The Power Constraint
Training and inference demand massive GPU hours.
Energy consumption and cooling are becoming structural bottlenecks.
We are compute-bound, not capability-bound.
These first three dimensions define the world before Phase 4.
What new capability comes from the fourth dimension: Memory?
Memory introduces a persistence layer — the first time models can retain information across interactions.
This includes:
- cross-session retention
- episodic memory (events, interactions)
- semantic memory (facts, rules)
- learning from interactions
Memory is the bedrock of:
- relationship building
- task continuity
- durable preferences
- long-term planning
- stable internal world models
Memory converts assistants into autonomous agents.
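To make that persistence layer concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any product's actual memory API: the JSON file path, class name, and naive keyword recall are all hypothetical stand-ins.

```python
import json
import time
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical storage location


class MemoryStore:
    """Cross-session persistence: episodic events plus semantic facts."""

    def __init__(self, path: Path = MEMORY_FILE):
        self.path = path
        if path.exists():
            data = json.loads(path.read_text())
        else:
            data = {"episodic": [], "semantic": {}}
        self.episodic = data["episodic"]  # timestamped interaction events
        self.semantic = data["semantic"]  # durable facts, rules, preferences

    def record_event(self, summary: str) -> None:
        """Episodic memory: what happened, and when."""
        self.episodic.append({"t": time.time(), "summary": summary})
        self._save()

    def remember_fact(self, key: str, value: str) -> None:
        """Semantic memory: stable facts and preferences."""
        self.semantic[key] = value
        self._save()

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Naive keyword recall; a production system would use embeddings."""
        hits = [e["summary"] for e in self.episodic
                if query.lower() in e["summary"].lower()]
        return hits[-k:]

    def _save(self) -> None:
        self.path.write_text(
            json.dumps({"episodic": self.episodic, "semantic": self.semantic})
        )


# State written here outlives the process: a later session that constructs
# MemoryStore() again reloads it, which is what cross-session retention means.
memory = MemoryStore()
memory.remember_fact("preferred_format", "bullet summaries")
memory.record_event("User asked for a Q3 revenue breakdown")
print(memory.recall("revenue"))  # -> ['User asked for a Q3 revenue breakdown']
```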
What new capability comes from the fifth dimension: Context?
Context governs the model’s working memory — the information it can actively reason over at any moment.
New capabilities include:
- 200K+ token windows
- multi-document synthesis
- progressive accumulation of information as a task unfolds
- long-chain inference
- cross-source integration
Context window scale determines:
- how much the agent can hold in mind
- how deeply it can reason
- how long its inference chains can run
- how complex a workflow it can manage
Context is the engine of deep thinking.
Memory is the engine of continuity.
Together they power Phase 4.
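A companion sketch of the context side, under the same caveats: the four-characters-per-token estimate stands in for a real tokenizer, and the budget is an arbitrary illustrative number. It packs a working window from the task, recalled memories (for example, the output of the MemoryStore.recall sketch above), and source documents.

```python
def estimate_tokens(text: str) -> int:
    """Crude 4-characters-per-token heuristic; a real system would use the
    model's tokenizer."""
    return max(1, len(text) // 4)


def build_context(task: str, recalled_facts: list[str], documents: list[str],
                  budget_tokens: int = 8000) -> str:
    """Pack the working window: the task, then recalled memory, then source
    documents, stopping once the (illustrative) token budget is spent."""
    parts = [f"TASK: {task}"]
    used = estimate_tokens(parts[0])

    # Continuity: long-term memory pulled into the working window.
    for fact in recalled_facts:
        cost = estimate_tokens(fact)
        if used + cost > budget_tokens:
            break
        parts.append(f"MEMORY: {fact}")
        used += cost

    # Reasoning material: documents added until the window is full.
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget_tokens:
            break
        parts.append(f"DOCUMENT: {doc}")
        used += cost

    return "\n\n".join(parts)


window = build_context(
    task="Summarize Q3 revenue trends across these reports",
    recalled_facts=["User asked for a Q3 revenue breakdown"],  # e.g. MemoryStore.recall()
    documents=["Q3 report text...", "Analyst notes..."],
)
```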
How does the strategic race change with this new formula?
The Old Race
Who can afford the most compute?
More GPUs, more data, bigger models.
This was a capital-intensive race dominated by hyperscalers.
By Phase 3, it delivered diminishing returns.
The New Race
Who can design coherence?
The winners are those who can architect:
- memory structures
- long-term continuity
- attention across massive context
- stable reasoning cycles
- cross-session intelligence
- identity and preference models
- multi-step agentic behavior
This transition mirrors the evolution from CPU-based scaling to full-stack system design in classical computing.
Coherence is the new competitive frontier.
Why does architectural coherence matter more than size?
Because intelligence is not merely the accumulation of parameters.
It is the organization of cognition.
Architectural coherence determines:
- how well memory integrates with context
- how reasoning carries across time
- how stable the agent’s identity is
- how well the system avoids drift and hallucinations
- how efficiently it filters, stores, and retrieves information
- how reliable it is across multi-hour or multi-day sequences
In other words, coherence transforms LLMs from burst-based predictors into persistent, evolving entities.
How do memory and context create emergent intelligence?
Memory enables accumulation.
Context enables reasoning.
Coherence integrates the two.
The result is emergent behavior:
- autonomous workflow execution
- long-term project management
- domain-specific learning
- relationship and trust-building
- proactive assistance
- identity continuity
- multi-step, multi-session reasoning
No amount of parameters alone can produce these outcomes.
They arise only when memory and context interact across time.
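As a final illustration of that interaction, here is a toy session loop built from the two sketches above; call_model is a placeholder for whatever model API is in use, not a real endpoint.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real model call; returns a dummy string here."""
    return f"[response generated from {len(prompt)} characters of context]"


def run_session(task: str, documents: list[str]) -> str:
    memory = MemoryStore()                      # continuity: reload prior state
    recalled = memory.recall(task)              # bring relevant history forward
    window = build_context(task, recalled, documents)
    answer = call_model(window)                 # reasoning over the working window
    memory.record_event(f"{task} -> {answer}")  # persist the outcome for next time
    return answer
```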
Final Synthesis
The new scaling formula marks the most important conceptual shift in AI since the Transformer. Performance is no longer defined by sheer size, but by the coherence of memory and context. Parameters, data, and compute still matter — but they are no longer the frontier.
The new frontier is architectural.
Emergent intelligence arises from continuity, integration, and long-horizon coherence — not scale.
Source: https://businessengineer.ai/p/the-four-ai-scaling-phases