Anthropic was losing the hardest AI benchmark in existence. Then it wasn’t. What the FrontierMath Tier 4 chart tells us about the real shape of the AI capability race.
The Chart That Tells the Whole Story
Epoch AI’s FrontierMath benchmark tracks how well AI models solve research-level mathematics — the kind of problems that take expert mathematicians hours or days. Tier 4 is the hardest: abstract algebraic geometry, computational number theory, problems at the frontier of human mathematical knowledge.
Here’s what the trajectory looks like:
The Capability Trajectory (approximate FrontierMath Tier 4 scores)
Mid-2025, GPT-5 (OpenAI): ~20%. OpenAI takes the early lead. Anthropic near zero.
Jan 2026, Opus 4.5 (Anthropic): ~8%. Anthropic enters the game. Still far behind OpenAI.
Mar 2026, GPT-5.2 (OpenAI): ~35%. OpenAI extends dominance. Gap looks insurmountable.
May 2026, GPT-5.5 (OpenAI): ~72%. OpenAI roughly doubles its score. Massive jump.
June 2026, Fable 5 (Anthropic): ~87%. Anthropic leapfrogs OpenAI in one generation, going from ~28% for its mid-cycle models to ~87%.
Why This Matters More Than Any Chatbot Benchmark
FrontierMath isn’t a trivia test. These are 338 unpublished problems spanning abstract algebraic geometry, computational number theory, and research-level proofs. The kind of problems where getting the right answer means the model is doing something that looks a lot like genuine mathematical reasoning — not pattern matching.
A year ago, the best AI models scored near zero. Now Fable 5 is solving 87% of them.
The key insight: Anthropic didn’t win by iterating faster. It won by making a discontinuous jump. From Opus 4.5 (~8%) to the mid-cycle models (~28%) to Fable 5 (87%). That’s not a linear improvement curve — it’s a step function. Something structural changed.
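To make the size of each jump concrete, here is a minimal sketch that computes the gain between consecutive releases. It assumes the approximate scores quoted in this article; the names SCORES, release_gains, and describe are illustrative, and "mid-cycle" stands in for the unnamed Anthropic models at ~28%.

```python
# Approximate Tier 4 scores (percent) as quoted in this article.
# "mid-cycle" is a placeholder label for the unnamed Anthropic models at ~28%.
SCORES = {
    "OpenAI": [("GPT-5", 20), ("GPT-5.2", 35), ("GPT-5.5", 72)],
    "Anthropic": [("Opus 4.5", 8), ("mid-cycle", 28), ("Fable 5", 87)],
}


def release_gains(series):
    """Return (from_model, to_model, point_gain, ratio) for consecutive releases."""
    gains = []
    for (prev_name, prev), (name, cur) in zip(series, series[1:]):
        gains.append((prev_name, name, cur - prev, cur / prev))
    return gains


def describe(scores=SCORES):
    """Format one summary line per release-to-release step, grouped by lab."""
    lines = []
    for lab, series in scores.items():
        for prev_name, name, points, ratio in release_gains(series):
            lines.append(f"{lab}: {prev_name} -> {name}: +{points} pts ({ratio:.1f}x)")
    return lines


if __name__ == "__main__":
    print("\n".join(describe()))
```

On these figures, OpenAI's steps are +15 and +37 points, while Anthropic's are +20 and +59 points. The inputs are rounded chart readings, so treat the output as a rough comparison rather than a precise measurement.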
The Structural Read
Three things this chart reveals about the AI race:
1. CAPABILITY JUMPS ARE NON-LINEAR
OpenAI's gains grew with each release: 20% → 35% → 72%. Anthropic sat in single digits at ~8%, then jumped to 87% within two releases. In AI, you don't close the gap gradually. You leapfrog or you don't. This is the Product Overhang pattern: capability accumulates invisibly until it surfaces all at once.
2. THE MODEL LAYER IS NOT WINNER-TAKE-ALL
Six months ago, it looked like OpenAI had an unassailable lead in reasoning. Now Anthropic has the best math model. The lead in AI changes hands faster than in any technology race in history. No moat lasts more than one model generation.
3. MATH IS THE CANARY FOR EVERYTHING ELSE
Mathematical reasoning is the hardest, most verifiable form of AI cognition. If Fable 5 can solve 87% of research-level math, the downstream implications for code generation, scientific discovery, and autonomous agents are enormous. Math capability is a leading indicator.
Business Engineer Framework
The Product Overhang Doctrine
Why does AI capability appear to jump rather than climb? The Product Overhang Doctrine explains the pattern: capability accumulates beneath the surface until a single release makes it visible. Anthropic’s Fable 5 is a textbook case.
The Bottom Line
Fourteen months ago, no AI could solve a single Tier 4 math problem. Today, Fable 5 solves 87% of them. The race isn’t between companies — it’s between the speed of AI capability growth and our ability to understand what that growth means.
If you’re building strategy around “OpenAI is ahead” or “Anthropic is behind,” this chart is your wake-up call. The lead changes with every model release. The only constant is acceleration.
Source: Epoch AI FrontierMath Benchmark