Apple’s own internal testing proved what the market suspected: its AI models are years behind the competition.
## What Apple Did
- Ran an internal head-to-head competition between its own models and third-party models
- Tested each model on complex user queries
- Meanwhile, Google began training custom models for Apple's servers
## The Contenders
| Model | Provider |
|---|---|
| ChatGPT | OpenAI |
| Claude | Anthropic |
| Gemini | Google |
| Apple Models | In-House |
## Performance Comparison
| Model | Relative Performance | Result |
|---|---|---|
| Claude | ████████████████ | WINNER |
| ChatGPT | ███████████████ | Strong |
| Gemini | ██████████████ | Strong |
| Apple | ██████████ | LOST |
## The Damning Verdict
“Internal evaluations indicated that third-party models — particularly Anthropic’s Claude — outperformed Apple’s tools”
John Gruber's analysis, "…he would be right," conceded the point: Apple is struggling in AI.
## What This Exposed
- R&D failure: $34.5B/year in spending without a competitive AI model to show for it
- Talent gap: missing top-tier AI research expertise
- Strategy failure: late to the generative AI race
- Culture problem: a secrecy-first approach that backfired in a field built on open research
This summary is part of a longer piece. Read the full analysis on The Business Engineer.