OpenAI just unveiled its first custom AI chip, Jalapeño, built with Broadcom in nine months. Purpose-built for LLM inference, it shows 50% cost savings versus current GPUs in early testing. OpenAI is no longer just a model company. It’s building the full stack, from products to models to silicon.
What OpenAI Built
OpenAI designed Jalapeño from the ground up — a custom inference accelerator architected specifically for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products. Broadcom handled manufacturing. The chip went from initial design to tape-out in nine months — what OpenAI calls the fastest ASIC development cycle ever in high-performance semiconductors.
Early testing shows 50% cost savings compared to current GPUs and substantially better performance per watt. First samples are being tested now. Initial deployment is planned for the end of 2026, and Jalapeño is intended as the first chip in a multi-generation compute platform.
The key insight: OpenAI burns $3.7 billion per quarter. Inference is the largest cost. A chip that cuts inference cost by 50% doesn’t just save money; it changes the entire economics of serving 400M+ ChatGPT users. It also underpins the $100B ad target: cheaper inference lowers the cost floor of serving free-tier users, which makes more ad-supported usage, and therefore more ad impressions, affordable.
The Full Stack Play
OpenAI explicitly framed this as building the “full stack” — products (ChatGPT, Codex) → models (GPT series) → infrastructure (Jalapeño). This is the same vertical integration playbook running across the industry this week:
SpaceX: Models (xAI) + Compute (Colossus) + Dev Tools (Cursor) + Robotics (Tesla) + Connectivity (Starlink)
Anthropic: Models (Claude) + Memory supply (Micron deal) + Government trust (Glasswing) + Series H capital
Google: Models (Gemini) + Chips (TPUs) + Cloud (GCP) + Distribution (Search/Android) + Content (A24)
OpenAI (now): Models (GPT) + Chips (Jalapeño) + Dev Tools (Codex) + Distribution (ChatGPT 400M users) + Revenue (Ads + Subs)
The Structural Read
NVIDIA’S PRICING POWER JUST GOT CHALLENGED
OpenAI is Nvidia’s largest customer. A custom chip that cuts inference costs 50% is a direct shot at Nvidia’s margin structure. OpenAI won’t stop buying Nvidia GPUs for training, but every dollar of inference that moves to Jalapeño is a dollar Nvidia doesn’t get. This is arguably why Nvidia absorbed Groq’s IP for $20B: to prevent exactly this kind of competition from emerging.
THE IPO NARRATIVE JUST GOT ANOTHER CHAPTER
This week OpenAI revealed a $100B ad target, GPT-5.5-Cyber and Daybreak, and now custom silicon. Each announcement builds the IPO story: not a model company, but a full-stack AI platform with its own chips, its own ad engine, and its own security program. That’s the pitch to public markets.
THE INFERENCE ECONOMICS CHANGE EVERYTHING
50% cheaper inference means: more free-tier users (more ad impressions), cheaper API (more developers), faster agents (more agentic products), and a viable path to profitability. The $3.7B quarterly burn was always an inference cost problem. Jalapeño is the structural answer — not more revenue, but less cost per query.
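To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The $3.7B quarterly burn, the 400M users, and the 50% savings figure come from the article. The share of that burn attributable to inference is not disclosed, so `inference_share` is a purely hypothetical parameter, as is `migrated_share` (how much inference actually moves to the new chip). The sketch also treats burn as a stand-in for spend, which is a simplification.

```python
# Back-of-the-envelope model of the savings claim. Illustrative only.
QUARTERLY_BURN_USD = 3.7e9   # from the article
USERS = 400e6                # from the article (400M+ ChatGPT users)
CHIP_SAVINGS = 0.50          # from the article (early-testing claim)


def quarterly_savings(inference_share: float, migrated_share: float = 1.0) -> float:
    """Dollars saved per quarter.

    inference_share: ASSUMED fraction of the burn that is inference cost.
    migrated_share:  ASSUMED fraction of inference moved to the cheaper chip.
    """
    for name, value in (("inference_share", inference_share),
                        ("migrated_share", migrated_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return QUARTERLY_BURN_USD * inference_share * migrated_share * CHIP_SAVINGS


def burn_per_user(burn_usd: float) -> float:
    """Quarterly burn spread evenly across all users."""
    return burn_usd / USERS


if __name__ == "__main__":
    print(f"Burn per user today: ${burn_per_user(QUARTERLY_BURN_USD):.2f} per quarter")
    # Sweep a few hypothetical inference shares; none of these is a reported figure.
    for share in (0.3, 0.5, 0.7):
        saved = quarterly_savings(share)
        remaining = QUARTERLY_BURN_USD - saved
        print(f"inference share {share:.0%}: saves ${saved / 1e9:.2f}B, "
              f"burn per user falls to ${burn_per_user(remaining):.2f}")
```

The takeaway from the sweep is that the headline 50% applies only to the inference slice that actually migrates. The overall burn falls by half of that slice, so the real impact depends heavily on numbers OpenAI has not published.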
The Bottom Line
OpenAI just announced it built a chip in nine months that cuts inference costs in half. Nine months. From a company that didn’t have a hardware team two years ago. Jalapeño isn’t a moonshot — it’s a business necessity. When you’re burning $3.7 billion a quarter serving 400 million users, you either cut the cost of serving them or you run out of money. OpenAI chose to build the solution rather than rent it from Nvidia. That’s the most consequential strategic decision the company has made since launching ChatGPT — and it happened in nine months.
Business Engineer








