The AI race has a new bottleneck, and it’s not what most observers expect. While attention remains fixed on GPU shortages and frontier model capabilities, the binding constraint on AI scaling has quietly shifted to High Bandwidth Memory (HBM).
This is the critical interface layer where compute meets data, and its physics-driven limitations now determine the ceiling on AI capability scaling.
Micron’s announcement of a ¥1.5 trillion ($9.6 billion) investment to build a new HBM production plant in Hiroshima, backed by up to ¥500 billion in Japanese government subsidies, signals the strategic importance of this layer. Construction begins in May 2026 with shipments starting in 2028.
This move positions Japan as a second pillar alongside Korea in the HBM supply chain and represents a fundamental shift in how nations and corporations are deploying capital in the AI infrastructure stack.
Memory is becoming the new compute. HBM demand is growing faster than accelerator demand itself, making memory the rate-limiting factor in the AI scaling equation.
The Physics Problem: Understanding the Memory Wall
To understand why HBM matters, you must first understand the memory wall—a fundamental physics problem that has shaped computer architecture for decades.
The Divergence Problem
Over the past two decades, compute performance has improved by roughly 10,000× while memory bandwidth has improved by only about 10×. This thousand-fold divergence creates a structural bottleneck: processors can execute operations far faster than memory can feed them data. The processor sits idle, waiting for information to arrive.
Traditional DRAM connects to processors through a relatively narrow interface—like trying to fill a swimming pool through a garden hose. No matter how powerful your pump (compute), the hose (memory bandwidth) determines the actual flow rate.
Why AI Makes This Worse
Transformer architectures, which power essentially all frontier AI models, are memory-bound by design. The attention mechanism requires accessing the entire context window for every token generated. A model like GPT-4, with a rumored 1.8 trillion parameters, must read hundreds of gigabytes of weights from memory for each inference pass.
The math is stark: modern transformer inference is roughly 100× more constrained by memory bandwidth than by compute throughput. You can add all the tensor cores you want, but if you can’t feed them data fast enough, they sit idle. This is why NVIDIA’s H100 and H200 GPUs dedicate more than half their package area and bill-of-materials cost to memory and memory interfaces.
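To make that imbalance concrete, here is a back-of-the-envelope sketch in Python. The FLOP and bandwidth figures are rough H100-class approximations, and the one-FLOP-per-byte decode intensity is a simplifying assumption, not a measured value:

```python
# Why transformers are memory-bound: compare the chip's compute-to-bandwidth
# balance with what single-stream decoding actually demands.
# Rough H100-class figures assumed here, not exact vendor specs.

peak_flops = 1.0e15        # ~1 PFLOP/s dense FP16 throughput (approximate)
hbm_bandwidth = 3.35e12    # ~3.35 TB/s HBM3 bandwidth (approximate)

# FLOPs the chip could execute per byte it can fetch from memory
machine_balance = peak_flops / hbm_bandwidth

# Matrix-vector decode reuses each weight byte roughly once: ~1 FLOP/byte
decode_intensity = 1.0

utilization = decode_intensity / machine_balance
print(f"Chip balance point: ~{machine_balance:.0f} FLOPs per byte")
print(f"Compute utilization in single-stream decode: ~{utilization:.1%}")
# ~0.3%: without batching, tensor cores mostly wait on memory.
```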
The HBM Solution: 3D Stacking Changes the Equation
High Bandwidth Memory solves the memory wall through a simple but profound architectural innovation: instead of placing memory far from the processor and connecting through narrow buses, HBM stacks memory dies vertically and places them directly adjacent to the processor on the same package.
The Technical Architecture
HBM consists of multiple DRAM dies stacked vertically, connected by thousands of tiny vertical pathways called Through-Silicon Vias (TSVs). Where traditional DRAM might have 64 data lines connecting to a processor, HBM3e can have 1,024 or more.
This massive parallelism, combined with the short physical distance, delivers bandwidth improvements of 10× to 100× over traditional memory.
The latest HBM3e generation delivers aggregate package bandwidth approaching 8TB/s—enough to read the weights of a large model in under a second. NVIDIA’s H200 GPU packages 141GB of HBM3e memory, enabling inference on larger models without the complexity and latency of multi-GPU configurations.
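The bandwidth gain follows directly from interface width times per-pin signaling rate. A minimal sketch; the pin counts and data rates are illustrative assumptions rather than datasheet values:

```python
# HBM's bandwidth advantage comes from interface width, not clock speed.
# Pin counts and per-pin rates below are illustrative, not datasheet values.

def bandwidth_tb_s(data_pins: int, gbps_per_pin: float, stacks: int = 1) -> float:
    """Aggregate bandwidth in TB/s = pins x per-pin rate x stacks."""
    return data_pins * gbps_per_pin * stacks / 8 / 1e3  # Gbit -> GByte -> TByte

print(f"DDR5 module (64 pins @ 6.4 Gbps):  {bandwidth_tb_s(64, 6.4):.2f} TB/s")
print(f"One HBM3e stack (1024 @ 8 Gbps):   {bandwidth_tb_s(1024, 8.0):.2f} TB/s")
print(f"8-stack HBM3e package:             {bandwidth_tb_s(1024, 8.0, 8):.1f} TB/s")
```

The point of the calculation: a 1,024-pin interface at a modest per-pin rate beats a 64-pin interface at any realistic clock, which is why stacking and wide buses, not faster signaling alone, broke the memory wall.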
The Cost Reality
This performance comes at a price. HBM costs roughly $30-50 per gigabyte compared to $3-5 for standard DRAM—a 10× premium. For a GPU with 80GB of HBM, the memory alone costs $2,400-4,000 at component prices; once stacking, interposers, and packaging are included, memory and its interfaces account for an estimated 50-60% of an accelerator’s bill of materials. A large share of what you pay for a high-end GPU is, in effect, memory.
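The component math, restated in a quick sketch; the per-gigabyte prices are the ranges quoted above, and the bill-of-materials share is an estimate rather than a line item:

```python
# Memory cost at quoted component prices (ranges from the text above).
hbm_gb = 80
for price_per_gb in (30, 50):
    print(f"{hbm_gb}GB @ ${price_per_gb}/GB = ${hbm_gb * price_per_gb:,}")
# Stacking, interposer, and packaging push memory's share of an
# accelerator's bill of materials toward the 50-60% estimate.
```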
Where HBM Sits in the AI Ecosystem
Understanding HBM’s strategic importance requires seeing where it fits in the broader AI infrastructure stack. Think of the AI ecosystem as an hourglass: everything flows through a narrow constraint layer.
The Eight-Layer Stack
From top to bottom, the AI infrastructure stack consists of:
1. AI Applications — ChatGPT, Claude, Gemini, Copilot, Midjourney, and thousands of enterprise deployments
2. Foundation Models — GPT-4, Claude 3, Gemini, Llama, Mistral, and the model architectures that power applications
3. Training & Inference — The distributed compute systems, gradient descent, and attention mechanisms that create and run models
4. AI Accelerators — NVIDIA H100/B100, AMD MI300X, Google TPU, and custom ASICs from hyperscalers
5. High Bandwidth Memory — THE BOTTLENECK LAYER — where compute meets data
6. Advanced Packaging — CoWoS, interposers, 2.5D/3D integration that bonds HBM to processors
7. Silicon Fabrication — TSMC, Samsung Foundry, and the 3nm/5nm nodes that build the chips
8. Physical Infrastructure — Data centers, power, cooling, and networking
The Hourglass Architecture
HBM sits at the fifth layer from the top—the narrow neck of the hourglass. Everything above it (applications, models, compute workloads) creates demand for memory bandwidth. Everything below it (packaging, fabrication, infrastructure) supports its production. All AI capability must flow through this constraint layer.
This architectural position explains why HBM has become the binding constraint on AI scaling. You can build more data centers, deploy more power, fabricate more silicon—but if you can’t produce enough HBM, the entire stack above it cannot expand.
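A toy model captures the hourglass logic: end-to-end capacity is the minimum across layers, so investment anywhere else is wasted until the neck widens. The capacity values below are invented index numbers, purely for illustration:

```python
# Toy model of the hourglass: deployable AI capacity is gated by the
# scarcest layer. Capacity values are invented index numbers, not
# market data.

layer_capacity = {
    "physical infrastructure": 140,
    "silicon fabrication": 120,
    "advanced packaging": 95,
    "HBM production": 70,   # the narrow neck
    "AI accelerators": 110,
}

bottleneck = min(layer_capacity, key=layer_capacity.get)
print(f"Deployable capacity index: {layer_capacity[bottleneck]}")
print(f"Binding constraint: {bottleneck}")
# Raising any other layer's number changes nothing until HBM rises.
```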
The Bottleneck Shift: From Compute to Memory
The AI infrastructure narrative has evolved through distinct phases, each defined by a different constraint:
Phase 1: The GPU Shortage (2020-2023)
During the initial AI scaling era, GPUs were the binding constraint. Organizations couldn’t buy H100s at any price. NVIDIA’s production capacity, limited above all by TSMC’s advanced packaging (CoWoS) capacity, created a hard ceiling on AI deployment. The market narrative centered on ‘who has the GPUs.’
Phase 2: The Memory Shortage (2024-2028+)
As GPU production has ramped and NVIDIA has expanded its supply chain, a new constraint has emerged. Organizations now face a different problem: ‘We have GPUs, but not enough memory.’ HBM production capacity has not kept pace with accelerator demand.
The numbers tell the story: HBM demand is projected to grow at 58% CAGR through 2030, outpacing GPU demand growth. Every new GPU generation increases HBM requirements—the H100 needs 80GB, the H200 needs 141GB, and next-generation accelerators will require even more.
The scaling equation has inverted: AI capability now scales with memory bandwidth, not just compute FLOPS. Every 2× increase in model size requires roughly 2× increase in HBM capacity.
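The inverted scaling law can be written as a simple relation: bandwidth-bound decode throughput is roughly memory bandwidth divided by the bytes of weights read per token. A hedged sketch, with the bandwidth figure and model sizes as illustrative assumptions:

```python
# Toy scaling relation: single-device decode throughput is roughly
# memory bandwidth / bytes of weights read per token.
# Bandwidth and model sizes below are illustrative assumptions.

def tokens_per_second(bandwidth_bytes: float, params: float,
                      bytes_per_param: int = 2) -> float:
    """Upper bound on decode rate when every token re-reads all weights."""
    return bandwidth_bytes / (params * bytes_per_param)

BANDWIDTH = 4.8e12  # ~4.8 TB/s, an H200-class aggregate (assumed)
for params in (90e9, 180e9, 360e9):  # doubling model size each step
    rate = tokens_per_second(BANDWIDTH, params)
    print(f"{params/1e9:.0f}B params: ~{rate:.0f} tokens/s per device")
# Each 2x in parameters halves the bandwidth-bound token rate.
```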
The HBM Oligopoly: Three Companies Control the AI Race
The HBM market exhibits extreme concentration. Only three companies in the world can manufacture HBM at scale:
Market Share Distribution (2024)
- SK Hynix: ~53% — The market leader, with the closest relationship to NVIDIA and a first-mover advantage in HBM3e
- Samsung: ~35% — Strong in volume, but has faced yield challenges in latest-generation HBM
- Micron: ~12% — The smallest player but the fastest growing, with ambitions to reach 20%+ share
This concentration creates profound strategic implications. With 88% of HBM production concentrated in South Korea and the remainder split between the US and Japan, the AI supply chain carries significant geopolitical risk.
Why New Entrants Can’t Emerge
HBM manufacturing requires a rare combination of capabilities: advanced DRAM fabrication expertise, 3D stacking technology, sophisticated packaging know-how, and billions in capital investment. The barriers to entry are essentially insurmountable in any relevant timeframe. Chinese memory makers like YMTC and CXMT are years behind in basic DRAM technology and face export controls on the advanced equipment needed for HBM production.
Micron’s Strategic Move: The Hiroshima Play
Micron’s ¥1.5 trillion investment in Hiroshima represents a calculated strategic repositioning with implications across multiple dimensions.
The Five Strategic Levers
1. Capacity Expansion: The new facility directly addresses global HBM shortages that constrain AI compute scaling. Micron gains production volume to serve customers it currently cannot supply.
2. Subsidy Economics: Japan’s METI is providing up to ¥500 billion in subsidies—covering roughly one-third of the investment. This dramatically reduces Micron’s capital intensity and de-risks the long-term production ramp.
3. Supply Chain Diversification: The investment aligns with Japan-US industrial policy objectives to reduce dependence on Korea and Taiwan. For customers seeking geographic risk mitigation, Japan-sourced HBM becomes a compelling option.
4. AI Infrastructure Tailwinds: With HBM demand growing faster than accelerator demand, memory producers face a sellers’ market. Capacity investments made now will generate returns for the next decade.
5. Oligopoly Positioning: The move positions Micron to close the technology and market share gap with SK Hynix and Samsung. In a three-player market, moving from 12% to 20%+ share fundamentally changes competitive dynamics.
The Japan Angle
Japan’s aggressive subsidization reflects its broader semiconductor strategy. The country has committed massive resources to rebuilding domestic chip capabilities, including support for Rapidus (advanced logic), TSMC’s Kumamoto fabs (the JASM joint venture), and now Micron’s HBM expansion. Japan is positioning itself as the second pillar of the non-China semiconductor supply chain, with particular strength in materials, equipment, and now memory.
Geopolitical Context: Industrial Policy Alignment
HBM has become a focal point of great power competition in semiconductors. The current landscape reflects distinct national strategies:
United States: The CHIPS Act Framework
The US has committed $52 billion in semiconductor subsidies, primarily focused on advanced logic fabrication. Memory has received less direct support, but Micron’s domestic facilities benefit from the broader policy framework. The strategic objective is reducing dependence on Asian production for critical components.
Japan: The Re-industrialization Push
Japan’s METI has become one of the world’s most aggressive semiconductor investors, committing trillions of yen across multiple projects. The Micron HBM investment fits a pattern: Japan is leveraging its remaining strengths (materials, equipment, precision manufacturing) to rebuild semiconductor production capacity lost over the past two decades.
South Korea: Defending the Incumbent Position
SK Hynix and Samsung hold dominant positions in HBM. Korea’s strategic priority is maintaining this advantage while fending off challenges from both US-Japan allied efforts and potential future Chinese competitors. The Korean government has implemented its own support programs to ensure continued investment in advanced memory.
China: Locked Out
Export controls have effectively blocked China’s access to HBM technology and the equipment needed to produce it. Chinese AI development faces a structural disadvantage: domestic accelerators cannot access the memory bandwidth available to Western competitors. This creates a ceiling on Chinese AI capabilities that cannot be overcome through software innovation alone.
The Demand Flywheel: Who Needs HBM
HBM demand is driven by the full spectrum of AI accelerator vendors, creating a multi-source demand flywheel:
The Major Consumers
- NVIDIA — The dominant consumer, with the H100 (80GB), H200 (141GB), and B100 (192GB+) all requiring massive HBM allocations. NVIDIA’s product roadmap drives the majority of HBM demand.
- AMD — The MI300X packs 192GB of HBM, positioning AMD as a credible alternative for memory-intensive workloads. AMD’s growth translates directly to HBM demand growth.
- Google TPU — Google’s custom TPU accelerators are HBM-intensive, serving both internal AI workloads and Google Cloud customers.
- Custom Silicon — AWS Trainium, Meta’s MTIA, and Microsoft’s Maia all incorporate HBM. As hyperscalers pursue custom silicon strategies, they add incremental HBM demand.
The market is projected to grow at a 58% compound annual growth rate through 2030. No other semiconductor segment approaches this growth rate, reflecting the structural importance of memory in AI scaling.
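It is worth spelling out what 58% annual growth compounds to if the projection holds; a quick sketch from a normalized 2024 base:

```python
# What a 58% CAGR compounds to, assuming it holds (normalized 2024 base).
cagr = 0.58
for year in range(2024, 2031):
    size = (1 + cagr) ** (year - 2024)
    print(f"{year}: {size:4.1f}x the 2024 market")
# By 2030 the market would be ~15-16x its 2024 size.
```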
Value Chain Analysis: Where Value Accrues
The AI memory value chain reveals where economic value is captured:
- DRAM Makers (Micron, SK Hynix, Samsung): Capture value through HBM production margins, which significantly exceed standard DRAM margins due to technical complexity and supply constraints.
- HBM Packaging: The 3D stacking and CoWoS integration represent high-value manufacturing steps. TSMC’s advanced packaging capacity has become a strategic asset.
- GPU Integration (NVIDIA, AMD): Accelerator vendors capture value by integrating HBM with their processors, though they depend on memory suppliers.
- Hyperscalers (Microsoft, Google, Amazon): Cloud providers deploy HBM-equipped systems to serve enterprise AI demand, capturing value through cloud service margins.
- AI Applications (OpenAI, Anthropic, etc.): Model developers depend on the entire stack below them. Their ability to scale is constrained by infrastructure availability.
The bottleneck position confers pricing power. HBM makers can capture value precisely because they control the constraint on scaling.
What Comes Next: Evolution of the Constraint
The HBM constraint will evolve but not disappear:
Technology Roadmap
HBM4 is expected in 2026, offering further bandwidth improvements. However, each generation requires more complex manufacturing, potentially tightening supply constraints even as per-unit capacity increases. The fundamental physics problem—memory bandwidth scaling slower than compute—remains unsolved.
Alternative Architectures
Research continues on near-memory and in-memory computing architectures that could reduce bandwidth requirements. However, these remain years from production deployment. For the foreseeable future, HBM is the only viable solution for high-performance AI.
Capacity Expansion
All three HBM makers are expanding capacity, but production ramps take years. Micron’s Hiroshima facility won’t ship until 2028. SK Hynix and Samsung are similarly expanding, but new capacity consistently lags demand growth. The shortage is structural, not cyclical.
The Bottom Line
HBM is becoming the choke point of the AI race. Micron’s $9.6 billion Hiroshima investment positions Japan as a second pillar alongside Korea in the global HBM supply chain. With government backing and exploding demand from every major AI accelerator vendor, Micron is racing to close the technology gap and secure long-term share in the most strategic component of AI compute.
The deeper insight is structural: AI capability now scales with memory bandwidth, not just compute FLOPS. Understanding this constraint—and its implications for architecture, investment, and strategy—is essential for anyone navigating the AI infrastructure landscape.
Memory is the new compute. The organizations and nations that control HBM production control the ceiling on AI capability scaling.
Recap: In This Issue!
The Constraint Shift: From Compute to Memory
- AI’s binding constraint has moved from GPUs to High Bandwidth Memory (HBM).
- Transformer architectures are inherently memory-bound, roughly 100× more constrained by memory bandwidth than by compute.
- GPU supply has increased, but HBM capacity has not — creating a new structural ceiling for AI scaling.
Why HBM Matters: The Physics Problem
- Compute has improved ~10,000× over 20 years; memory bandwidth only ~10×.
- This divergence creates the “memory wall” where compute idles waiting for data.
- HBM solves part of this through vertical stacking (TSVs) and ultra-wide interfaces, delivering 10×–100× bandwidth gains.
- HBM now accounts for 50–60 percent of accelerator cost — memory is the asset, compute is the wrapper.
The Hourglass Stack: HBM as the Choke Point
- In the eight-layer AI infrastructure stack, HBM sits at the narrow neck where all AI demand above and all fabrication/packaging below must pass.
- Scaling models, applications, and data centers is now gated by HBM supply, not silicon.
- Every new GPU generation increases HBM requirements: H100 (80GB) → H200 (141GB) → B100 (192GB+) → next-gen even higher.
The Oligopoly: Three Firms Control Global AI Scaling
- The global HBM market is controlled by SK Hynix (~53 percent), Samsung (~35 percent), and Micron (~12 percent).
- Barriers to entry (DRAM expertise, TSV stacking, advanced packaging, capital intensity) make new entrants effectively impossible.
- 88 percent of supply sits in Korea, creating geopolitical concentration risk.
Micron’s Hiroshima Gambit: A Strategic Repositioning
- ¥1.5 trillion investment, with up to ¥500 billion in Japanese subsidies, reduces capital intensity and long-term risk.
- Positions Japan as the second global HBM pillar alongside Korea.
- Enables Micron to chase 20 percent+ market share and challenge SK Hynix’s dominance.
- Aligns with Japan-US industrial policy to diversify memory supply chains.
Global Industrial Policy Realignment
- US: CHIPS Act focused on logic; HBM indirectly benefits but lacks targeted subsidy.
- Japan: Aggressive re-industrialization strategy leveraging materials, equipment, and subsidy muscle to rebuild semiconductor relevance.
- Korea: Defending incumbent advantage; heavily supporting SK Hynix and Samsung to maintain the lead.
- China: Blocked — export controls limit access to advanced DRAM tools, imposing a structural ceiling on domestic AI accelerators.
The Demand Flywheel: Every AI Vendor Needs HBM
- NVIDIA (H100, H200, B100) is the dominant consumer; its roadmap defines HBM demand curves.
- AMD MI300X, Google TPU, AWS Trainium, Meta MTIA, and Microsoft Maia add incremental multi-sided demand.
- Market projected to grow at a 58 percent CAGR through 2030 — the steepest in semiconductors.
Value Chain Concentration
- Value accrues at the bottleneck: HBM producers and advanced packagers (TSMC CoWoS).
- GPU makers integrate but are dependent on memory vendors.
- Hyperscalers monetize HBM indirectly through AI cloud margins.
- Model developers operate entirely downstream of memory supply constraints.
What Happens Next
- HBM4 (2026) improves density and bandwidth but increases manufacturing complexity — tightening supply even as performance rises.
- Alternative architectures (near-memory, in-memory compute) remain research-stage.
- Supply expansions (Micron Hiroshima, SK Hynix, Samsung) have multiyear ramps; shortages are structural, not cyclical.
The Bottom Line
- Memory bandwidth, not compute, now determines the pace of AI progress.
- HBM is the constraining variable in the AI scaling equation.
- Nations that secure HBM capacity — and companies that control it — effectively define the ceiling of global AI capability.
With massive ♥️ Gennaro Cuofano, The Business Engineer
Read the full analysis on The Business Engineer.
BIA INSIGHT: Memory Infrastructure as the Hidden Layer of AI Competitive Advantage
Through the BIA lens, the AI memory chokepoint illustrates a powerful case of infrastructure-layer value capture. The mental model of value chain disaggregation reveals that while the industry focuses on model capabilities, the true bottleneck has shifted to memory bandwidth and capacity — creating what amounts to a hardware-defined ceiling on AI scaling. Layer 3 moat classification shows this is fundamentally about capital-intensive barriers to entry: only companies that can vertically integrate across the memory-compute stack will capture the structural economics of next-generation AI.