Technology · Analysis
The Memory Bottleneck Tightens
High-bandwidth memory, not logic chips, is now the chokepoint for AI hardware. HBM shortages are driving DRAM prices up by 80% or more, squeezing consumer electronics, and reshaping the economics of inference.
Stake & Paper Editorial Team · June 3, 2026
SK Group Chairman Chey Tae-won told Computex 2026 attendees that the global shortage of high-bandwidth memory chips will persist through at least 2030.
Not 2027. Not 2028. A full four years from now. The constraint isn't demand—it's physics.
Producing one unit of HBM3E consumes approximately three times the wafer capacity required to produce the same number of bits in DDR5, according to Micron's earnings disclosures. TrendForce puts the ratio even higher: one gigabyte of HBM requires the equivalent of four gigabytes of standard DRAM in wafer area.
Every wafer diverted to AI accelerators is a wafer that can't produce memory for laptops, phones, or enterprise servers.
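The trade-off can be made concrete with simple arithmetic. The Python sketch below converts a volume of HBM into the standard DRAM that the same wafer area could have produced, using the roughly 3x (Micron) and 4x (TrendForce) ratios cited above. The function name and the 1,000,000 GB input are illustrative assumptions, not figures from either source.

```python
def displaced_dram_gb(hbm_gb: float, wafer_ratio: float) -> float:
    """Standard DRAM (in GB) that the same wafer area could have produced.

    wafer_ratio is the wafer area needed per HBM bit, relative to standard DRAM.
    """
    if hbm_gb < 0 or wafer_ratio <= 0:
        raise ValueError("hbm_gb must be >= 0 and wafer_ratio must be > 0")
    return hbm_gb * wafer_ratio


# Ratios quoted in the article; the HBM volume is a made-up round number.
RATIOS = {"Micron (~3x)": 3.0, "TrendForce (~4x)": 4.0}
HYPOTHETICAL_HBM_GB = 1_000_000

for source, ratio in RATIOS.items():
    lost = displaced_dram_gb(HYPOTHETICAL_HBM_GB, ratio)
    print(f"{source}: {HYPOTHETICAL_HBM_GB:,} GB of HBM displaces {lost:,.0f} GB of DRAM")
```

The gap between the two ratios matters at scale: under these assumptions, the TrendForce figure implies a third more displaced DRAM than the Micron figure for the same HBM output.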
DRAM prices have risen 80 to 90 percent so far this quarter, according to Counterpoint Research.
The shortage is structural, not cyclical. And it's rewriting the competitive map for AI hardware.
Can You Build an AI Chip Without the Memory to Feed It?
Modern AI GPUs cannot function at peak capacity without integrated HBM; the shortage of memory effectively caps the number of GPUs that can be shipped, regardless of chip availability.
That's the bind. Nvidia's Blackwell architecture, priced at $30,000 to $40,000 per GPU, delivers up to five times the performance of the H100.
But TSMC's primary 4NP allocation and its CoWoS packaging capacity, which is used for HBM3E integration, limited Blackwell production through most of 2025.
TSMC's most advanced packaging method, Chip on Wafer on Substrate (CoWoS), is growing at a stunning 80% compound annual rate, according to TSMC North America packaging solutions head Paul Rousseau.
TSMC has been expanding CoWoS capacity from roughly 35,000 wafer starts per month in late 2024 toward a projected 120,000 to 130,000 per month by the end of 2026, an increase of roughly 3.5 times in about two years, and demand still outpaces it.
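As a sanity check on those figures, the Python sketch below compares the quoted 80% compound annual growth rate with the rate implied by the capacity numbers. It assumes the ramp from late 2024 to the end of 2026 spans about two years; that span and the function names are assumptions made for illustration.

```python
def implied_cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate needed to go from start to end in `years`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years) - 1.0


def growth_multiple(cagr: float, years: float) -> float:
    """Total multiple reached after compounding `cagr` for `years`."""
    return (1.0 + cagr) ** years


START_WAFERS = 35_000  # wafer starts per month, late 2024 (from the article)
YEARS = 2.0            # assumption: late 2024 to end of 2026 is about two years

for end_wafers in (120_000, 130_000):
    rate = implied_cagr(START_WAFERS, end_wafers, YEARS)
    print(f"{START_WAFERS:,} -> {end_wafers:,}: implied CAGR {rate:.1%}")

print(f"80% CAGR over {YEARS:g} years: {growth_multiple(0.80, YEARS):.2f}x")
```

Under the two-year assumption, the projected range implies a somewhat faster ramp than 80% a year, which on its own would compound to a little over three times the starting capacity.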
Nvidia has reserved a majority of the most advanced capacity available at TSMC, which is the volume leader in packaging.
The memory wall is now the real wall.
SK hynix, the largest supplier of HBM to Nvidia, has told investors that its advanced packaging lines are at capacity through 2026. Micron, which supplies HBM3E to Nvidia and other U.S. clients, is in a similar position. Samsung has HBM capacity reserved through its Foundry and Memory business lines for tier-1 cloud clients.
Who Wins When Nvidia Can't Supply Everyone?
AMD is gaining ground.
Nvidia holds an estimated 80-85 percent of the data center AI accelerator market by revenue in 2026, down from approximately 92 percent in 2023. AMD's market share rose to approximately 5-7 percent on the strength of MI300X and MI325X inference adoption; Microsoft and Meta are the largest deployers.
KeyBanc analyst John Vinh forecasts $14 billion to $15 billion in AI revenue for AMD in 2026.
For inference-heavy workloads—particularly memory-bound LLM serving—MI300X achieves competitive or superior cost per token at most batch sizes.
MI300X cloud pricing runs roughly 40–60% below H100 at comparable providers.
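One way to read the pricing and cost-per-token claims together: cost per token is the hourly rental price divided by the tokens served in that hour. The Python sketch below works out how much throughput a cheaper accelerator needs to match a baseline's cost per token. Only the 40-60% discount range comes from the figures above; the function names are hypothetical, and no real prices or benchmark throughputs are used.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Serving cost per one million tokens at a given hourly price and throughput."""
    if price_per_hour < 0 or tokens_per_second <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


def breakeven_throughput_ratio(discount: float) -> float:
    """Share of the baseline's throughput needed to match its cost per token
    when the hourly price is `discount` below the baseline's price."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


# Discount range quoted in the article for MI300X versus H100 cloud pricing.
for discount in (0.40, 0.60):
    ratio = breakeven_throughput_ratio(discount)
    print(f"{discount:.0%} cheaper per hour: break-even at {ratio:.0%} of baseline throughput")
```

In other words, at a 40% discount an accelerator only has to deliver 60% of the baseline's tokens per second to tie on cost per token; anything above that is a saving.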
The economics matter.
Training AI models is a cost center, while inference is a "profit center" that directly generates revenue, according to Karl Freund, founder and principal analyst at Cambrian AI Research.
Intel is carving a niche with Gaudi 3.
The Dell AI platform features Intel Gaudi 3, which Intel says offers 70% better price-performance than Nvidia's H100 on Llama 3 80B inference throughput.
Intel's target is enterprise AI, where cost efficiency, power consumption, and an open ecosystem are critical.
Inflection AI's decision to adopt Gaudi 3 showcases the chip as a viable alternative to Nvidia, which is largely sold out for the next several years, according to Hyoun Park, CEO and chief analyst at Amalgam Insights.
But the hyperscalers are hedging differently.
AWS, Google, Microsoft, and Meta all invest in custom silicon—Trainium, TPU, Maia, MTIA—primarily for cost arbitrage on internal workloads, less so for external customer-facing products.
Nvidia CEO Jensen Huang said inference already accounts for more than 40% of AI-related revenue—and predicted that it is "about to go up by a billion times."
What Changed This Week
At its Build developer conference in San Francisco, Microsoft announced MAI-Code-1-Flash, its inaugural model in the AI coding space, alongside six other proprietary models.
The move signals Microsoft's push for "long term self-sufficiency," reducing reliance on OpenAI and Anthropic despite investing $13 billion and $5 billion in them, respectively.
Capital expenditure by the 14 largest publicly owned data center operators globally is expected to approach $750 billion this year, up from a little under $450 billion last year, according to BloombergNEF.
The money is flooding in. The memory isn't keeping pace.
What to Watch
Micron's $9.6 billion Hiroshima HBM facility, announced in partnership with the Japanese government, was slated to begin construction around May 2026, with first output expected in 2028.
HBM4, expected to enter mass production in 2026, will double the interface width to 2,048 bits and reach a total bandwidth of 2 TB/s, according to TrendForce.
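The bandwidth figure follows from interface width multiplied by the per-pin data rate. The Python sketch below shows the arithmetic. The 8 Gb/s pin rate is an assumption chosen because it reproduces roughly 2 TB/s at 2,048 bits; it is not stated in the TrendForce figure above, and the 1,024-bit comparison is simply the width implied by "double".

```python
def stack_bandwidth_gb_per_s(interface_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer times Gb/s per pin, over 8 bits per byte."""
    if interface_bits <= 0 or pin_rate_gbps <= 0:
        raise ValueError("interface_bits and pin_rate_gbps must be positive")
    return interface_bits * pin_rate_gbps / 8.0


ASSUMED_PIN_RATE_GBPS = 8.0  # assumption, not taken from the article

hbm4 = stack_bandwidth_gb_per_s(2048, ASSUMED_PIN_RATE_GBPS)
half_width = stack_bandwidth_gb_per_s(1024, ASSUMED_PIN_RATE_GBPS)

print(f"2,048-bit interface: {hbm4:,.0f} GB/s (about {hbm4 / 1000:.1f} TB/s)")
print(f"1,024-bit interface at the same pin rate: {half_width:,.0f} GB/s")
```

At a fixed pin rate, doubling the width doubles the bandwidth, which is why the wider interface matters even before any increase in signaling speed.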
Bernstein estimates TSMC's CoWoS capacity will reach 125,000 wafers per month by end of 2026, but hyperscaler demand continues to outpace that ramp.
Track HBM pricing, DRAM spot rates, and any announcements of multi-year supply agreements from hyperscalers. If cloud giants signal a pause in AI buildouts, the shortage eases. If they double down, the squeeze tightens further.