AI Great Powers Part 6: Eight Stacks Per Chip, the Arithmetic That Caps China’s AI Hardware
Not wafers, not packaging, not tools: China’s AI output reduces to how many good HBM stacks it can build, divided by eight; here is the full math and every receipt.
I’m a sabbatical-at-home dad this summer, so I asked my 11-year-old daughter what two million divided by eight was. I was expecting her death stare. Instead: “Wait, I got this, Dad.” She pulled out her phone to use the calculator. Sigh... China can produce about 2 million High Bandwidth Memory stacks in 2026. It takes eight stacks per Huawei AI accelerator (think GPU). My daughter says that’s 250,000 (AI accelerators that China can churn out a year).
Results up front
This is the supply-ceiling article, the one where the hunt from Part 4 ends at a number. It is also, deliberately, the most checkable thing in this series: every load-bearing claim below carries a ledger ID, a confidence grade, and a falsification trigger. Everything is the model as of June 2026; imagery claims are stamped to collection dates, document claims to filing dates, and a built shell is never counted as an operating fab.
China’s AI hardware has one binding constraint, and it is not the one in the headlines. It is not 7nm logic wafers: SMIC has 24–60× more leading-edge logic capacity than its AI accelerators can consume (AC-02, high confidence). It is not advanced packaging, which is comparable to the constraint but not tighter (AC-03, med-high). It is HBM (high-bandwidth memory, the specialized stacked memory every serious AI chip must be packaged with), and specifically the unglamorous back-end step of stacking DRAM dies into known-good stacks (AC-01, high).
The arithmetic: China can make roughly 2 million good HBM stacks in 2026 (a triangulated floor, shown below) and each Huawei Ascend 910C consumes 8 of them. Two million divided by eight is 250 thousand accelerators. Run the division probabilistically and the 2026 ceiling is ~225k 910C-equivalents at the median (P10–P90: 192–270k), with the widely quoted ~250k sitting at about the 75th percentile. By 2028 the median climbs to ~1.0M (P10–P90: 821k–1,229k) (AC-15, med-high). Real growth, and memory-bound the entire way.
For comparison, JP Morgan estimates NVIDIA will ship ~7.5M AI GPUs in 2026.
Two consequences a decision-maker should carry out of this post. First, deployed is not domestic: the ~700–805k accelerators China fielded in 2025 rode a one-time, decaying overhang of pre-ban foreign silicon, not domestic capacity. Second, Huawei’s reported ~600k target for 2026 is more than twice what the domestic HBM division allows. On these numbers it is unreachable from domestic supply. Watch whose number breaks.
The rest of the post shows the work: the chain walk, the Monte-Carlo, the three independent evidence lines under the 2-million figure, the overhang arithmetic, the actual physics of the chokepoint, and the heaviest receipts block in the series.
The fab-counting fallacy
Most coverage of China’s chip push counts fabs, and every new cleanroom reads as “China is catching up.” A fab count answers the wrong question.
In any industrial mobilization the question is the military one: what is the LIMFAC (the limiting factor, the one input that caps output no matter how much of everything else you have)? A convoy moves at the speed of its slowest ship. I cataloged 59 fabs and 59 chip designers, then spent most of the project hunting for the slowest ship.
An AI accelerator needs three things made at scale: a leading-edge logic die, HBM stacks, and the advanced packaging that marries them. The ceiling is the minimum of the three. The whole analytical game is figuring out which term is the minimum, because that term, and only that term, sets the number.
How to read it: logic dies (green) and HBM front-end wafers (orange) both funnel through the HBM stacking back-end (red, the constriction), then 2.5D packaging, to become Ascends; band widths are schematic. What to see: the narrowest point of the funnel is not a wafer fab. It is a packaging step. HuaweiFabHunt fab atlas, June 2026.
Walking the chain
Logic is ample. This is the counterintuitive one, so here is arithmetic you can redo. A 910C carries two compute dies of roughly 600–800 mm² each. A 300mm wafer offers ~70,000 mm² of usable area. After edge loss that is roughly 75–105 die candidates per wafer at that size; call it 80–90. Apply the honest domestic-yield discount (30–60% good dies, the band this model carries for SMIC’s DUV-based 7nm-class process) and you get roughly 25–50 good dies per wafer, or 12–25 finished accelerators per wafer. Feeding the entire HBM-limited Ascend volume (~19,000 accelerators a month at the 2026 median, 225k a year ÷ 12) therefore takes roughly 1–2k wafers per month in 2026, rising toward ~4k by 2028 as the HBM supply grows, against SMIC’s ≤7nm capacity of roughly 45–110k wpm (wafers per month) across the same window. That is 24–60× headroom (AC-02). China could lose most of its leading-edge logic capacity and the AI-accelerator number would not move. And the headroom is widening, not closing: a second ≤7nm source is reportedly emerging at HLMC’s Fab 6 on domestic SiCarrier tools (Reuters, 2026-03-16, single-source; AC-17, medium). Strategically loud, numerically irrelevant, because logic was never the gate.
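The wafer arithmetic in that paragraph can be redone in a few lines. This is a minimal sketch, not the project's model: it uses only the figures quoted above, and it assumes the low end of the capacity band (~45k wpm) is the relevant 2026 comparison. The function and constant names are mine.

```python
# Ranges are (low, high); every input is a figure quoted in the paragraph above.
DIES_PER_ACCELERATOR = 2                 # a 910C carries two compute dies
DIE_CANDIDATES_PER_WAFER = (80, 90)      # "call it 80-90" after edge loss
GOOD_DIE_YIELD = (0.30, 0.60)            # domestic-yield band for the 7nm-class process
ACCELERATORS_PER_MONTH = 225_000 / 12    # 2026 median ceiling, per month
SMIC_WPM_LOW = 45_000                    # low end of the <=7nm band (assumed to be the 2026 figure)


def accelerators_per_wafer():
    """Finished accelerators per wafer at the worst and best case."""
    worst = DIE_CANDIDATES_PER_WAFER[0] * GOOD_DIE_YIELD[0] / DIES_PER_ACCELERATOR
    best = DIE_CANDIDATES_PER_WAFER[1] * GOOD_DIE_YIELD[1] / DIES_PER_ACCELERATOR
    return worst, best


def wafers_needed_per_month():
    """Wafer starts per month needed to feed the HBM-limited volume."""
    worst, best = accelerators_per_wafer()
    # The best yield needs the fewest wafers, the worst yield the most.
    return ACCELERATORS_PER_MONTH / best, ACCELERATORS_PER_MONTH / worst


few, many = wafers_needed_per_month()
print(f"accelerators per wafer: {accelerators_per_wafer()}")
print(f"wafers per month needed: {few:,.0f} to {many:,.0f}")
print(f"headroom at {SMIC_WPM_LOW:,} wpm, worst-case yield: {SMIC_WPM_LOW / many:.0f}x")
```

With these inputs the sketch gives 12 to 27 accelerators per wafer and roughly 700 to 1,600 wafers a month, the same order as the rounded 1–2k above. Even at worst-case yield, the low end of the capacity band leaves well over 20× headroom.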
Packaging is comparable, not tighter. The 2.5D step (mounting the logic die and its memory stacks side-by-side on an interposer, the step TSMC’s CoWoS made famous) looks like a chokepoint: SJSemi in Jiangyin is the lone domestic volume silicon-2.5D node. But run the throughput against the constraint: the HBM-fed accelerator volume at the ~250–300k upper band (the right stress test for whether packaging binds) is only ~20–25k packages a month, and SJSemi plus the ramping OSATs (JCET’s XDFOI line in mass production but largely committed to international customers; Tongfu and Huatian pre-volume) sit at roughly that same order. Comparable, with little slack: a genuine secondary gate, not the binding one (AC-03). And Huawei is actively de-risking it from the design side: the 910C uses a simpler domestic dual-interposer scheme rather than true CoWoS-class, and the Ascend 950 is monolithic, a single compute die that skips advanced packaging entirely. Packaging rides alongside the constraint. It is not the constraint.
Memory binds. The entire Chinese AI-accelerator buildout reduces to one division a leader can do in their head: about 2 million good HBM stacks in 2026, and every Ascend 910C eats 8 of them. Two million divided by eight is 250 thousand accelerators.
That is the ceiling. Not wafers. Not money. Memory.
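The chain walk is a minimum over three terms, which can be sketched directly. The numbers below are rounded from the paragraphs above and converted to accelerator-equivalents per year; the packaging value is my low-end reading of "~20–25k packages a month" and is an illustrative assumption, not a measured capacity.

```python
# Accelerator-equivalents per year that each input could support in 2026.
CAPACITY_2026 = {
    "logic": 45_000 * 12 * 12,   # 45k wpm x 12 accelerators/wafer (worst case) x 12 months
    "packaging": 20_000 * 12,    # ~20k packages/month x 12 months (assumed low end)
    "hbm": 225_000,              # 2026 median ceiling from the HBM division
}


def binding_constraint(capacity):
    """Return (name, capacity) of the input that caps output: the LIMFAC."""
    name = min(capacity, key=capacity.get)
    return name, capacity[name]


name, cap = binding_constraint(CAPACITY_2026)
print(f"binding input: {name} at {cap:,} accelerators/yr")
for other, value in CAPACITY_2026.items():
    print(f"  {other}: {value / cap:.2f}x the binding term")
```

On these placeholder values logic sits near 29× the binding term and packaging only about 1.07×, which is what "comparable, with little slack" looks like as a number.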
One corollary worth a sentence before moving on: the seizure fantasy fails on the same arithmetic. Nationalizing the seven foreign-owned fabs on Chinese soil (TSMC Nanjing, SK Hynix Wuxi, Samsung Xi’an and the rest) adds approximately zero AI compute, because not one of them runs an HBM line or holds an EUV tool (AC-04, high; I swept for this explicitly). What seizure changes, and what it costs, is Part 9 and Part 10 territory.
The division, run honestly
“About 2 million divided by 8” is a headline, not a model. The model is a Monte-Carlo, 100,000 draws over the four assumptions the division actually rests on: the HBM good-stack supply itself (with the 2026 distribution deliberately skewed low to absorb CXMT’s HBM3 slip; more on that below), the residual back-end yield (±10%), the stacks-per-accelerator count (7.5–9.5, centered on the 910C’s 8, because the fleet is not all 910Cs), and the timing of second HBM sources (the SwaySure/XMC upside, which matters for 2028, not 2026).
The output: 2026 lands at ~225k 910C-equivalents at the median, P10–P90 192–270k. The ~250k figure you have seen quoted (including by me, in earlier work) sits at roughly the 75th percentile of this distribution; the median slid below it when the CXMT yield slip got priced in. The 2028 median is ~1.0M, P10–P90 821k–1,229k, with second-source timing rising to the number-two driver and contributing most of the extra width relative to 2026. HBM supply still dominates. The band is the right way to read the result.
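For readers who want to see the shape of such a run, here is a minimal sketch of a 2026-only version. It is not the Stage-45 model: the triangular and uniform distributions and every range except the ±10% yield band and the 7.5–9.5 stack count are hypothetical placeholders of mine, and second-source timing is left out because it matters for 2028, not 2026. It should not be expected to reproduce the published percentiles.

```python
import random
import statistics


def simulate_ceiling(draws=100_000, seed=0):
    """Draw 2026 accelerator ceilings from three uncertain inputs."""
    rng = random.Random(seed)
    results = []
    for _ in range(draws):
        # Good HBM stacks: mode set below the midpoint of the range so the
        # mass sits low (hypothetical range, standing in for the skewed supply input).
        stacks = rng.triangular(1.4e6, 2.6e6, 1.8e6)
        # Residual back-end yield factor, +/-10% as stated in the text.
        yield_factor = rng.uniform(0.9, 1.1)
        # Stacks per accelerator: 7.5-9.5, centered on the 910C's 8.
        stacks_per_accel = rng.triangular(7.5, 9.5, 8.0)
        results.append(stacks * yield_factor / stacks_per_accel)
    return results


def summarize(results):
    """Return (P10, P50, P90) of the simulated ceilings."""
    deciles = statistics.quantiles(results, n=10)  # nine cut points: P10..P90
    return deciles[0], statistics.median(results), deciles[-1]


p10, p50, p90 = summarize(simulate_ceiling())
print(f"P10 {p10:,.0f}  P50 {p50:,.0f}  P90 {p90:,.0f}")
```

The design point is that the headline division becomes one line inside a loop; everything that makes the result a band rather than a point lives in the input distributions.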
How to read it: the fan is the P10–P90 envelope of the ceiling by year; the tornado bars show how much each input assumption swings the answer. What to see: HBM-stack supply dominates the uncertainty in every single year (a ~125k swing in 2026, ~590k in 2028), dwarfing yield, stack-count, and everything else. The ceiling’s uncertainty IS HBM-supply uncertainty. That is the binding constraint, quantified. (Tornado panels are drawn for 2026 and 2028; their P50 labels are the at-mode baselines of the one-at-a-time sweep, a touch off the headline medians.) HuaweiFabHunt Stage-45 Monte-Carlo (100k draws), June 2026.
The tornado chart is the receipt for the whole thesis, which is why it gets pride of place. If logic or packaging were secretly the gate, their bars would dominate somewhere. They dominate nowhere. The uncertainty in China’s AI-accelerator output is, in every year, almost entirely uncertainty about how many good HBM stacks China can make.
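A tornado chart is a one-at-a-time sweep: hold every input at its baseline, move one input across its range, and record how far the output swings. The sketch below shows the mechanics under stated assumptions; the baseline and the supply range are hypothetical placeholders, not the project's inputs, so the bar sizes it prints are illustrative only.

```python
def ceiling(stacks, yield_factor, stacks_per_accel):
    """Accelerator ceiling from good stacks, residual yield, and stacks per chip."""
    return stacks * yield_factor / stacks_per_accel


# At-mode baseline and low/high range per input (supply range is hypothetical).
BASELINE = {"stacks": 1.8e6, "yield_factor": 1.0, "stacks_per_accel": 8.0}
RANGES = {
    "stacks": (1.4e6, 2.6e6),
    "yield_factor": (0.9, 1.1),        # +/-10%, as in the text
    "stacks_per_accel": (7.5, 9.5),    # as in the text
}


def tornado(baseline, ranges):
    """Swing in the ceiling when each input alone moves across its range."""
    swings = {}
    for name, (low, high) in ranges.items():
        outputs = [ceiling(**{**baseline, name: value}) for value in (low, high)]
        swings[name] = max(outputs) - min(outputs)
    # Widest bar first, as a tornado chart is drawn.
    return dict(sorted(swings.items(), key=lambda item: item[1], reverse=True))


for name, swing in tornado(BASELINE, RANGES).items():
    print(f"{name:>18}: {swing:,.0f} accelerators")
```

With these placeholder ranges the supply bar comes out widest, but that ordering is a property of the ranges chosen; only the real model's inputs can establish it.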
I spent nine evidence modalities and a few too many nights working to 0-dark-thirty earning the right to do one long division. The tornado chart is what earning it looks like.
How I know two million
The 2-million figure deserves suspicion. It is the load-bearing number of this entire series, so here is where it comes from. Three free, mutually independent evidence lines converge on ~1.5–3M good stacks for 2026, and I treat ~2M as a floor inside that band.
Line one: the wafer allocation. CXMT, China’s #1 DRAM maker and the binding domestic HBM source, is multiply reported (Korean trade press, picked up across the technical outlets, independent of any single analyst) to be dedicating ~60k of its ~300k wafers per month to HBM3 in 2026. Run that through dies-per-wafer and stacking yield and you land in the low millions of stacks.
Line two: the tools. Good stacks require die-stacking bonders, and the bonder market is brutally concentrated: Hanmi of Korea holds ~90% of HBM3E-grade thermo-compression bonders and halted shipments to Chinese customers in May 2025. The domestic challengers (Naura, Suzhou Maxwell, U-Precision) are at validation stage; U-Precision’s own IPO prospectus shows wafer-level-bonding revenue under ¥30M a year: prototype scale. Anchor the installed base to what shipped before the halt, derate for yield and uptime, and the tool-side estimate centers on ~1.5–3M stacks. The machines China has bound the answer.
Line three: the dog that didn’t bark. CXMT’s own STAR-market IPO prospectus (a legal document with liability attached, cleared May 2026) discloses no HBM product line at all. DDR4, DDR5, LPDDR5: yes. HBM: forward-looking R&D only. China’s flagship memory maker, raising billions, does not yet claim HBM as a product. That is consistent with the ceiling, not against it: the prospectus (filed December 2025) predates the ramp, and the ~2M is an allocation-derived 2026 ceiling on a line still ramping, not a count of product already shipped. But a business already shipping millions of stacks does not hide from its own prospectus.
How to read it: four estimates of China’s 2026 good-stack supply: two external (SemiAnalysis’s supply-chain estimate, 1.5–2.5M, which embeds the wafer-allocation input; the IAPS bottom-up, 2.5–18.6M), this project’s own Stage-45 Monte-Carlo band (1.54–2.16M; my model, not an independent check), and the new free Stage-49 tool-vendor proxy (1.5–3M). What to see: three of the four cluster at ~1.5–3M; the one ~7M-centered bottom-up stands apart. HuaweiFabHunt Stage-49 reconciliation, June 2026.
The outlier, stated fairly. One serious independent bottom-up estimate (IAPS) puts the 2026 median at ~7M stacks, with an honest and very wide 80% confidence interval of 2.5–18.6M. I do not dismiss it. It is the only fully independent bottom-up construction besides this one, and its existence is why I quote 2M as a floor. But it is the outlier: it sits at odds with the tool-vendor and back-end evidence, and it partly measures wafer capacity rather than good stacks, the distinction the next section is about. Note the asymmetry, though: no free line centers below the 1.5–3M band. Nothing in the free evidence pushes the floor down. If the truth is closer to 7M, the ceiling loosens upward and China’s 2026 is better than I model. The floor never tightens.
And the divisor is confirmed. The ÷8 is no longer a bare assumption. Teardown-derived analysis of the fielded accelerator base works out to ~13M stacks across ~1.6M packages: almost exactly 8 stacks per package, independently confirming the bridge from stacks to accelerators (AC-01 corroboration, stage-49).
Sorted by evidence grade, then: the 8-stack divisor is teardown-confirmed; the ~2M floor is free-triangulated across three independent lines; the exact per-stack yield bridge (the one number that would turn the band into a point) is genuinely unobserved in public sources. That sorting is itself a finding.
Deployed is not domestic
Here is the trap in every headline accelerator count: China fielded roughly 700–805k accelerators in 2025 (AC-06, high). That is more than three times my 2026 ceiling. Contradiction? No. Almost none of that wave reflects what China can build in a year. It rode a one-time, decaying overhang, and the overhang has dates on it:
A ~2.9M die-bank of pre-ban, TSMC-fabricated Ascend dies, banked through a shell-company channel before the foundry door closed, exhausted by early 2026 (AC-06).
An estimated 5–13M imported HBM stacks (roughly 1.0–1.6M accelerators’ worth of memory) bought ahead of the December 2024 HBM controls and exhausted around end-2025 (AC-07, med-high).
Over 1M NVIDIA H20s, the export-compliant China chip, before that channel closed in April 2025, plus an estimated ~140k/yr of smuggled NVIDIA parts that continue to leak in. Both are policy stories, and Part 10 owns them; here they are just overhang line-items.
The engineering tell is in the teardowns. Every publicly torn-down 910C is TSMC silicon (2020-vintage dies) wearing Samsung and SK Hynix HBM2E: pre-ban logic, foreign memory. No all-domestic unit (SMIC die plus Chinese HBM) has surfaced in any public teardown. The day one does, this assessment changes; that is trigger 5 below.
So 2026 is the crossover year: shipments fall out of the dead overhang and land on the rising ~225k domestic ceiling. That is why Huawei’s reported ~600k target for 2026 is, on these numbers, unreachable from domestic supply, more than double what the HBM division allows (AC-06 + stage-43/45). Either the target quietly slips, or the overhang was deeper than the evidence shows, or something on my trigger list fires. One of those three. Watch whose number breaks.
One more reason the ceiling is what it is: it is shared. Every domestic training-capable accelerator in the designer census is gated on the same HBM and interconnect supply (AC-08, high). When Cambricon claims a ~500k 2026 ramp, that claim divides the same stack pool Huawei draws from; it does not grow it. Merchant-cohort headlines are non-additive (AC-09).
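The non-additivity point is easy to put in numbers. This sketch compares the two 2026 figures quoted above against the median domestic ceiling. It assumes both vendors' parts draw about eight stacks each from the same pool, and the pro-rata split is purely my illustration of "divides, does not grow", not a forecast of who gets what.

```python
POOL_ACCELERATORS_2026 = 225_000   # median domestic ceiling, 910C-equivalents
CLAIMS_2026 = {"Huawei": 600_000, "Cambricon": 500_000}  # reported target / claimed ramp


def oversubscription(claims, pool):
    """How many times over the shared pool the combined claims run."""
    return sum(claims.values()) / pool


def pro_rata(claims, pool):
    """Split the pool in proportion to each claim (hypothetical allocation rule)."""
    total = sum(claims.values())
    return {name: pool * claim / total for name, claim in claims.items()}


print(f"combined claims vs pool: {oversubscription(CLAIMS_2026, POOL_ACCELERATORS_2026):.1f}x")
for name, share in pro_rata(CLAIMS_2026, POOL_ACCELERATORS_2026).items():
    print(f"  {name}: claims {CLAIMS_2026[name]:,}, pro-rata share {share:,.0f}")
```

Huawei's target alone is about 2.7× the pool and the two claims together are nearly 5×. However the pool is split, the shares sum to the pool.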
The hard step, for the engineers
Why is a memory-packaging step, not lithography, the gate? Because of what “good stack” physically means. Three terms, defined once, carry the whole section.
A TSV (through-silicon via) is microscopic vertical wiring: thousands of holes etched through each DRAM die, lined and filled with copper, so dies can be stacked like floors of a building and wired straight through. An HBM stack is 8–12 DRAM dies on a base die, TSV’d, thinned to tens of microns (thinner than a human hair) and then welded die-to-die under heat and pressure by a TCB (thermo-compression bonding) machine, which must land thousands of micro-bumps per die at micron-scale alignment without cracking silicon that thin. The output that matters is KGSD (known-good stacked die): a completed stack that still tests healthy after all of that.
The brutal part is that KGSD yield compounds: roughly per-die yield raised to the stack height, multiplied by the bond yield of every layer. Stack eight dies at 95% per-layer success and you keep about two-thirds of your stacks; at 90% you keep less than half. Small process problems become large output problems. This is why “wafers allocated to HBM” and “good stacks shipped” are different numbers, and why the IAPS estimate and mine can both be honest while disagreeing by roughly 3×.
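The compounding is worth seeing as arithmetic. This is a minimal sketch that folds die and bond yield into a single per-layer success rate, as the example in the text does. Real lines test dies before stacking and do not fail independently layer by layer, so treat it as the shape of the curve, not a yield model.

```python
def stack_yield(per_layer_success, layers):
    """Fraction of stacks that survive if every layer must succeed independently."""
    return per_layer_success ** layers


for layers in (8, 12):
    for success in (0.99, 0.95, 0.90):
        kept = stack_yield(success, layers)
        print(f"{layers}-high at {success:.0%} per layer: {kept:.1%} of stacks survive")
```

At eight layers, 95% per layer keeps about 66% of stacks and 90% keeps about 43%. Going to twelve layers at the same per-layer rates drops those to roughly 54% and 28%, which is why taller stacks punish an immature bonding process even harder.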
It is also why the export controls that matter here are tool controls. The TCB machines that determine KGSD yield at HBM3 grade are made by a handful of foreign firms, dominated by one Korean company that stopped shipping to China in May 2025; the domestic replacements are demonstrably at validation stage (one flagship domestic tool was shown as “process-verified” at SEMICON China in March 2026: a milestone, not a production fleet). A bonder ban bites years before a lithography ban, because the installed base is smaller and the yield learning-curve is steeper. As of June 2026, CXMT’s HBM3 mass production has slipped to 2H-2026/2027 with yields reported below 50% (CN-sourced reporting). That is the slip that moved my 2026 median below the old ~250k headline.
And here is the honest limit of my own favorite tool: none of this is visible from orbit (AC-12, high). Two independent imagery-analyst passes over every relevant site reached the same verdict: shells, utility yards, and cooling plant resolve beautifully; the stacking step happens inside windowless cleanrooms and is permanently imagery-opaque. The binding numbers in this post come from filings, tool-vendor evidence, and teardowns. The satellites contribute the where and the when:
How to read it: two cleanroom-scale halls in late fit-out (west roof complete, east still under cranes) with the utility plant rising to the north. What to see: the single most important building in this series, mid-build. This is the dedicated HBM stacking plant the ceiling turns on. It is the freshest frame in this series. Imagery © SkyFi (Maxar WorldView-1, 0.57 m), as imaged 2026-06-01.
The facility chain, compressed (full workups in the Part-4 appendix): CXMT Hefei makes the DRAM that gets stacked: the binding source, with the slipping HBM3 line (AC-11, med-high). Innotron Shanghai (CXMT’s parent) is building the dedicated stacking back-end above: a ¥17.1bn (US$2.4bn) plant designed for ~30k packages a month, start-up target slipped from July to end-2026, good-stack ramp realistically 2027+. SwaySure in Shenzhen, the second source, has been sampling 24-layer HBM to Huawei Cloud since 2H-2025 but showed no volume-cleanroom signature as imaged 2026-01-25. Volume is a 2027+ story. XMC in Wuhan runs the consortium pilot (~3k wafers/month, with a much larger Phase-III planned). Every link in that chain is real, funded, and building. None of the second sources or the dedicated back-end ships volume in 2026. The ~2M good stacks are what CXMT’s allocated wafers can yield as its HBM3 line ramps through the year: a ceiling, not a shipped-product count. Built shell is not output.
What would change my mind
These are falsifiable triggers, not hedges: the five things that break the ceiling upward, each with its status as of June 2026. None has fired.
1. A domestic HBM-grade bonder qualifying at volume yield. Status: validation-stage across all three domestic challengers; earliest plausible volume 2027+.
2. CXMT HBM3 mass production landing in 2026 at healthy yield. Status: slipped to 2H-2026/2027; yields reported below 50%. This is the single biggest swing factor in the tornado.
3. SwaySure or XMC shipping volume HBM before 2027. Status: sampling and pilot, respectively.
4. Confirmed several-hundred-thousand-unit 950-series shipments fed by Huawei’s in-house HBM at volume. Status: the 950PR is reported in mass production since April 2026 with a sold-out order book (ByteDance ~350k, Alibaba ~200k). But every figure there is demand-side, the same class of claim as the 600k target. The in-house HiBL memory feeds the 950 series, not the 910C, and its volume is unverified. This is the one to watch by Q3 2026.
5. A public teardown of a current-build, all-domestic 910C (SMIC die plus Chinese HBM). Status: no such unit has surfaced.
Any one of these fires, I re-run the Monte-Carlo and publish the new band. The model is a loop, not a press release.
Where this goes
This post is the middle of the argument, and the arithmetic radiates outward from it. Who wrote the checks that built all this capacity (local governments out-spending Beijing’s famous Big Fund 1.39:1, as equity, not grants) was Part 5. What the ceiling looks like from the demand side (the data-center halls China is building for chips that cannot arrive on time) is Part 7. How long the memory wall stands, and what China’s own nine-year logic ramp says about when it falls, is Part 8. And which export controls actually built the wall (versus the ones that leaked) is Part 10.
The compressed mental model to carry forward: China’s AI hardware program is not blocked. It is taxed. And the tax is collected at exactly one tollbooth. Eight stacks per chip is not a metaphor. It is a bill of materials, and in 2026 the bill exceeds the budget.
Receipts
This is the heaviest receipts block in the series, on purpose. The full ledger is on my Substack.
Load-bearing claims in this piece:
HBM binds; 2026 ceiling ~225k 910C-eq (P50), P10–P90 192–270k; ~250k ≈ P75: AC-01 (high) + AC-15 (med-high, 2028 ~1.0M, 821k–1,229k); 100k-draw Monte-Carlo over stack supply, yield, stacks-per-chip, second-source timing; tornado shows HBM-stack supply dominating every year.
~2M good stacks/2026 is a triangulated floor; three free lines converge 1.5–3M: AC-01 note; wafer-allocation reporting + tool-vendor installed base + CXMT prospectus negative disclosure; the IAPS ~7M bottom-up is the outlier (80% CI 2.5–18.6M), treated as upside, not refuted.
÷8 stacks per accelerator, teardown-confirmed (~13M stacks / ~1.6M packages ≈ 8): AC-01 corroboration, stage-49 teardown-secondary.
Logic non-binding, 24–60× headroom (~1–4k wpm needed of SMIC’s ~45–110k ≤7nm wpm): AC-02 (high); the HLMC Fab-6 second source is AC-17 (medium, single-source Reuters 2026-03-16).
Packaging comparable, not tighter; 950 monolithic de-risk: AC-03 (med-high) + W4 indicator.
Deployed ≠ domestic; 2025’s ~700–805k rode a decaying overhang (die-bank ~2.9M, gone early-2026; imported HBM ~5–13M stacks, gone ~end-2025; H20 closed Apr-2025): AC-06 (high), AC-07 (med-high) + TechInsights-derived teardown reporting.
Merchant cohort divides the pool, non-additive: AC-08/AC-09 (high); every train-capable designer in the census is HBM-gated.
CXMT = binding source, HBM3 slipped, prospectus discloses no HBM line: AC-11 (med-high) + W5 indicator; Hanmi bonder halt May-2025, domestic tools validation-stage.
No HBM/2.5D back-end is imagery-resolvable at any site: AC-12 (high); two independent analyst passes, both negative, which is why the binding numbers are filings-and-tools-derived, never pixels.
Nationalizing the 7 foreign-owned fabs adds ~0 AI compute: AC-04 (high); none runs HBM or EUV.
What would change my mind: the five numbered triggers above, with statuses as of June 2026; the refresh loop re-runs the Monte-Carlo when any trigger fires.
Every imagery claim is stamped to its collection date; document claims to filing dates; “as of June 2026” is the series freshness stamp. Imagery © SkyFi (Maxar WorldView-1) and Maxar/Vantor as credited per frame. Charts: HuaweiFabHunt Stages 33–35, 45, 49, 56, June 2026.





