Huachuang Securities: Surge in Token Inference Drives Massive Demand for Domestic Computing Power

Zhitong Finance ·  Mar 3 16:06

The normalization of inference workloads signifies a shift in computing power demand from 'periodic training investment' to 'continuous inference consumption,' reflecting a systematic upward trend in the value proposition of the computing chip industry.

According to the Zhitong Finance app, Huachuang Securities issued a research report stating that the exponential increase in token consumption is driven by a fundamental shift in how users work with AI. The continued expansion of computing power demand is directly lifting demand for computing chips, and China's GPU industry is entering a high-growth phase. The normalization of inference workloads implies a shift in computing power demand from 'periodic training investment' to 'continuous inference consumption,' and the value center of the computing chip industry is moving systematically upward. With inference workloads becoming routine and the supply of advanced overseas computing power restricted, limited supply elasticity and accelerating domestic substitution are reinforcing each other, concentrating industry profits in chip design and core computing segments.

The main viewpoints of Huachuang Securities are as follows:

Event: OpenRouter data for February 9 to 15, 2026 shows that Chinese models surpassed U.S. models in usage for the first time, at 4.12 trillion tokens versus 2.94 trillion tokens over the same period. From February 16 to 22, four of the five most-used models on the platform came from Chinese developers: MiniMax's M2.5, Moonshot AI's Kimi K2.5, Zhipu's GLM-5, and DeepSeek's V3.2. These four models together accounted for 85.7% of the total usage of the top five.

Demand Side: Computing chips enter a phase of normalized high load.

On the surface, the exponential rise in token consumption results from a growing user base and longer usage time, but the deeper driver is a fundamental shift in how AI is used. AI is evolving from a 'question-and-answer tool' that provides simple information and casual conversation into a 'productivity tool' that takes part in workflows and handles complex tasks.

NVIDIA CEO Jensen Huang has pointed out that without computing power, tokens cannot be generated, and without tokens, revenue cannot grow. The ongoing expansion of computing demand is directly boosting demand for computing chips, and China's GPU industry has entered a high-growth phase. According to Frost & Sullivan, China's AI acceleration chip market is expected to grow from 142.537 billion yuan in 2024 to 1,336.792 billion yuan in 2029, a compound annual growth rate of 53.7%.

By segment, GPUs are growing fastest: their market share is expected to rise from 69.9% in 2024 to 77.3% in 2029, when the GPU segment would reach 1,033.34 billion yuan. Overall, the normalization of inference workloads signifies a shift in computing power demand from 'periodic training investment' to 'continuous inference consumption,' with the value center of the computing chip industry moving systematically upward.
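The segment figure above follows from multiplying the total market size by the GPU share. The short Python sketch below reproduces that arithmetic from the numbers quoted in the report; the function and variable names are illustrative only and are not part of the original report.

```python
# Sketch: derive the GPU segment size from the total market size and GPU share.
# All inputs are the figures quoted above (billion yuan, percent);
# the names used here are hypothetical and chosen for illustration.

def segment_size(total_billion_yuan: float, share_pct: float) -> float:
    """Return the segment size implied by a total market size and a percentage share."""
    return total_billion_yuan * share_pct / 100.0

TOTAL_2024 = 142.537    # China AI acceleration chip market, 2024
TOTAL_2029 = 1336.792   # China AI acceleration chip market, 2029 (forecast)

gpu_2024 = segment_size(TOTAL_2024, 69.9)   # GPU share in 2024
gpu_2029 = segment_size(TOTAL_2029, 77.3)   # GPU share in 2029 (forecast)

print(f"GPU segment 2024: {gpu_2024:.2f} billion yuan")
print(f"GPU segment 2029: {gpu_2029:.2f} billion yuan")
```

The 2029 result rounds to the 1,033.34 billion yuan cited in the report; the 2024 value of roughly 99.63 billion yuan is implied by the quoted share and total rather than stated in the report.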

Supply Side: Strengthened overseas constraints and enhanced domestic substitution capabilities.

1) Overseas supply constraints: Policy restrictions combined with production bottlenecks have continuously limited supply elasticity. According to Huanqiu.com citing market reports, the U.S. Department of Commerce stated that NVIDIA has yet to sell any H200 chips to China since approval for AI chip exports to China was granted two months ago. Although the U.S. Department of Commerce issued new rules in January easing export restrictions on H200 chips to China, the U.S. State Department remains committed to pushing for stricter export controls. Amidst multiple uncertainties, Chinese customers have refrained from placing orders with NVIDIA until licensing conditions become clear.

Beyond policy disruptions, global computing hardware supply is structurally tight, with current delivery cycles for data center GPUs ranging from 36 to 52 weeks. Short-term supply elasticity is unlikely to improve significantly, and extended delivery cycles are becoming the norm. The persistent gap in overseas computing supply is opening a window of opportunity for domestic computing alternatives and faster localization of the supply chain.

2) Enhanced domestic chip substitution capability: simultaneous leaps in performance and commercialization. The sustained increase in token consumption fundamentally reflects more frequent and more intensive large-model inference calls. The FP8 format is emerging as the next-generation computing standard, trading some numerical precision for higher overall computational performance. On February 12, Moore Threads announced that the FP8 computing power of a single S5000 card exceeded 1,000 TFLOPS for the first time, with training accuracy within 1% of NVIDIA's H100.

Domestic chipmakers are actively pushing ecosystem adaptation. From December 2025 to March 2026, Muxi Corporation's C500/C550 series was progressively adapted to multiple domestic large models, including Tencent's HunyuanImage 3.0, StepFun's Step 3.5 Flash, and Zhipu's GLM-5. As model adaptation capabilities continue to improve, domestic GPUs are moving from technical feasibility to usability at scale. On February 27, Cambricon Technologies, Moore Threads, and Muxi Corporation successively released preliminary 2025 annual results, with all three companies posting triple-digit revenue growth. Notably, Cambricon Technologies turned its first full-year net profit.

Overall, domestic GPUs are making breakthroughs in three dimensions: computing power density, ecosystem adaptation, and commercial realization. Domestic computing chips are progressively achieving substitution at scale.

Investment Targets

Recommended focus areas: (1) Chip design: Cambricon Technologies (688256.SH), Hygon Information Technology (688041.SH), Muxi Corporation (688802.SH), Moore Threads (688795.SH), Tianshu Zhixin (09903); (2) Chip manufacturing: SMIC (688981.SH), Hua Hong Company (688347.SH)/Hua Hong Semiconductor (01347); (3) Servers and peripherals: Inspur Information (000977.SZ), Huafeng Technology (688629.SH).

Risk Warning

Downstream demand may fall short of expectations; domestic substitution may progress more slowly than expected; foundry supply carries risks.

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.