Source: Brocade
Authors: You and Su Yang.
DeepSeek is driving an explosion in demand for inference. NVIDIA's (NVDA.US) "computing power hegemony" has been pierced, and the door to a new world is gradually opening: a computing revolution led by ASIC chips is moving from silence to noise.
Recently, the Chip Flow Research Institute cited sources saying that DeepSeek is preparing to develop AI chips independently. Compared with this rising star, domestic giants such as Alibaba (BABA.US), Baidu (BIDU.US), and ByteDance have already crossed the threshold of "independent research and development."
Across the ocean, news of OpenAI's progress on self-developed chips also emerged earlier this year: according to foreign media reports, the first chip Broadcom (AVGO.US) is customizing for OpenAI will tape out within a few months at Taiwan Semiconductor (TSM.US).
Earlier, there were even rumors that Sam Altman planned to raise 7 trillion dollars to build a "chip empire" spanning both design and manufacturing. Alphabet (GOOG.US), Amazon (AMZN.US), Microsoft (MSFT.US), and Meta Platforms (META.US) have also successively joined this "self-development frenzy."
A clear signal is that whether it is DeepSeek, OpenAI, or other companies in China and Silicon Valley, no one wants to fall behind in the era of computing power. ASIC chips may become their ticket through the door to that new world.
Will this 'kill' NVIDIA? Or will it 'create' a second NVIDIA? There is still no answer.
However, one thing is certain: this much-hyped wave of "self-development" has already lifted upstream industry chain companies. Broadcom, which provides custom design services to the major players, has seen its performance take off: its AI business revenue grew 240% year on year in 2024 to 3.7 billion USD; in Q1 2025, AI revenue was 4.1 billion USD, up 77% year on year; and 80% of that comes from ASIC chip design.
In Broadcom's view, the ASIC pie is worth more than 90 billion USD.
1. From GPU to ASIC, the economics of computing power is reaching a watershed.
Low cost is a necessary condition for AI inference to explode; the flip side is that general-purpose GPU chips have become the golden shackles holding that explosion back.
NVIDIA's H100 and A100 are the undisputed kings of large-model training, and even the B200 and H200 have drawn the interest of the technology giants. The Financial Times previously cited Omdia data indicating that in 2024 the main customers for NVIDIA's Hopper-architecture chips included Microsoft, Meta, Tesla (TSLA.US)/xAI and others, with Microsoft's order volume reaching 500,000 units.
However, as the absolute dominant force in general-purpose GPUs, NVIDIA's product solutions are gradually showing their other side: high cost and heavy energy consumption.
In terms of cost, a single H100 is priced above 30,000 dollars; training a hundred-billion-parameter model requires tens of thousands of GPUs, and with subsequent investment in network hardware, storage, and security the total exceeds 500 million dollars. According to HSBC data, the latest-generation GB200 NVL72 solution costs more than 3 million dollars per rack, while the NVL36 is around 1.8 million dollars.
It is fair to say that training models on general-purpose GPUs is simply too expensive. Yet in Silicon Valley, where computing power is not the binding constraint, the "brute force works wonders" narrative still holds sway and capital expenditure has not slowed. Just recently, Musk's xAI disclosed that the server cluster behind Grok-3 has already reached 200,000 GPUs.
Hyperscale datacenter operators are expected to exceed 200 billion dollars in capital expenditure (CapEx) in 2024, with this number projected to approach 250 billion dollars by 2025, and most resources will be directed towards AI.
In terms of energy consumption, SemiAnalysis estimates that a 100,000-card H100 cluster has a total power draw of 150 MW and consumes 1.59 TWh of electricity per year; at 0.078 dollars per kilowatt-hour, that is an annual electricity bill of up to 123.9 million dollars.
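As a sanity check on those figures, here is a minimal back-of-the-envelope sketch; the 150 MW, 1.59 TWh, and 0.078 USD/kWh inputs come from the article, while the PUE reading at the end is only our assumption about why the two energy figures differ:

```python
# Back-of-the-envelope check of the cluster electricity figures cited above.
# The 150 MW, 1.59 TWh, and $0.078/kWh numbers are the article's (SemiAnalysis
# estimate); the PUE interpretation at the bottom is our own assumption.

HOURS_PER_YEAR = 8760

it_power_mw = 150          # reported total power of the 100,000-card H100 cluster
annual_energy_twh = 1.59   # reported annual consumption
price_per_kwh = 0.078      # USD per kWh

# Annual electricity bill implied by the reported consumption (TWh -> kWh).
annual_cost_usd = annual_energy_twh * 1e9 * price_per_kwh
print(f"annual electricity cost: ${annual_cost_usd / 1e6:.1f}M")  # ~$124M, matching the ~$123.9M cited

# 150 MW running continuously is only ~1.31 TWh/year, so the 1.59 TWh figure
# presumably folds in cooling and other overhead (an implied PUE of ~1.2).
raw_energy_twh = it_power_mw * HOURS_PER_YEAR / 1e6
print(f"IT-only energy: {raw_energy_twh:.2f} TWh, implied PUE: {annual_energy_twh / raw_energy_twh:.2f}")
```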
Meanwhile, according to data published by OpenAI, GPU utilization during the inference phase is only 30% to 50%, with the hardware frequently waiting while it computes; in the inference era, such low utilization is genuinely wasted money.
Performance leadership at a steep price, poor efficiency, and ecosystem lock-in have led the industry to grumble that "the world has suffered under NVIDIA for too long." Cloud vendors are gradually losing hardware autonomy, supply-chain risks are mounting, and AMD has yet to mount a real challenge; together, these forces are pushing the giants to develop their own dedicated ASIC chips.
From that point on, the AI chip battlefield shifted from a contest of technology to a contest of economics.
As concluded by Southwest Securities, 'When the model architecture enters the convergence period, every dollar spent on computing power must deliver quantifiable economic returns.'
Judging from the progress recently reported by North American cloud vendors, ASICs have already shown clear advantages as substitutes:
Google: The TPU v5 chip customized by Broadcom for Google reduced the unit computing cost in the Llama-3 inference scenario by 70% compared to H100.
Amazon: AWS Trainium 3, built on a 3nm process, consumes only one third the energy of general-purpose GPUs at the same computing power, saving more than ten million dollars in electricity costs annually; shipments of Amazon's Trainium chips reportedly exceeded 500,000 units in 2024.
Microsoft: According to IDC data, after Microsoft Azure began deploying self-developed ASICs, the share of hardware procurement in its costs dropped from 75% to 58%, freeing it from a long-standing weak bargaining position.
The trend shows up most clearly in the numbers of Broadcom, the largest beneficiary of the North American ASIC chain.
Broadcom's AI business revenue for 2024 was 3.7 billion USD, up 240% year on year, with 80% coming from ASIC design services. In Q1 2025 its AI revenue came in at 4.1 billion USD, up 77% year on year, and AI revenue for the second quarter is guided to 4.4 billion USD, up 44% year on year.
As early as its annual-report period, Broadcom guided that ASIC revenue would explode by 2027, painting a picture for the market of ASIC chips reaching a 90 billion USD market within three years. On the Q1 earnings call, the company reiterated this point.
Riding this industry trend, Broadcom became the third semiconductor company in the world to surpass a market cap of 1 trillion USD, after NVIDIA and Taiwan Semiconductor, and it has drawn market attention to overseas peers such as Marvell and AIchip.
However, one point needs to be emphasized: ASICs are good, but they will not kill GPUs.
Microsoft, Google, and Meta Platforms are all pursuing self-developed chips, yet at the same time they are scrambling for NVIDIA's B200, which shows that the two sides are not in direct head-to-head competition.
A more objective conclusion is that GPUs will continue to dominate the high-performance training market and, thanks to their versatility, remain the mainstream chip for inference; but within the coming nearly $400 billion AI chip blue ocean, the penetration path for ASICs is becoming increasingly clear.
IDC predicts that from 2024 to 2026, the share of ASICs in inference scenarios will increase from 15% to 40%, representing up to $160 billion.
The ultimate outcome of this transformation may be ASICs taking over 80% of the inference market while GPUs retreat to training and graphics. The real winners will be the "dual-ecosystem players" who understand both chip design and application scenarios. NVIDIA is clearly one of them, and being bullish on ASICs does not equate to being bearish on NVIDIA.
The guide to this new world, then, is to look beyond NVIDIA for dual-ecosystem players and to work out how to seize the new ASIC era.
2. The "scalpel" of ASICs: cut away all non-core modules.
Most users are already familiar with CPUs and GPUs, FPGAs serve a niche market, and ASICs are the least familiar of the four.
| Characteristic | CPU | GPU | FPGA | ASIC |
| --- | --- | --- | --- | --- |
| Degree of customization | General-purpose | Semi-general-purpose | Semi-custom | Fully custom |
| Flexibility | High | High | High | Low |
| Cost | Relatively low | High | Relatively high | Low |
| Power consumption | Relatively high | High | Relatively high | Low |
| Main advantages | Most versatile | Strong compute, mature ecosystem | High flexibility | Highest energy efficiency |
| Main disadvantages | Weak parallel computing | High power consumption, harder to program | Weak peak compute, harder to program | Long development cycle, high technical risk |
| Application scenarios | Rarely used in AI | Cloud training and inference | Cloud inference, edge inference | Cloud training and inference, edge inference |
Table: Comparison of computing power chips. Source: Zhongtai.
So ASICs are said to be bullish for AI inference, but what kind of chip are they, exactly?
From an architectural perspective, general-purpose chips such as GPUs are constrained by a "one chip against a hundred workloads" design: they must cater to diverse demands such as graphics rendering, scientific computing, and different model architectures, so a significant share of transistor resources is spent on functional modules that are not core to any single task.
The defining feature of NVIDIA GPUs is their huge number of small cores, which can be likened to the many engines on a Falcon rocket; thanks to the operator libraries CUDA has accumulated over the years, developers can use these small cores for parallel computing smoothly, efficiently, and flexibly.
However, if the downstream model is relatively settled and the computational tasks are fixed, there is no need for so many small cores to preserve flexibility. This is precisely the principle behind ASICs, which is why they are also called fully customized high-performance chips.
Through "surgical knife" precision trimming, only the hardware units that are strongly related to the target scene are retained, releasing incredible efficiency, which has already been validated in products from Google and Amazon.

For GPUs, the best tool for driving them is NVIDIA's CUDA; ASIC chips, by contrast, are driven by the cloud vendors' own self-developed software stacks, which is no great difficulty for companies that grew up writing software.
In Google TPU v4, 95% of the transistor resources are used for matrix multiplication units and vector processing units, optimized specifically for neural network computations, whereas similar units in GPUs account for less than 60%.
Unlike the traditional von Neumann architecture's separation of compute and storage, an ASIC can customize its data flow around the characteristics of the algorithm. For instance, in the recommendation-system chip Broadcom customized for Meta, the compute units are placed directly around the memory controller, cutting data-movement distance by 70% and reducing latency to one eighth that of a GPU.
To exploit the 50%-90% weight sparsity typical of AI models, Amazon's Trainium2 chip embeds a sparse compute engine that skips zero-valued calculations, theoretically boosting performance by 300%.
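To make the zero-skipping idea concrete, here is a minimal software sketch of the principle; it only illustrates why skipping zero weights saves work and makes no claim about how Trainium2's engine is actually implemented:

```python
import numpy as np

def dense_dot(w, x):
    # Baseline: multiply every weight, including the zeros.
    return sum(wi * xi for wi, xi in zip(w, x))

def sparse_dot(w, x):
    # Skip zero-valued weights entirely; no multiply or accumulate is issued for them.
    return sum(wi * x[i] for i, wi in enumerate(w) if wi != 0.0)

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
w[rng.random(1000) < 0.8] = 0.0          # ~80% weight sparsity, within the 50%-90% range cited
x = rng.standard_normal(1000)

print(np.isclose(dense_dot(w, x), sparse_dot(w, x)))          # True: the result is identical
print(f"multiplies needed: {np.count_nonzero(w)} vs {w.size}")  # roughly 5x fewer operations
```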
When algorithms stabilize and the vertical scenario is deterministic, ASICs hold a natural advantage; the ultimate goal of ASIC design is to make the chip itself a physical embodiment of the algorithm.
Both history and the present offer strong evidence of ASIC success, the classic example being mining chips.
Initially, the industry mined with NVIDIA GPUs, but as mining difficulty rose and electricity costs began to outstrip mining revenue (a situation very similar to today's inference demand), dedicated mining ASICs took off. Although far less versatile than GPUs, mining ASICs push parallelism to the extreme.
For example, Bitmain's Bitcoin (BTC) mining ASICs deploy tens of thousands of SHA-256 hash units working in parallel, achieving super-linear acceleration on a single algorithm, with compute density more than 1,000 times that of a GPU. Dedicated capability improves dramatically, and energy consumption achieves system-level savings.
In addition, ASICs simplify the peripheral circuitry (for example, eliminating the complex protocol stack of PCIe interfaces), reducing motherboard area by 40% and cutting whole-machine cost by 25%.
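For readers unfamiliar with what those SHA-256 units actually compute, below is a minimal software sketch of Bitcoin's proof-of-work loop; the toy header and easy difficulty target are illustrative assumptions, and a mining ASIC essentially hard-wires this double-SHA-256 step into tens of thousands of parallel units:

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's proof of work hashes the block header twice with SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**20):
    # Try nonces until the double SHA-256 of (prefix + nonce) falls below the target.
    # A mining ASIC performs exactly this search, but with tens of thousands of
    # hard-wired hash units running in parallel instead of a software loop.
    for nonce in range(max_nonce):
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None

# Illustrative values only: a toy "header" and a very easy difficulty target.
header = b"example-block-header"
easy_target = 1 << 240
print(mine(header, easy_target))
```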
Low cost, high efficiency, and deep integration of hardware with the target scenario: these core traits of ASIC technology map naturally onto the AI industry's shift from brute-force computing to a refined pursuit of efficiency.
With the arrival of the inference era, the cost advantage of ASICs will replay the history of mining rigs and produce a "death cross" of cost curves under scale effects: although the up-front R&D cost is high (a single chip design runs to roughly 50 million USD), the marginal cost of ASICs falls along a far steeper curve than that of general-purpose GPUs.
Taking Google's TPU v4 as an example, when shipment volume rises from 100,000 units to 1 million units, the cost per unit drops sharply from 3,800 USD to 1,200 USD, a decline of nearly 70%, whereas cost reductions for GPUs typically do not exceed 30%. According to the latest supply-chain information, Google's TPU v6 is expected to ship 1.6 million units in 2025 with three times the per-chip computing power of the previous generation, so the cost-effectiveness of ASICs is improving rapidly.
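A simple amortization sketch shows why per-unit cost falls so steeply with volume. The roughly 50 million USD NRE figure is the design cost cited above; the constant marginal manufacturing cost is a hypothetical number of ours, chosen only to illustrate the mechanism rather than Google's actual cost structure:

```python
def unit_cost(volume: int, nre: float, marginal_cost: float) -> float:
    # The fixed one-off design cost (NRE) is spread over every unit shipped;
    # the manufacturing cost per unit is assumed constant for simplicity.
    return nre / volume + marginal_cost

NRE = 50e6            # ~$50M one-off design cost, as cited in the article
MARGINAL = 1_000      # hypothetical per-unit manufacturing cost (our assumption)

for volume in (100_000, 500_000, 1_000_000):
    print(f"{volume:>9,} units -> ${unit_cost(volume, NRE, MARGINAL):,.0f} per unit")
# 100,000 units -> $1,500 ; 1,000,000 units -> $1,050.
# The fixed-cost share shrinks from ~33% to ~5%, which is the core of the
# "steeper cost decline" argument; the article's $3,800 -> $1,200 TPU v4 figures
# additionally reflect manufacturing volume discounts.
```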
This raises a new question: can everyone pile into self-developed ASICs? That depends on the cost of self-development and on the scale of demand.
According to calculations for a 7nm ASIC inference accelerator card, the initial tape-out costs, including IP licensing fees, labor, design tools, and mask sets, can reach hundreds of millions, before later mass-production costs are even counted. Here, large companies hold a clear financial advantage.
Currently, cloud service providers like Google and Amazon have mature customer systems, allowing them to form a closed loop of R&D and sales, giving them an inherent advantage in self-development.
For companies like Meta, the logic of self-development is rooted in massive internal demand for computing power: earlier this year, Zuckerberg revealed plans to bring roughly 1 GW of computing capacity online in 2025 and to hold more than 1.3 million GPUs by year-end.
3. The value of the "new map" far exceeds 100 billion dollars.
Mining demand alone brought the market nearly 10 billion dollars, so when Broadcom predicted at the end of 2024 that the AI ASIC market could reach 70-90 billion dollars, we were not surprised; if anything, the figure may prove conservative.
By now, the industry trend toward ASIC chips should no longer be in question; the focus should be on how to master the rules of the game on this "new map."
In the nearly 100 billion dollar AI ASIC market, three clear tiers have emerged: the rule-setting ASIC chip designers and manufacturers, the supporting industry chain, and fabless players in vertical scenarios.
The first tier consists of the rule-setting ASIC chip designers and manufacturers: those able to deliver ASICs priced above 10,000 dollars and commercialize them with downstream cloud vendors. Representative players include Broadcom, Marvell, AIchip, and the foundry king Taiwan Semiconductor, which benefits from any advanced chip.
The second tier is industry chain support; the supporting plays the market has focused on include advanced packaging and the industry chain further downstream.
Advanced packaging: 35% of Taiwan Semiconductor's CoWoS capacity has shifted to ASIC customers; the corresponding domestic players are SMIC (00981.HK), JCET Group (600584.SH), TongFu Microelectronics (002156.SZ) and others.
New hardware opportunities arising as cloud vendors' hardware solutions decouple from NVIDIA: AEC copper cables, for instance. Each of Amazon's self-developed ASICs needs to be paired with three AECs, so shipments of 7 million ASICs in 2027 would correspond to a market exceeding 5 billion USD (a quick back-of-the-envelope check appears after the three tiers below). Servers and PCBs benefit from similar logic.
The third tier consists of emerging fabless players in vertical scenarios. ASICs are in essence a demand-driven market: whoever first captures a scenario's pain points holds the pricing power. Customization is in the ASIC's DNA, which naturally fits vertical scenarios. Smart-driving chips, a typical class of ASIC, are a case in point: with BYD and others going all in on intelligent driving, these products are entering an explosive growth phase.
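As referenced above, here is the back-of-the-envelope check on the AEC copper-cable sizing; the three-cables-per-ASIC ratio and the 7 million ASIC shipments come from the article, while the per-cable average selling price is purely our illustrative assumption:

```python
# Rough sizing of the AEC (active electrical cable) opportunity cited above.
# The 3 cables per ASIC and ~7 million ASIC shipments come from the article;
# the per-cable average selling price is our assumption for illustration only.

asic_shipments_2027 = 7_000_000
aec_per_asic = 3
assumed_aec_asp_usd = 250        # hypothetical ASP, not from the source

aec_units = asic_shipments_2027 * aec_per_asic
market_usd = aec_units * assumed_aec_asp_usd
print(f"{aec_units:,} AECs -> ~${market_usd / 1e9:.1f}B market")
# 21,000,000 AECs -> ~$5.2B, consistent with the >$5B market the article cites.
```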
Mapping the three tiers of the global ASIC industry chain onto China, the corresponding opportunities can be seen as the three "keys" for domestic players.
Due to export restrictions, the gap between domestic GPUs and NVIDIA remains significant, and building an ecosystem is a long road. With ASICs, however, China is essentially at the same starting line as overseas peers. Combined with vertical scenarios, many Chinese fabless companies can deliver products with better efficiency, such as the mining ASICs and autonomous-driving ASICs mentioned earlier, and AI ASICs like Alibaba's Lingang and Baidu's Kunlun chips.
On the manufacturing side, production mainly relies on SMIC, while ZTE's chip arm, ZTE Microelectronics, is a new entrant; it is not impossible that they will team up with domestic designers in the future, playing out the question of "who will be China's Broadcom."


The supporting segments of the industry chain (servers, optical modules, switches, PCBs, and copper cables) are technically less demanding, and domestic companies are already quite competitive in them. These companies also live in "symbiosis" with domestic computing power, so they will not be absent from the ASIC chip supply chain.
As for application scenarios, beyond the oft-mentioned autonomous-driving chips and AI inference accelerator cards, the opportunities for other domestic design companies depend on which scenarios break out and which companies manage to seize them.
4. Conclusion
As AI moves from a fierce training arms race into the deep waters of energy efficiency, the second half of the computing power war is destined to belong to the companies that can turn technological fantasy into an economic ledger.
The counterattack of ASIC chips is not just a technological revolution; it is also a business lesson about efficiency, cost, and bargaining power.
In this new game, Chinese players are quietly stacking their chips; opportunity, as always, favors the prepared.
Editor/rice