Goldman Sachs noted that Google/Broadcom's TPU is rapidly narrowing the gap with NVIDIA's GPUs in terms of inference cost. Per-token inference cost fell by approximately 70% from TPU v6 to TPU v7, aligning closely with NVIDIA's GB200 NVL72. This does not imply that NVIDIA's position is being undermined, but it clearly demonstrates that the core evaluation system for AI chip competition is shifting from “who computes faster” to “who computes more affordably and sustainably.”
Amid persistently high AI capital expenditures and mounting commercialization pressure, market attention is undergoing a subtle yet profound shift: can large models keep running without regard to cost?
According to the ChaseWind trading platform, Goldman Sachs' latest AI chip research report does not follow the familiar market comparisons of computing power, process technology, and parameter scale, but instead takes a perspective closer to commercial reality: unit cost in the inference phase. By constructing an 'inference cost curve,' Goldman Sachs attempts to answer a question crucial to the AI industry: once models enter the high-frequency invocation phase, what is the real cost of processing one million tokens under different chip solutions, after accounting for constraints such as depreciation, energy consumption, and system utilization?
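To make this cost-curve logic concrete, here is a minimal sketch of such a unit-cost calculation in Python. It is not Goldman Sachs' actual model; the function and every input figure (system price, depreciation period, power draw, electricity price, utilization, throughput) are hypothetical placeholders used only to show how the three constraints named above enter the calculation.

```python
# Toy per-million-token inference cost model, in the spirit of the
# "inference cost curve" described above. All figures below are
# hypothetical placeholders, not Goldman Sachs' inputs or methodology.

def cost_per_million_tokens(
    system_price_usd: float,         # upfront hardware cost of the rack/system
    depreciation_years: float,       # straight-line depreciation period
    power_kw: float,                 # average system power draw
    electricity_usd_per_kwh: float,  # electricity price
    utilization: float,              # fraction of time serving real traffic (0-1)
    tokens_per_second: float,        # sustained system throughput
) -> float:
    hours_per_year = 24 * 365
    # Hourly capital cost from straight-line depreciation
    capex_per_hour = system_price_usd / (depreciation_years * hours_per_year)
    # Hourly energy cost
    energy_per_hour = power_kw * electricity_usd_per_kwh
    # Tokens actually produced per hour, discounted by utilization
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1e6

# Example with made-up numbers: a $3M system depreciated over 4 years,
# drawing 120 kW at $0.08/kWh, 60% utilized, sustaining 400k tokens/s.
print(f"${cost_per_million_tokens(3_000_000, 4, 120, 0.08, 0.6, 400_000):.4f} per 1M tokens")
```

On these made-up inputs the model returns roughly $0.11 per million tokens. The number itself is meaningless; the point is that depreciation, energy, and utilization all feed the same denominator, which is why system-level efficiency gains can move unit cost as much as raw chip speed can.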
The research conclusions point to an accelerating but not yet fully absorbed change: Google/Broadcom's TPU is rapidly narrowing the gap with NVIDIA's GPUs on inference cost. The upgrade from TPU v6 to TPU v7 cuts per-token inference cost by approximately 70%, putting it roughly on par with NVIDIA's GB200 NVL72 in absolute cost terms, and even slightly ahead in some estimation scenarios.
This does not imply that $NVIDIA (NVDA.US)$'s position is being undermined, but it clearly demonstrates that the core evaluation system for AI chip competition is shifting from “who computes faster” to “who computes more affordably and sustainably.” As training gradually becomes an upfront investment and inference emerges as the long-term source of cash flow, the slope of the cost curve is replacing peak computing power as the key variable shaping the industrial landscape.
1. From computational leadership to cost efficiency: the evaluation criteria for AI chip competition are shifting.
In the early stages of AI development, training computing power determined almost everything: whoever could train larger models faster set the technological agenda. However, as large models move into deployment and commercialization, inference workloads begin to far exceed training itself, and cost issues are rapidly amplified.
Goldman Sachs points out that at this stage, the cost-performance ratio of chips is no longer determined by single-card performance alone but is shaped collectively by system-level factors, including compute density, interconnect efficiency, memory bandwidth, and energy consumption. The inference cost curve built on this logic shows that Google/Broadcom's TPU has made enough progress in raw computational performance and system efficiency to compete head-on with NVIDIA on cost.
In contrast, $Advanced Micro Devices (AMD.US)$ and $Amazon (AMZN.US)$'s Trainium still show relatively limited progress in generational cost reduction. Based on current estimates, the unit inference costs of these solutions remain significantly higher than those of NVIDIA's and Google's offerings, posing only a limited challenge to the mainstream market.
2. Behind the TPU's cost leap lies system engineering capability, not a single-point breakthrough.
The significant cost reduction achieved by TPU v7 does not stem from a single technological breakthrough but rather from the concentrated release of system-level optimization capabilities. Goldman Sachs believes that as computing chips themselves gradually approach physical limits, future reductions in inference costs will increasingly depend on advances in 'adjacent computing technologies.'
These technologies include: higher-bandwidth, lower-latency network interconnects; continued integration of high-bandwidth memory (HBM) and storage solutions; advanced packaging technologies (such as $Taiwan Semiconductor (TSM.US)$'s CoWoS); and density and energy-efficiency improvements at the rack level. The TPU's coordinated optimization across these areas gives it a distinct economic advantage in inference scenarios.
This trend is also highly consistent with Google's own computing power deployment. The share of Google's internal workloads running on TPUs continues to rise, and TPUs have been widely adopted for training and inference of the Gemini models. Meanwhile, external customers with mature software capabilities are accelerating their adoption of TPU solutions, the most notable case being Anthropic's approximately $21 billion order placed with Broadcom, with related products expected to begin delivery by mid-2026.
However, Goldman Sachs emphasized that NVIDIA still holds the 'time-to-market' advantage: while TPU v7 has only just caught up with GB200 NVL72, NVIDIA has already moved on to GB300 NVL72 and plans to deliver VR200 NVL144 in the second half of 2026. This sustained pace of product iteration remains a key factor in retaining customer loyalty.
3. Rebalancing investment implications: ASICs rise, but NVIDIA's moat has not been breached.
From an investment perspective, Goldman Sachs has not downgraded its view of NVIDIA despite the TPU's rapid catch-up. The firm maintains buy ratings on both NVIDIA and Broadcom, viewing them as the names most directly tied to the most sustainable portions of AI capital expenditure, and expects them to benefit over the long term from advances in networking, packaging, and system-level technologies.
Within the ASIC camp, Broadcom’s value proposition is particularly clear. Goldman Sachs has raised its fiscal 2026 earnings per share forecast for Broadcom to $10.87, which is about 6% higher than the market consensus, and believes that the market still underestimates its long-term profitability in AI networking and custom computing.
AMD and Amazon's Trainium are still in the catch-up phase, but Goldman Sachs also noted the potential for AMD's rack-level solutions to gain a late-mover advantage: by the end of 2026, the MI455X-based Helios rack solution is expected to deliver a roughly 70% reduction in unit costs in certain training and inference scenarios, warranting continued monitoring.
More importantly, this research report does not conclude with a 'winner-takes-all' scenario, but rather presents a gradually emerging picture of industrial specialization: GPUs continue to dominate the training and general computing markets, while custom ASICs increasingly penetrate scalable, predictable inference workloads. Throughout this process, NVIDIA’s CUDA ecosystem and system-level R&D investments remain a solid moat, but its valuation logic will continue to face the reality test of 'declining inference costs.'
When AI truly enters the phase where 'every token must justify its return,' the competition for computing power will ultimately return to economics itself. The 70% drop in TPU costs is not merely a simple technological catch-up but a critical stress test for the viability of AI business models. And perhaps this is the most significant signal that the market should take seriously behind the GPU versus ASIC debate.
Editor/Rice