

In-depth conversation with the founder of SemiAnalysis: In the new era of AI, will NVIDIA be challenged?

wallstreetcn ·  Dec 24 16:32

Excluding Google, 98% of global AI workloads run on NVIDIA chips; Google's and Amazon's chips currently face their own issues and pose no challenge in the short term; the shortage of data is a false proposition, as synthetic data can be created for continued training; there are no issues with AI capital expenditures next year, but 2026 carries uncertainty and could be a turning point for the industry.

What is the actual market share of NVIDIA? What are the company's competitive advantages? Where are the opportunities for AMD, Google, and Amazon? Is data scarcity a false proposition? Are there really no issues with industry capital expenditures? Where is the turning point?

Recently, Dylan Patel, founder and chief analyst of SemiAnalysis, sat down with well-known Silicon Valley technology investors Bill Gurley and Brad Gerstner for a three-way discussion on the current state of AI chips, how long NVIDIA's competitive advantage can last, whether data scarcity is a false proposition, and how long AI capital expenditures can continue.

The following are the core points of the discussion:

Excluding Google, 98% of global AI workloads run on NVIDIA chips; if Google is included, the figure drops to about 70%.

NVIDIA has three advantages: its software is superior to that of most semiconductor companies; in hardware, it adopts new technologies first and pushes chips from design to deployment at a very fast pace; and in networking, its acquisition of Mellanox greatly enhanced its networking capabilities.

Although Google has its own expertise in software and compute elements, it needs to cooperate with other suppliers in difficult areas such as chip packaging design and networking.

As datacenters get built out and power supply tightens, companies need to plan their resources more carefully.

Text is currently the most effective data domain, but video data contains far more information. In addition, pre-training is only one part of model training; inference-time compute also matters. If data runs out, models can keep improving through synthetic data.

Although the one-time huge benefits of pre-training may have passed, companies can still gain certain benefits by increasing computing resources, especially in a competitive environment. The benefits still exist, but obtaining them has become more challenging.

Synthetic data is most effective in areas where functional validation can be performed.

Wall Street's current estimates of datacenter capital expenditure are generally too low. Tracking datacenters worldwide, SemiAnalysis finds that companies like Microsoft, Meta, and Amazon are spending heavily on datacenter capacity. This indicates they believe they can win the competition by scaling up, which is why they continue to invest.

NVIDIA is not Cisco in 2000; the valuations of the two are not comparable.

Pre-training may run into diminishing returns or excessive costs, but synthetic data generation and inference-time compute have become new areas of development.

Companies' current investments in inference are still relatively small. Significant improvements in model performance are expected over the next 6 to 12 months on benchmarks that allow functional validation.

Currently, GPT-4 is very expensive, but costs would significantly decrease if the model scale is reduced.

AMD excels in chip engineering but has significant shortcomings in software. It lacks sufficient software developers and has not invested in building GPU clusters for software development, in sharp contrast with NVIDIA.

The TPU system built by Google in collaboration with Broadcom is competitive in terms of chip interconnection, network architecture, etc., and even surpasses NVIDIA in some aspects.

Google's TPU has seen relatively limited commercial success, primarily because its software is not open enough, its pricing is uncompetitive, and it is mainly used for internal services.

Amazon's chips offer advantages in HBM capacity and memory bandwidth per dollar thanks to lower costs; although they trail NVIDIA on technical specifications (memory, bandwidth, etc.), they are appealing for cost-sensitive application scenarios.

Overall, hyperscale datacenters are expected to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including networking equipment suppliers, ASIC suppliers, system suppliers, etc.).

The situation in 2026 carries some uncertainty. Whether model performance can continue to improve will be a key factor; if the rate of improvement slows, it may trigger a market adjustment.

Below is the full transcript of the conversation, translated by AI.

Host: Dylan, welcome to our show. Today we will delve into a fundamental change occurring in the world of computers, a topic that has been discussed throughout this year. Bill, please introduce Dylan to everyone.

Bill: Alright, we are pleased to have Dylan Patel from SemiAnalysis with us. Dylan has quickly built one of the most respected research teams in the global semiconductor industry. Today we want to dig into Dylan's technical insights on architecture, chip scaling trends, the major global market players, and supply chains, and connect them to the business questions that concern our audience. I hope to provide a snapshot of semiconductor activity around the AI boom and try to grasp the overall direction of development.

Dylan: I'm glad to be here. When I was a kid, my Xbox broke. My parents were immigrants, and I grew up in rural Georgia without much to do, so I tinkered with electronics. I opened up the Xbox, shorted the temperature sensor, and fixed it. From then on I became intensely interested in semiconductors, started reading semiconductor companies' earnings reports and investing, and dug into the technical side.

Host: Could you briefly introduce us to SemiAnalysis?

Dylan: We are a semiconductor and AI research company that provides services to hyperscale data centers, large semiconductor private equity firms, and hedge funds.

We sell relevant data on global datacenters, including quarterly power usage, construction progress, etc.; we track about 1,500 wafer fabs globally (but only around 50 are actually critical); we also provide supply chain-related data, such as data on cables, servers, circuit boards, transformers, and other equipment, along with predictions and consulting services.

Excluding Google, over 98% of AI workloads globally run on NVIDIA chips.

Bill: Dylan, we all know that NVIDIA dominates the AI chip field, how much of the current global AI workload do you think runs on NVIDIA chips?

Dylan: If we exclude Google, the proportion exceeds 98%. If we include Google, it is about 70%, because a significant portion of Google's AI workloads, especially production workloads, run on its own chips.

Bill: Are you referring to production workloads that generate revenue, such as Google Search and other large AI-driven businesses at Google?

Dylan: Exactly. Google's non-large-language-model (LLM) production workloads run on its internally developed chips.

In fact, Google was using Transformer technology in search workloads as early as 2018-2019, with BERT being one of the well-known and popular Transformer models at that time, running in their production search workloads for years.

The combination of three advantages has currently enabled NVIDIA to dominate the market.

Bill: Now back to NVIDIA, why does it dominate the market so much?

Dylan: NVIDIA can be likened to a three-headed dragon. Most semiconductor companies perform poorly in software, but NVIDIA is an exception.

In terms of hardware, NVIDIA also excels compared to most companies, being able to adopt new technologies more quickly and move chips from design to deployment at a rapid pace. Additionally, they acquired Mellanox, greatly enhancing their networking capabilities. The combination of these three advantages makes it difficult for other semiconductor companies to compete with them individually.

Bill: You previously wrote an article that helped everyone understand the complexities of NVIDIA's modern cutting-edge deployments, including aspects like racks, memory, networking, and scale. Could you briefly introduce these again?

Dylan: Alright. When we look at GPUs, running an AI workload often requires multiple chips to work together, as the scale of the model has far exceeded the capacity of a single chip.

NVIDIA's NVLink architecture effectively connects multiple chips, but interestingly, Google and Broadcom had collaborated to build similar system architectures even before NVIDIA, such as Google creating a similar system with TPU back in 2018.

Although Google has its own expertise in software and compute elements, it needs to cooperate with other suppliers in difficult areas such as chip packaging design and networking.

Now, NVIDIA has launched the Blackwell system, a rack containing multiple GPUs, weighing three tons, with thousands of cables, making it very complex.

Competitors such as AMD have also recently entered the system design field through acquisitions because building a multi-chip system that works collaboratively, cools well, and has reliable networking is a highly challenging problem, as semiconductor companies usually lack relevant engineers.

Bill: So where do you think NVIDIA has made incremental differentiated investments?

Dylan: NVIDIA has primarily made significant investments in the supply chain. They must work closely with the supply chain to develop next-generation technologies and bring them to market first.

For example, in areas such as networking, optics, water cooling, and power transmission, NVIDIA continuously introduces new technologies to maintain its competitive edge. Their pace is very fast, with many changes each year, like the launch of products such as Blackwell and Rubin. If they stagnate, they will face competitive pressure as other competitors are also striving to catch up.

Bill: If NVIDIA stagnates, in what areas might they face competition? What conditions must other alternatives meet in order to capture a larger share of workloads in the market?

Dylan: For NVIDIA, their major customers are spending heavily on AI and have enough resources to research how to run models on other hardware, especially for inference.

Although NVIDIA's advantage in inference software is relatively small, its hardware performance is currently the best, which means lower capital costs, lower operating costs, and higher performance. If NVIDIA stops making progress, its performance advantage will stop growing, giving competitors an opening.

For example, with the launch of Blackwell, NVIDIA is not only 10 to 15 times faster in inference performance than previous products (when optimized for large models), but it has also lowered its margins to respond to competition. They plan to improve performance more than fivefold per year, which is a very fast pace. At the same time, AI models themselves keep improving and their costs keep falling, which will further stimulate demand.

Bill: You mentioned that the role of software in training and inference is different. Can you explain that in more detail?

Dylan: Many people simply refer to NVIDIA's software as CUDA, but in fact it contains many layers.

In terms of training, users usually rely on the performance of NVIDIA's Software because researchers continuously try new methods and do not have much time to optimize performance.

In terms of inference, companies like Microsoft deploy a limited number of models and update them roughly every six months. They can dedicate a large number of engineers to optimizing how these models run on other hardware. For example, Microsoft has already deployed GPT-style models on hardware from companies such as AMD.

Host: We previously mentioned a chart showing that there will be one trillion dollars of new AI workloads and one trillion dollars of datacenter replacement workloads over the next four years. What is your view on this? Some believe that people will not use NVIDIA's GPUs to rebuild CPU datacenters. How do you respond to this viewpoint?

Dylan: NVIDIA has long been promoting the use of accelerators for non-AI workloads, such as in the professional visualization field (like Pixar making movies) and Siemens engineering applications, which both utilize GPUs.

Although these are only a small part of the AI field, applications do exist. Regarding datacenter replacements, although AI is developing rapidly, traditional workloads (like network services and databases) will not stop or slow down as a result. The supply chain for datacenters is long, and the construction cycle is also lengthy, which is a realistic problem.

For example, Intel's CPUs have made slow progress over the past few years, and AMD's emergence has provided higher performance options. Many old Intel CPU servers in Amazon datacenters have been in use for years and can now be replaced with newer, higher-performance servers (like 128-core or 192-core), which can not only improve performance but also reduce the number of servers under the same power consumption, thus freeing up space for AI servers.

So, while there are cases of datacenter replacements, the overall market is still growing; it is just that the development of AI has driven this behavior, as businesses need more computing power to support AI applications.

Host: This reminds me of what Satya said on the show last week: that they are limited by datacenters and power, not by chip supply. Do you think this relates to your previous explanation?

Dylan: I think Satya's point highlights that datacenters and power are the bottleneck, which is a different issue from chip supply. With datacenter construction and power supply both tight, businesses need to plan resources more carefully, which also explains why they take measures such as acquiring power capacity from cryptocurrency mining companies or extending the depreciation cycle of old servers.

If data runs out, synthetic data can be created to improve models.

Host: Before discussing alternatives to NVIDIA, let's first talk about the pre-training and scaling debate mentioned in your article. Ilya said that data is the "fossil fuel" of AI and that we have consumed most of it, suggesting that the big gains from pre-training will not be repeated. What do you think of this view?

Dylan: The pre-training scaling law is relatively simple; increasing computational resources can enhance model performance, but this involves two dimensions: data and parameters.

When data runs out, the model can still be scaled up, but the returns may diminish. However, the idea that we are out of data is something of a misconception: our utilization of video data is still very limited. Text is the most effective data domain at present, but video contains far more information. In addition, pre-training is only one part of model training; inference-time compute also matters. If data is exhausted, we can continue to improve the model by creating synthetic data, as companies like OpenAI are trying: the model generates a large amount of data, the outputs are functionally verified, and the effective data is filtered out for training to enhance model performance. Although this method is still in its early stages and relatively little money has gone into it, it provides a new direction for model improvement.

Host: From an investment perspective, NVIDIA is under the spotlight. But if the returns from pre-training have mostly been obtained, why are people still building larger clusters?

Dylan: Although the one-time significant gains from pre-training may be over, we can still achieve certain returns by increasing computational resources, especially in a competitive environment where companies seek to enhance model performance to maintain competitiveness.

Moreover, the comparison between models and competitors' models also drives companies to continue investing. Although, from an ROI perspective, continuing to scale may be exponentially expensive, it can still be a rational decision, as returns still exist, albeit with increased difficulty in obtaining them. Furthermore, with the emergence of new methods like synthetic data generation, the speed of model improvement may accelerate, providing motivation for continued investment from companies.

Host: In which areas is synthetic data most effective? Can you provide examples?

Dylan: Synthetic data is most effective in areas where functionality can be validated. For example, Google's services have extensive unit testing to ensure systems operate correctly, and these unit tests can be used to evaluate whether the outputs generated by an LLM are accurate.

In fields such as mathematics and engineering, outputs can be evaluated against clear standards, whereas in some subjective fields like art, writing style, and negotiation skills, it is challenging to conduct functional validation because the criteria for judgment are more subjective. For example, in the field of image generation, it is difficult to determine which image is more beautiful as it depends on personal preferences; however, in mathematical calculations or engineering designs, it is possible to clearly determine if the output is correct.

Wall Street has underestimated the capital expenditure of large datacenters.

Host: What have you heard from the hyperscale datacenters? They all indicate that capital expenditure (capex) will increase next year and that they are building larger clusters, is this true?

Dylan: Based on our tracking and analysis, Wall Street's estimates of capex are generally too low. We track every datacenter globally and find that companies like Microsoft, Meta, and Amazon are investing significantly in datacenter capacity.

They have signed datacenter lease agreements for next year, expecting cloud revenue to accelerate as they are currently limited by datacenter capacity. This indicates that they believe winning in competition is possible through scaling, which is why they continue to invest.

Host: You previously mentioned the large-scale cluster construction for pre-training. If there is a change in the pre-training trend, what would be the changes in their construction for inference?

Dylan: In training neural networks, forward propagation is used to generate data, and backward propagation is used to update weights. In the new paradigms of synthetic data generation, output evaluation, and model training, the computational load for forward propagation significantly increases because a large number of possibilities must be generated, whereas the computational load for backward propagation is relatively small since training occurs only on a few effective data points. This means that during training, there is a large amount of inference computation, and in fact, the amount of inference computation during training exceeds that of updating model weights.

Additionally, whether all components need to be in the same location during model training depends on the specific circumstances.

For example, Microsoft is building multiple datacenters in different regions because they find that inference workloads can be allocated to different datacenters while models are updated elsewhere, allowing for more efficient resource utilization. So the pre-training paradigm has not slowed down; rather, the cost of improvement rises with each generation, and businesses are looking for other ways to reduce costs and increase efficiency.

NVIDIA is not the Cisco of 2000.

Host: Some people are comparing NVIDIA to Cisco's situation in 2000, what do you think?

Dylan: This comparison is somewhat unfair. A large part of Cisco's revenue came from telecom infrastructure build-outs funded by private capital and credit, whereas NVIDIA's revenue mix is different, with a much smaller share coming from such private- or credit-financed buyers (for example, CoreWeave, which is backed by Microsoft).

Additionally, during the Internet bubble, the scale of private capital entering the field was much larger than it is now. Although the venture capital market seems active now, the private market (such as Middle Eastern sovereign wealth funds) has not yet seen large amounts of funds enter. Moreover, compared to Cisco's time, the sources of capital, positive cash flow, and rationality of investment for these profitable companies have all changed. NVIDIA currently has a PE of 30, which is still significantly lower compared to Cisco’s 120, so a simple comparison cannot be made.

Inference-time reasoning is a new direction for scaling intelligence.

Host: You mentioned that inference-time reasoning is a new direction for scaling intelligence and that its computational intensity is greater than pre-training's. Can you explain this in detail?

Dylan: Pre-training may run into diminishing returns or excessive costs, but synthetic data generation and inference-time compute have become new directions of development.

Inference-time compute sounds appealing because it doesn't require spending more on training models, but there are significant trade-offs. For example, a model like GPT-4o generates a large amount of data during inference but ultimately outputs only a portion of it to users, consuming a lot of compute in the process.

For example, when handling user requests, the model may generate thousands of intermediate results (tokens), but only a few hundred are ultimately output to users. This means that computational costs increase significantly, not only due to the increase in the number of generated tokens but also because handling these tokens requires more memory to store contextual information (like KV cache), which reduces the number of user requests the server can handle simultaneously, thereby increasing the cost per user.

From a cost perspective, for a company like Microsoft, if its inference revenue is $10 billion, with a gross margin of 50-70%, and costs in the billions, when using models like GPT-4o, the costs may significantly increase due to the rising inference computing costs. Even though the model performs better and can charge higher fees, the increase in costs might exceed the increase in revenue.

The enterprise-level demand for the GPT-4o model has been underestimated.

Host: So, has the market's enterprise-level demand for models like GPT-4o been overestimated or underestimated?

Dylan: GPT-4o is still in its early stages, and people's understanding and application of it are not yet deep.

However, from some anonymous benchmarking tests currently available, many companies (like Google and Anthropic) are developing inference models, and they see a clear path to improving model performance by increasing computational resources. These companies have relatively low investment in inference and are still in the initial stages, but they have significant room for improvement. It is expected that within the next 6 months to 1 year, model performance will greatly improve in some functional validation benchmark tests. Therefore, the potential demand for such models in the market is enormous, but it is currently difficult to assess accurately.

Host: Looking back at the Internet wave, many startups initially relied on Oracle and Sun Microsystems technologies, but the situation changed five years later. Will this happen in the AI Chip field?

Dylan: Currently, GPT-4o is very expensive, but if the model size is reduced, the cost will decrease significantly.

For example, the cost can be greatly reduced from GPT-4o to Llama 7b. For smaller models, inference is relatively easy and can run on a single chip, leading to intense market competition, with many companies offering API inference services based on models like Llama, resulting in fierce price competition and lower profit margins.

In contrast, companies like Microsoft that use OpenAI models have higher gross margins (50-70%) because they possess high-performance models and have enterprises or consumers willing to pay a premium for them.

However, as more companies enter the market, model differentiation becomes increasingly important; only those with the best models and the ability to find enterprises or consumers willing to pay for them can stand out in the competition. Therefore, the market is quickly filtering, and ultimately, only a few companies may be able to compete in this field.

Google and Amazon chips each have their strengths and weaknesses.

Host: So what is the situation with AMD among these competing companies?

Dylan: AMD performs excellently in chip engineering but has significant shortcomings in software. They lack sufficient software developers and have not invested in building GPU clusters to develop software, which sharply contrasts with NVIDIA.

Additionally, AMD has been focused on competing with Intel and lacks system-level design experience. Although they acquired ZT Systems, they still lag behind NVIDIA in system architecture design for large-scale datacenters.

Large-scale datacenter customers (such as Meta and Microsoft) are helping AMD improve its software and understand model development, inference economics, and so on, but AMD still cannot compete with NVIDIA on the same timeline. AMD's share of AI revenue at customers like Microsoft and Meta is expected to decline next year, though it will still profit from the market, even if it won't see NVIDIA-level success.

Host: What about Google's TPU? It seems to be the second choice after NVIDIA.

Dylan: Google’s TPU has its unique features in systems and infrastructure. The performance of a single TPU is good, but more importantly, its system design is key. The TPU system built in collaboration with Broadcom is competitive in chip interconnect, network architecture, etc., and even surpasses NVIDIA in certain aspects.

In addition, Google has adopted water cooling technology over the years, enhancing the reliability of the system, while NVIDIA only recently realized the need for water cooling technology.

However, the commercial success of Google's TPU has been relatively limited, mainly because its software is not open enough; much of the internally used software (such as what DeepMind uses) is not offered to Google Cloud users.

On pricing, the list price is high and even negotiated prices are not competitive; compared with other cloud providers (such as Oracle, Microsoft, and Amazon), Google's TPU pricing has no advantage.

Moreover, Google uses a large number of TPUs for internal services (such as Search and the Gemini applications), and the external rental market share is small, with Apple as the main tenant; Apple's decision to rent TPUs may be related to its attitude toward NVIDIA (there may be a competitive dynamic, though the specific reasons were not discussed).

Host: What about Amazon? Can you provide a detailed introduction to Amazon's chips just like the one for Google's TPU?

Dylan: Amazon's chip can be referred to as the 'Amazon Basic TPU'. It has cost-effective advantages in some aspects, such as using more silicon and memory, with network capabilities somewhat comparable to TPU. However, it falls short in efficiency, such as using more active cables (Google TPU, in cooperation with Broadcom, uses passive cables) and lower silicon area utilization efficiency.

However, Amazon has advantages in HBM capacity and memory bandwidth per dollar thanks to lower costs, with chip prices significantly below NVIDIA's. Although it trails NVIDIA on technical specifications (memory, bandwidth, etc.), it is attractive for cost-sensitive application scenarios.

Amazon has partnered with Anthropic to build a supercomputer system containing 400,000 chips. They believe large-scale chip deployment is useful for inference and model improvement, and although it may not be the most advanced technically, its cost-effectiveness makes it a reasonable choice for Amazon.

Next year's capital expenditures are clear, but 2026 carries uncertainty.

Host: Looking ahead to 2025 - 2026, what are your views on the semiconductor market? For example, Broadcom's recent stock price increase, NVIDIA's stock price fluctuations, how do you think the market will develop?

Dylan: Broadcom has achieved some success in the custom ASIC field, winning several custom ASIC orders, including orders from companies like Google. Google is working to enhance the performance of its custom chips, especially in recommendation systems. Additionally, companies like OpenAI are also developing their own chips, and Apple has some chips produced in cooperation with Broadcom. These trends indicate that market competition will intensify.

Overall, hyperscale datacenters are expected to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including networking equipment suppliers, ASIC suppliers, system suppliers, etc.).

However, there is some uncertainty regarding the situation in 2026.

On one hand, whether the model performance can continue to improve will be a key factor. If the speed of model performance improvement slows down, it may lead to a market adjustment, as the current growth of the market largely depends on the continuous advancement in model performance and the resulting growth in demand for computing resources.

On the other hand, capital investment is also an important variable. Currently, sovereign wealth funds from the Middle East, as well as pension funds from Singapore, Nordic countries, and Canada, have not yet entered the market in large scale, but if they decide to invest substantial funds in the future, it will have a significant impact on the market.

In addition, the new Cloud Computing Service market will face consolidation. Among approximately 80 new cloud service providers we are tracking, only a few (5 - 10) may survive in the competition. Five of them are sovereign cloud service providers, and around five are competitive enterprises.

Currently, the GPU leasing market prices are changing rapidly, for instance, the leasing price of NVIDIA H100 has dropped significantly, and the competition among new cloud service providers is fierce. The on-demand GPU pricing from large cloud service providers such as Amazon is also declining rapidly. The proportion of enterprises purchasing GPU clusters remains relatively low; they are more inclined to outsource GPU computing needs to new cloud service providers. However, this situation may change with market consolidation.

For NVIDIA, although it faces competition, it still has the opportunity to dominate the market if it can maintain its technology leadership and keep launching better-performing products at lower cost. For example, although its upcoming products cost more than previous generations, growth is still possible through performance optimization and pricing adjustments. However, if market demand does not grow as expected, or if more competitive alternatives emerge, NVIDIA's revenue could be affected.

Host: Thank you very much, Dylan, for today's sharing; it has given us a deeper understanding of the development of the Semiconductor Industry in the AI field. We hope to continue to focus on the dynamics of this area in the future and look forward to seeing how companies perform in this market full of opportunities and challenges. Thank you once again!

Dylan: Thank you, I am glad to share my views here.

Host: Just a reminder, the above content represents our views only and does not constitute investment advice.


