

In-depth conversation with the founder of SemiAnalysis: In the new era of AI, will NVIDIA be challenged?

wallstreetcn ·  Dec 24 16:32

Excluding Google, 98% of global AI workloads run on NVIDIA chips; Google's and Amazon's chips currently face their own issues and pose no challenge in the short term; the shortage of data is a false proposition, as synthetic data can be created for continued training; there are no issues with AI capital expenditures next year, but 2026 carries uncertainty and could be a turning point for the industry.

What is the actual market share of NVIDIA? What are the company's competitive advantages? Where are the opportunities for AMD, Google, and Amazon? Is data scarcity a false proposition? Are there really no issues with industry capital expenditures? Where is the turning point?

Recently, Dylan Patel, founder and chief analyst of SemiAnalysis, sat down with well-known Silicon Valley technology investors Bill Gurley and Brad Gerstner for a three-way discussion on the current state of AI chips, how long NVIDIA's competitive advantage can last, whether data scarcity is a false proposition, and how long AI capital expenditures can continue.

The following are the core points of the discussion:

Excluding Google, 98% of global AI workloads run on NVIDIA chips; if Google is included, the figure drops to about 70%.

NVIDIA has three advantages: its software is superior to that of most semiconductor companies; in hardware, it adopts new technologies first and pushes chips from design to deployment at a very fast pace; and in networking, its acquisition of Mellanox greatly enhanced its networking capabilities.

Although Google has its own expertise in software and compute elements, it needs to cooperate with other suppliers in difficult areas such as chip packaging design and networking.

As datacenters get built out and power supply tightens, companies need to plan their resources more carefully.

Text is currently the most effective data domain, but video data contains far more information. In addition, pre-training is only one part of model training; inference-time compute also matters. If data runs out, models can keep improving through synthetic data.

Although the one-time huge benefits of pre-training may have passed, companies can still gain certain benefits by increasing computing resources, especially in a competitive environment. The benefits still exist, but obtaining them has become more challenging.

Synthetic data is most effective in areas where functional validation can be performed.

Wall Street's current estimates of datacenter capital expenditure are generally too low. Tracking datacenters worldwide, SemiAnalysis finds that companies like Microsoft, Meta, and Amazon are spending heavily on datacenter capacity. This indicates they believe they can win the competition by scaling up, which is why they continue to invest.

NVIDIA is not Cisco in 2000; the valuations of the two are not comparable.

Pre-training may run into diminishing returns or excessive costs, but synthetic data generation and inference-time compute have become new areas of development.

Companies' current investments in inference are still relatively small. Significant improvements in model performance are expected over the next 6 to 12 months on benchmarks that allow functional validation.

Currently, GPT-4 is very expensive, but costs would significantly decrease if the model scale is reduced.

AMD excels in chip engineering but has significant shortcomings in software. It lacks sufficient software developers and has not invested in building GPU clusters for software development, in sharp contrast with NVIDIA.

The TPU system built by Google in collaboration with Broadcom is competitive in terms of chip interconnection, network architecture, etc., and even surpasses NVIDIA in some aspects.

Google's TPU has seen relatively limited commercial success, primarily because its software is not open enough, its pricing is uncompetitive, and it is mainly used for internal services.

Amazon's chips offer advantages in HBM capacity and memory bandwidth per dollar thanks to lower costs; although they trail NVIDIA on technical specifications (memory, bandwidth, etc.), they are appealing for cost-sensitive application scenarios.

Overall, hyperscale datacenters are expected to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including networking equipment suppliers, ASIC suppliers, system suppliers, etc.).

The situation in 2026 carries some uncertainty. Whether model performance can continue to improve will be a key factor; if the rate of improvement slows, it may trigger a market adjustment.

Below is the full transcript of the conversation, translated by AI.

Host: Dylan, welcome to our show. Today we will delve into a fundamental change occurring in the world of computers, a topic that has been discussed throughout this year. Bill, please introduce Dylan to everyone.

Bill: Alright, we are pleased to have Dylan Patel from SemiAnalysis with us. Dylan has quickly built one of the most respected research teams in the global semiconductor industry. Today we want to dig into Dylan's technical insights on architecture, chip scaling trends, the major global market players, and supply chains, and connect them to the business questions that concern our audience. I hope to provide a snapshot of semiconductor activity around the AI boom and try to grasp the overall direction of development.

Dylan: I'm glad to be here. When I was a kid, my Xbox broke. My parents were immigrants, and I grew up in rural Georgia without much to do, so I tinkered with electronics. I opened up the Xbox, shorted the temperature sensor, and fixed it. From then on I became intensely interested in semiconductors, started reading semiconductor companies' earnings reports and investing, and dug into the technical side.

Host: Could you briefly introduce us to SemiAnalysis?

Dylan: We are a semiconductor and AI research company that provides services to hyperscale data centers, large semiconductor private equity firms, and hedge funds.

We sell relevant data on global datacenters, including quarterly power usage, construction progress, etc.; we track about 1,500 wafer fabs globally (but only around 50 are actually critical); we also provide supply chain-related data, such as data on cables, servers, circuit boards, transformers, and other equipment, along with predictions and consulting services.

Excluding Google, over 98% of AI workloads globally run on NVIDIA chips.

Bill: Dylan, we all know that NVIDIA dominates the AI chip field, how much of the current global AI workload do you think runs on NVIDIA chips?

Dylan: If we exclude Google, the proportion exceeds 98%. If we include Google, it is about 70%, because a significant portion of Google's AI workloads, especially production workloads, run on its own chips.

Bill: Are you referring to production workloads that generate revenue, such as Google Search and other large AI-driven businesses at Google?

Dylan: Exactly. Google's non-large-language-model (LLM) production workloads run on its internally developed chips.

In fact, Google was using Transformer technology in search workloads as early as 2018-2019, with BERT being one of the well-known and popular Transformer models at that time, running in their production search workloads for years.

The combination of three advantages has currently enabled NVIDIA to dominate the market.

Bill: Now back to NVIDIA, why does it dominate the market so much?

Dylan: NVIDIA can be likened to a three-headed dragon. Most semiconductor companies perform poorly in software, but NVIDIA is an exception.

In terms of hardware, NVIDIA also excels compared to most companies, being able to adopt new technologies more quickly and move chips from design to deployment at a rapid pace. Additionally, they acquired Mellanox, greatly enhancing their networking capabilities. The combination of these three advantages makes it difficult for other semiconductor companies to compete with them individually.

Bill: You previously wrote an article that helped everyone understand the complexities of NVIDIA's modern cutting-edge deployments, including aspects like racks, memory, networking, and scale. Could you briefly introduce these again?

Dylan: Alright. When we look at GPUs, running an AI workload often requires multiple chips to work together, as the scale of the model has far exceeded the capacity of a single chip.

NVIDIA's NVLink architecture effectively connects multiple chips, but interestingly, Google and Broadcom had collaborated to build similar system architectures even before NVIDIA, such as Google creating a similar system with TPU back in 2018.

Although Google has its own expertise in software and compute elements, it needs to cooperate with other suppliers in difficult areas such as chip packaging design and networking.

Now, NVIDIA has launched the Blackwell system, a rack containing multiple GPUs, weighing three tons, with thousands of cables, making it very complex.

Competitors such as AMD have also recently entered the system design field through acquisitions because building a multi-chip system that works collaboratively, cools well, and has reliable networking is a highly challenging problem, as semiconductor companies usually lack relevant engineers.

Bill: So where do you think NVIDIA has made incremental differentiated investments?

Dylan: NVIDIA has primarily made significant investments in the supply chain. They must work closely with the supply chain to develop next-generation technologies and bring them to market first.

For example, in areas such as networking, optics, water cooling, and power transmission, NVIDIA continuously introduces new technologies to maintain its competitive edge. Their pace is very fast, with many changes each year, like the launch of products such as Blackwell and Rubin. If they stagnate, they will face competitive pressure as other competitors are also striving to catch up.

Bill: If NVIDIA stagnates, in what areas might they face competition? What conditions must other alternatives meet in order to capture a larger share of workloads in the market?

Dylan: For NVIDIA, their major customers are spending heavily on AI and have enough resources to research how to run models on other hardware, especially for inference.

Although NVIDIA's advantage in inference software is relatively small, its hardware performance is currently the best, which means lower capital costs, lower operating costs, and higher performance. If NVIDIA stops making progress, its performance advantage will stop growing, giving competitors an opening.

For example, with the launch of Blackwell, NVIDIA is not only 10 to 15 times faster in inference performance than previous products (when optimized for large models), but it has also lowered its margins to respond to competition. They plan to improve performance more than fivefold per year, which is a very fast pace. At the same time, AI models themselves keep improving and their costs keep falling, which will further stimulate demand.

Bill: You mentioned that the role of software in training and inference is different. Can you explain that in more detail?

Dylan: Many people simply refer to NVIDIA's software as CUDA, but in fact it contains many layers.

In terms of training, users usually rely on the performance of NVIDIA's Software because researchers continuously try new methods and do not have much time to optimize performance.

In terms of inference, companies like Microsoft deploy a limited number of models and update them roughly every six months. They can dedicate a large number of engineers to optimizing how these models run on other hardware. For example, Microsoft has already deployed GPT-style models on hardware from companies such as AMD.

Host: We previously mentioned a chart showing that there will be one trillion dollars of new AI workloads and one trillion dollars of datacenter replacement workloads over the next four years. What is your view on this? Some believe that people will not use NVIDIA's GPUs to rebuild CPU datacenters. How do you respond to this viewpoint?

Dylan: NVIDIA has long been promoting the use of accelerators for non-AI workloads, such as in the professional visualization field (like Pixar making movies) and Siemens engineering applications, which both utilize GPUs.

Although these are only a small part of the AI field, applications do exist. Regarding datacenter replacements, although AI is developing rapidly, traditional workloads (like network services and databases) will not stop or slow down as a result. The supply chain for datacenters is long, and the construction cycle is also lengthy, which is a realistic problem.

For example, Intel's CPUs have made slow progress over the past few years, and AMD's emergence has provided higher performance options. Many old Intel CPU servers in Amazon datacenters have been in use for years and can now be replaced with newer, higher-performance servers (like 128-core or 192-core), which can not only improve performance but also reduce the number of servers under the same power consumption, thus freeing up space for AI servers.

So, while there are cases of datacenter replacements, the overall market is still growing; it is just that the development of AI has driven this behavior, as businesses need more computing power to support AI applications.

Host: This reminds me of what Satya said on the show last week: that they are limited by datacenters and power, not by chip supply. Do you think this relates to your previous explanation?

Dylan: I think Satya's point highlights that datacenters and power are the bottleneck, which is a different issue from chip supply. With datacenter construction and power supply both tight, businesses need to plan resources more carefully, which also explains why they take measures such as acquiring power capacity from cryptocurrency mining companies or extending the depreciation cycle of old servers.

If data runs out, synthetic data can be created to improve models.

Host: Before discussing alternatives to NVIDIA, let's first talk about the pre-training and scaling debate mentioned in your article. Ilya said that data is the "fossil fuel" of AI and that we have consumed most of it, suggesting that the big gains from pre-training will not be repeated. What do you think of this view?

Dylan: The pre-training scaling law is relatively simple; increasing computational resources can enhance model performance, but this involves two dimensions: data and parameters.

When data runs out, the model can still be scaled up, but the returns may diminish. However, the idea that we are out of data is something of a misconception: our utilization of video data is still very limited. Text is the most effective data domain at present, but video contains far more information. In addition, pre-training is only one part of model training; inference-time compute also matters. If data is exhausted, we can continue to improve the model by creating synthetic data, as companies like OpenAI are trying: the model generates a large amount of data, the outputs are functionally verified, and the effective data is filtered out for training to enhance model performance. Although this method is still in its early stages and relatively little money has gone into it, it provides a new direction for model improvement.

Host: From an investment perspective, NVIDIA is under the spotlight. But if the returns from pre-training have mostly been obtained, why are people still building larger clusters?

Dylan: Although the one-time significant gains from pre-training may be over, we can still achieve certain returns by increasing computational resources, especially in a competitive environment where companies seek to enhance model performance to maintain competitiveness.

Moreover, the comparison between models and competitors' models also drives companies to continue investing. Although, from an ROI perspective, continuing to scale may be exponentially expensive, it can still be a rational decision, as returns still exist, albeit with increased difficulty in obtaining them. Furthermore, with the emergence of new methods like synthetic data generation, the speed of model improvement may accelerate, providing motivation for continued investment from companies.

Host: In which areas is synthetic data most effective? Can you provide examples?

Dylan: Synthetic data is most effective in areas where functionality can be validated. For example, Google's services have extensive unit testing to ensure systems operate correctly, and these unit tests can be used to evaluate whether the outputs generated by an LLM are accurate.

In fields such as mathematics and engineering, outputs can be evaluated against clear standards, whereas in some subjective fields like art, writing style, and negotiation skills, it is challenging to conduct functional validation because the criteria for judgment are more subjective. For example, in the field of image generation, it is difficult to determine which image is more beautiful as it depends on personal preferences; however, in mathematical calculations or engineering designs, it is possible to clearly determine if the output is correct.

Wall Street has underestimated the capital expenditure of large datacenters.

Host: What have you heard from the hyperscale datacenters? They all indicate that capital expenditure (capex) will increase next year and that they are building larger clusters, is this true?

Dylan: Based on our tracking and analysis, Wall Street's estimates of capex are generally too low. We track every datacenter globally and find that companies like Microsoft, Meta, and Amazon are investing significantly in datacenter capacity.

They have signed datacenter lease agreements for next year, expecting cloud revenue to accelerate as they are currently limited by datacenter capacity. This indicates that they believe winning in competition is possible through scaling, which is why they continue to invest.

Host: You previously mentioned the large-scale cluster construction for pre-training. If there is a change in the pre-training trend, what would be the changes in their construction for inference?

Dylan: In training neural networks, forward propagation is used to generate data, and backward propagation is used to update weights. In the new paradigms of synthetic data generation, output evaluation, and model training, the computational load for forward propagation significantly increases because a large number of possibilities must be generated, whereas the computational load for backward propagation is relatively small since training occurs only on a few effective data points. This means that during training, there is a large amount of inference computation, and in fact, the amount of inference computation during training exceeds that of updating model weights.

Additionally, whether all components need to be in the same location during model training depends on the specific circumstances.

For example, Microsoft is building multiple datacenters in different regions because they find that inference workloads can be allocated to different datacenters while models are updated elsewhere, allowing for more efficient resource utilization. So the pre-training paradigm has not slowed down; rather, the cost of improvement rises with each generation, and businesses are looking for other ways to reduce costs and increase efficiency.

NVIDIA is not the Cisco of 2000.

Host: Some people are comparing NVIDIA to Cisco's situation in 2000, what do you think?

Dylan: This comparison is somewhat unfair. A large part of Cisco's revenue came from telecom infrastructure build-outs funded by private capital and credit, whereas NVIDIA's revenue mix is different, with a much smaller share coming from such private- or credit-financed buyers (for example, CoreWeave, which is backed by Microsoft).

Additionally, during the Internet bubble, the scale of private capital entering the field was much larger than it is now. Although the venture capital market seems active now, the private market (such as Middle Eastern sovereign wealth funds) has not yet seen large amounts of funds enter. Moreover, compared to Cisco's time, the sources of capital, positive cash flow, and rationality of investment for these profitable companies have all changed. NVIDIA currently has a PE of 30, which is still significantly lower compared to Cisco’s 120, so a simple comparison cannot be made.

Inference-time reasoning is a new direction for scaling intelligence.

Host: You mentioned that inference-time reasoning is a new direction for scaling intelligence and that its computational intensity is greater than pre-training's. Can you explain this in detail?

Dylan: Pre-training may run into diminishing returns or excessive costs, but synthetic data generation and inference-time compute have become new directions of development.

Inference-time compute sounds appealing because it doesn't require spending more on training models, but there are significant trade-offs. For example, a model like GPT-4o generates a large amount of data during inference but ultimately outputs only a portion of it to users, consuming a lot of compute in the process.

For example, when handling user requests, the model may generate thousands of intermediate results (tokens), but only a few hundred are ultimately output to users. This means that computational costs increase significantly, not only due to the increase in the number of generated tokens but also because handling these tokens requires more memory to store contextual information (like KV cache), which reduces the number of user requests the server can handle simultaneously, thereby increasing the cost per user.

From a cost perspective, for a company like Microsoft, if its inference revenue is $10 billion, with a gross margin of 50-70%, and costs in the billions, when using models like GPT-4o, the costs may significantly increase due to the rising inference computing costs. Even though the model performs better and can charge higher fees, the increase in costs might exceed the increase in revenue.

The enterprise-level demand for the GPT-4o model has been underestimated.

Host: So, has the market's enterprise-level demand for models like GPT-4o been overestimated or underestimated?

Dylan: GPT-4o is still in its early stages, and people's understanding and application of it are not yet deep.

However, from some anonymous benchmarking tests currently available, many companies (like Google and Anthropic) are developing inference models, and they see a clear path to improving model performance by increasing computational resources. These companies have relatively low investment in inference and are still in the initial stages, but they have significant room for improvement. It is expected that within the next 6 months to 1 year, model performance will greatly improve in some functional validation benchmark tests. Therefore, the potential demand for such models in the market is enormous, but it is currently difficult to assess accurately.

Host: Looking back at the Internet wave, many startups initially relied on Oracle and Sun Microsystems technologies, but the situation changed five years later. Will this happen in the AI Chip field?

Dylan: Currently, GPT-4o is very expensive, but if the model size is reduced, the cost will decrease significantly.

For example, the cost can be greatly reduced from GPT-4o to Llama 7b. For smaller models, inference is relatively easy and can run on a single chip, leading to intense market competition, with many companies offering API inference services based on models like Llama, resulting in fierce price competition and lower profit margins.

In contrast, companies like Microsoft that use OpenAI models have higher gross margins (50-70%) because they possess high-performance models and have enterprises or consumers willing to pay a premium for them.

However, as more companies enter the market, model differentiation becomes increasingly important; only those with the best models and the ability to find enterprises or consumers willing to pay for them can stand out in the competition. Therefore, the market is quickly filtering, and ultimately, only a few companies may be able to compete in this field.

Google and Amazon chips each have their strengths and weaknesses.

Host: So what is the situation with AMD among these competing companies?

Dylan: AMD performs excellently in chip engineering but has significant shortcomings in software. They lack sufficient software developers and have not invested in building GPU clusters to develop software, which sharply contrasts with NVIDIA.

Additionally, AMD has been focused on competing with Intel and lacks system-level design experience. Although they acquired ZT Systems, they still lag behind NVIDIA in system architecture design for large-scale datacenters.

Large-scale datacenter customers (such as Meta and Microsoft) are helping AMD improve its software and understand model development, inference economics, and so on, but AMD still cannot compete with NVIDIA on the same timeline. AMD's share of AI revenue at customers like Microsoft and Meta is expected to decline next year, though it will still profit from the market, even if it won't see NVIDIA-level success.

Host: What about Google's TPU? It seems to be the second choice after NVIDIA.

Dylan: Google’s TPU has its unique features in systems and infrastructure. The performance of a single TPU is good, but more importantly, its system design is key. The TPU system built in collaboration with Broadcom is competitive in chip interconnect, network architecture, etc., and even surpasses NVIDIA in certain aspects.

In addition, Google has adopted water cooling technology over the years, enhancing the reliability of the system, while NVIDIA only recently realized the need for water cooling technology.

However, the commercial success of Google's TPU has been relatively limited, mainly because its software is not open enough; much of the internally used software (such as what DeepMind uses) is not offered to Google Cloud users.

On pricing, the list price is high and even negotiated prices are not competitive; compared with other cloud providers (such as Oracle, Microsoft, and Amazon), Google's TPU pricing has no advantage.

Moreover, Google uses a large number of TPUs for internal services (such as Search and the Gemini applications), and the external rental market share is small, with Apple as the main tenant; Apple's decision to rent TPUs may be related to its attitude toward NVIDIA (there may be a competitive dynamic, though the specific reasons were not discussed).

Host: What about Amazon? Can you provide a detailed introduction to Amazon's chips just like the one for Google's TPU?

Dylan: Amazon's chip can be referred to as the 'Amazon Basic TPU'. It has cost-effective advantages in some aspects, such as using more silicon and memory, with network capabilities somewhat comparable to TPU. However, it falls short in efficiency, such as using more active cables (Google TPU, in cooperation with Broadcom, uses passive cables) and lower silicon area utilization efficiency.

However, Amazon has advantages in HBM capacity and memory bandwidth per dollar thanks to lower costs, with chip prices significantly below NVIDIA's. Although it trails NVIDIA on technical specifications (memory, bandwidth, etc.), it is attractive for cost-sensitive application scenarios.

Amazon has partnered with Anthropic to build a supercomputer system containing 400,000 chips. They believe large-scale chip deployment is useful for inference and model improvement, and although it may not be the most advanced technically, its cost-effectiveness makes it a reasonable choice for Amazon.

Next year's capital expenditures are clear, but 2026 carries uncertainty.

Host: Looking ahead to 2025 - 2026, what are your views on the semiconductor market? For example, Broadcom's recent stock price increase, NVIDIA's stock price fluctuations, how do you think the market will develop?

Dylan: Broadcom has achieved some success in the custom ASIC field, winning several custom ASIC orders, including orders from companies like Google. Google is working to enhance the performance of its custom chips, especially in recommendation systems. Additionally, companies like OpenAI are also developing their own chips, and Apple has some chips produced in cooperation with Broadcom. These trends indicate that market competition will intensify.

Overall, hyperscale datacenters are expected to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including networking equipment suppliers, ASIC suppliers, system suppliers, etc.).

However, there is some uncertainty regarding the situation in 2026.

On one hand, whether the model performance can continue to improve will be a key factor. If the speed of model performance improvement slows down, it may lead to a market adjustment, as the current growth of the market largely depends on the continuous advancement in model performance and the resulting growth in demand for computing resources.

On the other hand, capital investment is also an important variable. Currently, sovereign wealth funds from the Middle East, as well as pension funds from Singapore, Nordic countries, and Canada, have not yet entered the market in large scale, but if they decide to invest substantial funds in the future, it will have a significant impact on the market.

In addition, the new Cloud Computing Service market will face consolidation. Among approximately 80 new cloud service providers we are tracking, only a few (5 - 10) may survive in the competition. Five of them are sovereign cloud service providers, and around five are competitive enterprises.

Currently, the GPU leasing market prices are changing rapidly, for instance, the leasing price of NVIDIA H100 has dropped significantly, and the competition among new cloud service providers is fierce. The on-demand GPU pricing from large cloud service providers such as Amazon is also declining rapidly. The proportion of enterprises purchasing GPU clusters remains relatively low; they are more inclined to outsource GPU computing needs to new cloud service providers. However, this situation may change with market consolidation.

For NVIDIA, although it faces competition, it still has the opportunity to dominate the market if it can maintain its technology leadership and keep launching better-performing products at lower cost. For example, although its upcoming products cost more than previous generations, growth is still possible through performance optimization and pricing adjustments. However, if market demand does not grow as expected, or if more competitive alternatives emerge, NVIDIA's revenue could be affected.

Host: Thank you very much, Dylan, for today's sharing; it has given us a deeper understanding of the development of the Semiconductor Industry in the AI field. We hope to continue to focus on the dynamics of this area in the future and look forward to seeing how companies perform in this market full of opportunities and challenges. Thank you once again!

Dylan: Thank you, I am glad to share my views here.

Host: Just a reminder, the above content represents our views only and does not constitute investment advice.


