NVIDIA's new AI chip release has reportedly been delayed by design flaws. Morgan Stanley says production may pause for about two weeks but can catch up in the fourth quarter.

Source: Wallstreetcn
Author: Zhang Yaqi

According to Morgan Stanley, production of the original Blackwell design began at the end of the second quarter of 2024, and any technical issues with the original design can still be resolved through software. The Blackwell redesign has already been completed at TSMC and will become the version produced in larger volume in the fourth quarter of 2024.

The most advanced AI chips in NVIDIA's new Blackwell series may face a delayed release.

According to The Information, citing people familiar with the matter, NVIDIA's upcoming AI chips will be delayed by three months or more because of design flaws, and volume production of Blackwell may slip to the first quarter of next year. This could affect customers such as Meta Platforms, Google, and Microsoft, which have collectively ordered tens of billions of dollars' worth of chips.

Morgan Stanley, in its latest report, said production of Blackwell chips may be suspended for about two weeks, but that TSMC's efforts should allow it to catch up in the fourth quarter of 2024.

NVIDIA declined to comment on the reported delay, but said that customers are testing samples of Blackwell chips and that production is expected to ramp up later this year.

Major design flaws are rarely found just before mass production

According to The Information, citing people involved in producing the Blackwell chips, the design problem surfaced in recent weeks, when TSMC engineers found the flaw while preparing for mass production.

The GB200 chip contains two connected Blackwell GPUs and one Grace central processing unit. The flaw involves the processor die (the piece of silicon that holds the chip's circuitry) that connects the two Blackwell GPUs. The problem reduces the number of chips TSMC can produce for NVIDIA and could even lead the company to halt production.

NVIDIA is reportedly carrying out new trial production runs with its chip manufacturer, TSMC. To keep its machines from sitting idle while the problem is addressed, TSMC has resumed production of another high-profile product that is close to mass production. This, too, is rare.

Analysts consider it very unusual to find a major design flaw this close to mass production, because multiple production test runs and simulations are normally carried out beforehand to confirm that the product is feasible and that manufacturing will go smoothly.

Under the original plan, TSMC was to begin mass production of Blackwell chips in the third quarter and start delivering them to NVIDIA in the fourth quarter. Jensen Huang said in May that the company planned to ship a large number of Blackwell chips later this year.

The design flaw could delay the main Blackwell chips (B200 and GB200) by three months or more, pushing volume production of Blackwell to the first quarter of next year. The delay matters all the more because cloud providers typically need about three months after receiving chips to bring their large-scale clusters into operation.
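The schedule arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the originally planned fourth-quarter 2024 delivery start as the baseline, uses the lower bound of the reported delay (three months, or one quarter), and treats the roughly three-month cluster bring-up as one further quarter. The helper name `add_quarters` is hypothetical, and real customer delivery dates are not given in the report.

```python
def add_quarters(year: int, quarter: int, n: int) -> tuple[int, int]:
    """Shift a (year, quarter) pair forward by n quarters."""
    index = year * 4 + (quarter - 1) + n
    return index // 4, index % 4 + 1


# Assumption: deliveries were originally due to start in Q4 2024.
ORIGINAL_DELIVERY = (2024, 4)
# Assumption: "three months or longer" is modeled at its lower bound, one quarter.
DELAY_QUARTERS = 1
# Assumption: "about three months" of cluster bring-up is modeled as one quarter.
BRING_UP_QUARTERS = 1

delayed_delivery = add_quarters(*ORIGINAL_DELIVERY, DELAY_QUARTERS)
clusters_online = add_quarters(*delayed_delivery, BRING_UP_QUARTERS)

if __name__ == "__main__":
    print(f"Delayed delivery start: {delayed_delivery[0]} Q{delayed_delivery[1]}")
    print(f"Earliest clusters online: {clusters_online[0]} Q{clusters_online[1]}")
```

Because the reported delay is "three months or longer", the quarters computed here are best read as the earliest plausible dates, not as a forecast.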

Morgan Stanley: an improvement, not a delay, and production can catch up in the fourth quarter

Morgan Stanley analyst Charlie Chan, however, said in the report that this is improvement work on Blackwell rather than a delay:

We understand that production of the original Blackwell design began at the end of Q2 2024, and any technical issues related to the original design can still be resolved through software. NVIDIA hopes to further improve Blackwell's stability by replacing some photomasks, that is, through a "redesign".
The Blackwell redesign has been completed at TSMC and will become the version produced in larger volume in Q4 2024.

What the giants care about: when will the chips arrive?

Blackwell is the chip that technology companies have long been yearning for, and it carries the giants' high expectations.

If upcoming AI chips such as the B100, B200, and GB200 are delayed by three months or more, NVIDIA's customers will have real cause for worry.

These customers include Microsoft, Meta, and OpenAI. They have high expectations for NVIDIA's AI chips and plan to use the "supercomputers" NVIDIA has developed to produce future generations of large language models, Meta's AI assistant, and other automation features.

The Information, citing people familiar with the matter, reported that Meta has placed an order worth at least $10 billion, while Microsoft increased the size of its order by 20% in recent weeks. Microsoft plans to have 55,000 to 65,000 GB200 chips ready for OpenAI by the first quarter of 2025.

NVLink server racks may be affected

The design flaw will also affect the production and delivery of NVIDIA's NVLink server racks, because companies working on the servers must wait for new chip samples before they can finalize their rack designs.

Earlier, Ming-Chi Kuo, an analyst at TF International Securities, pointed out that although the computing-power advantage of the GB200 NVL36 is beyond doubt, it faces many unprecedented design and production challenges, and it remains doubtful whether mass shipments can be delivered on schedule.

Each GB200 NVL36 cabinet draws about 80 kW, yet according to an AMAX survey from April this year, fewer than 5% of data centers worldwide can support servers drawing 50 kW per cabinet. Buyers therefore need to confirm, before purchasing the GB200 NVL36, that they have enough space to install it.
The single-cabinet version of the GB200 NVL72 draws 130 kW per cabinet and cannot be mass-produced in the short term.
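The gap between these power figures can be made concrete with a small Python sketch. The rack power values and the 50 kW threshold are the ones quoted above. The function names `power_shortfall_kw` and `can_host` are hypothetical and chosen only for illustration, and the comparison ignores cooling, redundancy, and floor space, which a real deployment check would also need to cover.

```python
# Per-cabinet power draw quoted in the article, in kilowatts.
RACK_POWER_KW = {"GB200 NVL36": 80.0, "GB200 NVL72": 130.0}

# Per-cabinet capacity that, per the AMAX survey cited above,
# fewer than 5% of data centers worldwide can support.
SURVEY_THRESHOLD_KW = 50.0


def power_shortfall_kw(rack: str, supported_kw: float) -> float:
    """Return how many kW per cabinet the facility is short (0.0 if none)."""
    return max(0.0, RACK_POWER_KW[rack] - supported_kw)


def can_host(rack: str, supported_kw: float) -> bool:
    """Return True if the facility's per-cabinet limit covers the rack's draw."""
    return power_shortfall_kw(rack, supported_kw) == 0.0


if __name__ == "__main__":
    for rack in RACK_POWER_KW:
        gap = power_shortfall_kw(rack, SURVEY_THRESHOLD_KW)
        print(f"{rack}: short by {gap:.0f} kW at a {SURVEY_THRESHOLD_KW:.0f} kW limit")
```

Even a facility at the 50 kW level falls 30 kW short of one NVL36 cabinet and 80 kW short of one NVL72 cabinet, which is the arithmetic behind the deployment concern.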

Editor/Jeffy

The translation is provided by third-party software.
