
NVIDIA reportedly delays new AI chip release due to a design flaw; Morgan Stanley says production will pause for two weeks and catch up in the fourth quarter

wallstreetcn ·  13:51

Source: Wallstreetcn
Author: Zhang Yaqi

According to Morgan Stanley, production based on the original Blackwell design began at the end of the second quarter of 2024, and any technical issues with the original design can still be resolved through software systems. The redesigned Blackwell has already been completed at TSMC and will become the version produced in larger volume in the fourth quarter of 2024.

The most advanced AI chips in NVIDIA's new Blackwell series may be released later than planned.

According to insiders cited by The Information, NVIDIA's upcoming AI chips will be delayed for three months or longer due to design flaws, and large-scale production of Blackwell may be delayed until Q1 next year. This may affect customers such as Meta Platforms, Google, and Microsoft, which collectively ordered billions of dollars worth of chips.

Morgan Stanley says production of Blackwell chips may be suspended for about two weeks, but TSMC can make up the lost time in the fourth quarter of 2024.

NVIDIA declined to comment on the reported delay, but said that customers are testing samples of Blackwell chips and that production is expected to ramp up later this year.

It is not common to find significant design flaws before mass production.

According to insiders cited by The Information, Blackwell design issues have emerged in recent weeks, as TSMC engineers discovered defects when preparing for mass production.

The GB200 chip contains two connected Blackwell GPUs and a Grace central processing unit. The defect involves the processor die (the piece of silicon that holds the chip's circuitry) that connects the two Blackwell GPUs. The problem reduces TSMC's chip output for NVIDIA and may even force the company to suspend production.

NVIDIA is reportedly conducting new trial production runs with its chip manufacturer, TSMC. To avoid leaving its machines idle, TSMC has reportedly restarted production of another high-profile product that is close to mass production. This situation is also rare.

It is very unusual to find significant design flaws before mass production, because multiple production tests, runs, and simulations are required in the early stages to ensure product feasibility and smooth manufacturing processes.

Under the original plan, TSMC was to begin mass production of Blackwell chips in the third quarter and start delivering them to NVIDIA in the fourth quarter. NVIDIA CEO Jensen Huang said in May that the company planned to ship a large number of Blackwell chips later this year.

The design flaw may delay the main Blackwell chips (B200 and GB200) by three months or longer, and large-scale production of Blackwell may slip to Q1 next year. Cloud providers usually need about three months after receiving the chips to bring their large-scale clusters into operation, so the delay would push their deployments back as well.

Morgan Stanley: It's just an improvement, not a delay, and it can catch up in the fourth quarter.

Morgan Stanley analyst Charlie Chan said in the report that this is an improvement on Blackwell, not a delay:

We understand that the production of the original Blackwell design began at the end of Q2 2024, and any technical issues related to the original design can still be resolved through software systems. NVIDIA hopes to further improve the stability of Blackwell by replacing some masks, that is, by "redesigning" it.

The redesigned Blackwell has been completed at TSMC and will become the version produced in larger volume in Q4 2024.

Do the giants only care about when they will receive their chips?

Blackwell can be described as the "white moonlight" (a long-cherished object of desire) of technology companies, and it carries the giants' high expectations.

If upcoming AI chips such as the B100, B200, and GB200 are delayed by three months or longer, NVIDIA's customers will have real cause for concern. These customers include Microsoft, Meta, and OpenAI, which have high expectations for NVIDIA's AI chips and plan to use NVIDIA's "supercomputers" to build future generations of large language models, Meta AI assistants, and other automated functions.

The Information cited insiders as saying that Meta has placed an order worth at least $10 billion, and that Microsoft has increased its order size by 20% in recent weeks. Microsoft plans to have 55,000 to 65,000 GB200 chips ready for OpenAI by Q1 2025.

NVLink server racks may be affected.

The design flaw will also affect the production and delivery of NVIDIA's NVLink server racks, because the companies building the servers must wait for new chip samples before they can finalize their rack designs.

Previously, Ming-Chi Kuo, an analyst at TF International Securities, pointed out that while the computing power advantage of the GB200 NVL36 is beyond doubt, it also faces many unprecedented design and production challenges, and it is doubtful whether it can be delivered in volume on schedule.

Each GB200 NVL36 cabinet consumes about 80 kW, and according to a survey by AMAX in April this year, fewer than 5% of data centers worldwide can support 50 kW per cabinet. Buyers of the GB200 NVL36 therefore need to make sure they have a site that can accommodate it.

The single-cabinet version of the GB200 NVL72 consumes 130 kW per cabinet and cannot be mass-produced in the short term.
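
The power figures above can be put side by side with a quick calculation. This is a minimal sketch, assuming the 50 kW per-cabinet level from the AMAX survey as the comparison point; the variable and function names are illustrative and do not come from any of the cited reports.

```python
# Per-cabinet power draw cited in the article, in kW.
RACK_POWER_KW = {"GB200 NVL36": 80, "GB200 NVL72": 130}

# Level from the AMAX survey cited above: fewer than 5% of data
# centers worldwide can support 50 kW per cabinet.
SURVEY_THRESHOLD_KW = 50


def power_ratio(rack_kw: float, threshold_kw: float = SURVEY_THRESHOLD_KW) -> float:
    """Return the rack's power draw as a multiple of the threshold."""
    return rack_kw / threshold_kw


for name, kw in RACK_POWER_KW.items():
    ratio = power_ratio(kw)
    print(f"{name}: {kw} kW per cabinet, {ratio:.1f}x the 50 kW level")
```

Both configurations draw well above a level that, per the survey, fewer than 5% of data centers can support, which is why siting is a concern for buyers.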

Editor/Jeffy

The translation is provided by third-party software.

