
NVIDIA's year-end blockbuster! The brand new B300 is designed for the o1 inference large model, and the RTX 5090 has also been revealed.

QbitAI · Dec 26, 2024 12:00

NVIDIA's founder has become this year's Santa Claus.

Its AI chip gift package has just been revealed:

the new GPU "nuclear bomb" B300, along with the GB300 superchip that pairs it with a CPU.

Higher compute: a 50% increase in FLOPS over the B200 at the product level.

Larger memory: capacity grows from 192GB to 288GB, also a 50% increase.

The "new-generation compute unit" GB300 NVL72, which packs 72 GB300s, is even billed as "the only solution that lets OpenAI o1/o3 reasoning models run reasoning chains of 100,000+ tokens at high batch sizes."

This comes only a few months after the B200 series launched at the "AI Spring Festival" (GTC) in March of this year.

According to SemiAnalysis, starting in the third quarter many AI giants have shifted their orders from the B200 to the B300 (only Microsoft continued to purchase some B200s in the fourth quarter).

Many netizens lament that the update speed is just too fast!

This not only puts to rest earlier rumors that the B200 was delayed by design flaws, but also answers AMD's plan to increase memory capacity on its MI300 series in 2025.

Another AI bomb.

Since both are based on the Blackwell architecture without generational changes, where does the increased computing power of the B300 come from?

According to this leak, there are mainly three parts:

  • Process node: the same TSMC 4NP as the B200, but with a brand-new tape-out.

  • Increased power: the TDP of the GB300 and B300 HGX reaches 1.4 kW and 1.2 kW respectively, a 0.2 kW increase over the B200 series.

  • Architectural micro-innovations, such as dynamically allocating power between the CPU and GPU.

In addition to higher FLOPS, the B300 series memory has also been upgraded:

  • Upgraded from 8-layer stacked HBM3E to 12 layers (12-Hi HBM3E).

  • Memory capacity upgraded from 192GB to 288GB.

  • Memory bandwidth remains unchanged, still at 8TB/s.

In addition, there is a significant change in product delivery:

The GB200 series offers the entire Bianca Board, which includes two GPUs, one CPU, memory for the CPU, and all components integrated on a single PCB.

△ GB200 Concept Diagram

The GB300 series will only provide reference boards, including two B300 GPUs, one Grace CPU, and an HMC (hardware management controller); LPCAMM memory modules are to be procured by customers themselves.

This presents new opportunities for OEM and ODM manufacturers in the supply chain.

Built for large reasoning models.

Upgrading memory is crucial for large reasoning models like OpenAI o1/o3: longer reasoning chains grow the KV cache, which in turn constrains batch size and latency.
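To see why, a rough KV-cache size estimate helps. The sketch below is an illustrative calculation (the helper function is hypothetical, not from the article), using the publicly documented Llama 3.1 405B architecture — 126 layers, 8 KV heads via grouped-query attention, head dimension 128 — at FP8 (1 byte per value):

```python
def kv_cache_bytes(seq_len, batch_size, num_layers=126, num_kv_heads=8,
                   head_dim=128, bytes_per_value=1):
    """Rough KV-cache footprint; the factor of 2 covers both K and V tensors."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value \
           * seq_len * batch_size

# 1,000 input + 19,000 output tokens, matching the example later in the article.
per_seq_gb = kv_cache_bytes(seq_len=20_000, batch_size=1) / 1e9
print(f"KV cache per sequence: {per_seq_gb:.2f} GB")       # ~5.16 GB
print(f"At batch size 32:      {per_seq_gb * 32:.0f} GB")  # ~165 GB
```

At ~5GB of KV cache per 20,000-token sequence, even modest batch sizes overwhelm an 80GB H100, which is why the jump to 288GB matters for this workload.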

Within a single GB300 NVL72 "compute unit", 72 GPUs can work on the same problem with very low latency while sharing memory.

Upgrading from GB200 to GB300 can also bring many benefits:

  • Lower latency for each reasoning chain.

  • Enabling longer reasoning chains.

  • Lower inference costs.

  • When handling the same problem, more samples can be searched, ultimately improving model capability.

To explain these improvements, SemiAnalysis provided a more intuitive example.

The following chart shows the processing speed of Llama 3.1 405B at FP8 precision using H100 and H200 GPUs for long sequences under different batch sizes.

The input is set to 1000 tokens and output to 19000 tokens, thereby simulating the thought chains in OpenAI's o1 and o3 models.

There are two significant improvements when upgrading from H100 to H200.

First, H200 has a larger memory bandwidth across all comparable batch sizes (H200 4.8TB/s, H100 3.35TB/s), resulting in an overall efficiency improvement of 43%.

Second, the H200 can run at higher batch sizes, generating three times as many tokens per second and correspondingly cutting costs by roughly a factor of three.
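As a back-of-envelope check, the 43% figure falls directly out of the bandwidth ratio (an illustrative calculation, not from the article):

```python
# Memory bandwidth figures cited in the text, in TB/s.
h100_bw_tbs = 3.35  # H100
h200_bw_tbs = 4.80  # H200

# Long-sequence decoding is memory-bandwidth-bound, so throughput
# scales roughly with bandwidth.
speedup = h200_bw_tbs / h100_bw_tbs - 1
print(f"Bandwidth-driven gain: {speedup:.0%}")  # ~43%
```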

The benefits brought by the increase in memory go far beyond these surface aspects.

Reasoning models generally have longer response times, so significantly reducing inference latency improves both user experience and usage frequency.

Moreover, the memory upgrade delivers a threefold performance gain alongside a threefold cost reduction, a pace of improvement that far outstrips Moore's Law.

In addition, SemiAnalysis has also observed that more powerful and distinctly differentiated models can command higher premiums—

Cutting-edge models have a gross margin of over 70%, while second-tier models competing with open-source models have a gross margin of less than 20%.

Of course, NVIDIA is not the only chipmaker that can add memory, but it still holds an ace in the hole: NVLink.

One More Thing

As for NVIDIA's consumer graphics cards, the RTX 5090's PCB has also surfaced for the first time.

Just yesterday, a photo of the RTX 5090 PCB went viral online.

The standout feature: it is very, very large.

Combined with earlier leaks that the 5090 may carry 32GB of memory, it is expected to support smooth 8K ultra-high-definition gaming at 60fps.

Netizens couldn't sit still.

As for the 5090's release date, the general guess is Jensen Huang's CES keynote on January 6.

