

H200: Is Nvidia finally willing to add memory?

wallstreetcn ·  Nov 14, 2023 19:13

Source: Wall Street News
Author: Zhao Ying

The wave of artificial intelligence has taken the world by storm: demand for AI servers has exploded, and with it, demand for ultra-high-bandwidth HBM chips.

On Tuesday, Nvidia announced on its official website the H200, which it bills as its most powerful AI chip to date. Based on the Hopper architecture, the Nvidia H200 Tensor Core GPU can process massive amounts of data for generative AI and high-performance computing workloads.

Most notably, the H200 is the world's first GPU to ship with HBM3e memory, carrying up to 141 GB of it. According to SK Hynix, HBM3e is faster and higher-capacity than its predecessors, and leads the industry in speed, thermal control, and ease of use for customers.

For a long time, cost and other constraints led companies to prioritize processor compute performance while neglecting memory bandwidth. As AI servers place ever higher demands on memory bandwidth, HBM (high-bandwidth memory) has become the savior.

The H200's biggest upgrade: HBM3e

So in what ways has the H200 been upgraded over the previous-generation H100?

Compared with the H100, the H200's gains are concentrated in inference performance: on large language models such as Llama 2, the H200 runs inference nearly twice as fast as the H100.

Doubling performance within the same power envelope translates into roughly a 50% reduction in energy consumption and overall cost.
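
As a rough illustration of the arithmetic behind that claim, here is a minimal sketch. The 700 W board power and the throughput numbers are illustrative assumptions, not published H100/H200 specifications; only the "roughly 2x" factor comes from the text above.

```python
# Energy per inference query = power draw / throughput.
# Doubling throughput at the same power halves the energy (and energy cost) per query.

def energy_per_query_joules(power_watts: float, queries_per_second: float) -> float:
    return power_watts / queries_per_second

power_w = 700.0          # assumed board power, identical for both parts
h100_qps = 10.0          # assumed H100 inference throughput (queries per second)
h200_qps = 2 * h100_qps  # ~2x faster inference, as stated above

print(energy_per_query_joules(power_w, h100_qps))  # 70.0 J per query
print(energy_per_query_joules(power_w, h200_qps))  # 35.0 J per query, i.e. ~50% less
```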

Thanks to the Transformer Engine, lower-precision floating-point formats, and faster HBM3 memory, the H100 already delivered an 11x improvement in GPT-3 175B inference performance over the A100. With larger and faster HBM3e memory, the H200 lifts that figure to 18x without any hardware or code changes.

From the H100 to the H200, performance rises 1.64 times, entirely on the strength of greater memory capacity and bandwidth.

The H100 shipped with 80 GB or 96 GB of HBM3, delivering 3.35 TB/s and 3.9 TB/s of bandwidth respectively, while the H200 carries 141 GB of faster HBM3e memory with a total bandwidth of 4.8 TB/s.
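
A quick back-of-the-envelope check, using only the figures quoted above, shows the scale of the jump (a sketch that takes the 80 GB H100 configuration as the baseline):

```python
# Capacity and bandwidth ratios implied by the figures quoted above.
h100 = {"capacity_gb": 80, "bandwidth_tb_s": 3.35}   # base H100 HBM3 configuration
h200 = {"capacity_gb": 141, "bandwidth_tb_s": 4.8}   # H200 HBM3e configuration

capacity_ratio = h200["capacity_gb"] / h100["capacity_gb"]          # ~1.76x
bandwidth_ratio = h200["bandwidth_tb_s"] / h100["bandwidth_tb_s"]   # ~1.43x
print(f"capacity: {capacity_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x")
```

Both ratios sit in the same neighborhood as the 1.64x overall uplift cited above, which is consistent with the claim that the gain comes from memory rather than from extra compute.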

Elsewhere, media analysis notes that the H200 is so far available only in the SXM5 form factor, and that its peak vector and matrix math throughput is exactly the same as that of the Hopper H100 accelerator.

What is HBM?

For a range of technical and economic reasons, memory performance tends to lag processor performance. Over the past 20 years, peak hardware compute capability has grown 90,000-fold, while memory/interconnect bandwidth has grown only 30-fold.

When memory cannot keep up with the processor, the time spent reading and writing instructions and data can be tens or even hundreds of times the time the processor spends computing. Imagine data transmission as a huge funnel: no matter how fast the processor pours data in, the memory can only let it trickle through.
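
To put that funnel in concrete terms, the sketch below estimates how many arithmetic operations a GPU must perform per byte it fetches just to keep its compute units busy. The ~1 PFLOPS peak-compute figure is an assumed round number for illustration; the 3.35 TB/s bandwidth is the H100 figure quoted above.

```python
# The "funnel": operations required per byte of memory traffic to saturate compute.
peak_flops = 1.0e15         # assumed ~1 PFLOPS of dense half-precision compute
memory_bandwidth = 3.35e12  # 3.35 TB/s of HBM3 bandwidth (H100 figure quoted above)

flops_per_byte_to_stay_busy = peak_flops / memory_bandwidth
print(f"{flops_per_byte_to_stay_busy:.0f} FLOPs per byte")  # ~299
```

Workloads that reuse each fetched byte fewer times than that, which includes much of large-model inference, spend their time waiting on memory rather than on compute, which is why raising bandwidth pays off.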

To cut energy consumption and latency, multiple DRAM dies are stacked and packaged together with the GPU; the result is HBM (high-bandwidth memory). By raising bandwidth and expanding capacity, HBM keeps larger models and more parameters close to the compute cores, reducing the delays imposed by memory and storage.
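
For a sense of where the headline bandwidth comes from, here is a rough per-stack breakdown. The stack count and the 1024-bit interface width are assumptions typical of HBM designs, not figures quoted by Nvidia or SK Hynix; only the 4.8 TB/s total comes from the text above.

```python
# Rough per-stack arithmetic for an HBM-equipped package.
total_bandwidth_tb_s = 4.8  # H200 total HBM3e bandwidth quoted above
num_stacks = 6              # assumed number of HBM stacks on the package
bus_width_bits = 1024       # assumed per-stack interface width, typical of HBM

per_stack_tb_s = total_bandwidth_tb_s / num_stacks         # 0.8 TB/s per stack
per_pin_gb_s = per_stack_tb_s * 1000 * 8 / bus_width_bits  # ~6.25 Gb/s per pin
print(f"{per_stack_tb_s:.2f} TB/s per stack, {per_pin_gb_s:.2f} Gb/s per pin")
```

The bandwidth comes mainly from the very wide interface and the short in-package wiring rather than from extreme per-pin speeds, which is also what keeps energy per bit low.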

However, this also means higher cost. Even before packaging and testing, HBM costs roughly three times as much as GDDR, and this high cost is the main factor holding back HBM adoption. On some advanced compute engines, the HBM memory costs more than the chip itself, so adding more memory naturally meets strong resistance.

Further media analysis points out that if adding memory can double performance, the same HPC or AI application performance could be delivered with half as many devices. Obviously, no board of directors would support that idea; such deliberate profit-cutting tends to happen only when the market is oversupplied and three or four vendors are fighting over customers' budgets.

Is the next battle a breakthrough in memory?

However, with the explosion of AI and its demand for higher bandwidth, demand for HBM has surged. Some analysts expect global HBM demand to grow nearly 60% in 2023 to 290 million GB, and by another 30% in 2024, with the overall HBM market exceeding US$2 billion by 2025. SK Hynix has said its 2024 HBM3 production is already fully booked, and Samsung is now in the same position.

At the same time, analysts note that whatever the specific performance of Nvidia's next Blackwell B100 GPU accelerator turns out to be, it is certain to deliver stronger inference performance, and that improvement is likely to come from a breakthrough in memory rather than from an upgrade at the compute level. That makes buying the Nvidia Hopper H200 between now and next summer a questionable value, but rapid turnover is also the norm in data center hardware.

In addition, Nvidia's competitor AMD will launch its "Antares" family of data center GPU accelerators on December 6, including the Instinct MI300X with 192 GB of HBM3 and the CPU-GPU hybrid MI300A with 128 GB of HBM3.

Editor/jayden


