
AMD's Lisa Su: Daring to Laugh That the Industry Has No Real Men!

wallstreetcn ·  Jun 8 18:58

Author: Zhou Yuan / Wallstreetcn

The strongest AI PC (laptop) chip today is made by AMD.

If China's Three Kingdoms period was essentially a civil war among a group of relatives, then the contest between today's AI chip companies has something of a family feud about it too, with AMD's top executive, Lisa Su (Su Zifeng), facing NVIDIA's leather-jacketed chief, Jensen Huang (Huang Renxun). The weapons Su brought are the Ryzen 9000 series CPUs, the "Ryzen AI 300 series" AI PC chips, and data center chips and GPUs.

Within 15 hours of NVIDIA CEO Jensen Huang delivering a speech on AI and announcing GPU and interconnect roadmaps at COMPUTEX 2024, AMD CEO Lisa Su (Su Zifeng) updated the roadmap for AMD's Instinct GPU series of AI accelerators at the same event on June 3.

Lisa Su used a large lineup of near-term and medium-term products to show AMD's ambition in AI and its commitment to future technology development: in the fourth quarter of this year, AMD will launch a new AI accelerator, the Instinct MI325X, followed by the MI350 in 2025 and the MI400 in 2026.

Simply put, the Instinct MI325X AI accelerator is an upgraded version of the existing MI300 series and uses the CDNA 3 architecture. It will be equipped with up to 288 GB of HBM3E memory and 6 TB/s of memory bandwidth, delivering 1.3 PFLOPS of FP16 and 2.6 PFLOPS of FP8 compute, enough for a server to handle models with up to 1 trillion parameters.
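To put the 1-trillion-parameter figure next to the 288 GB memory capacity, here is a rough weights-only estimate in Python. It is a sketch under stated assumptions, not AMD's sizing method: the count of eight accelerators per server is an assumption not given in this article, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Rough, weights-only memory estimate for a large model.
# Ignores KV cache, activations, and runtime overhead.

HBM_PER_CARD_GB = 288      # per-accelerator capacity quoted in the article
CARDS_PER_SERVER = 8       # ASSUMPTION: not stated in the article

BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}


def weights_gb(num_params, dtype):
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_server(num_params, dtype, cards=CARDS_PER_SERVER):
    """True if the weights fit in the combined HBM of `cards` accelerators."""
    return weights_gb(num_params, dtype) <= cards * HBM_PER_CARD_GB


if __name__ == "__main__":
    params = 10**12  # 1 trillion parameters
    for dtype in BYTES_PER_PARAM:
        print(dtype, weights_gb(params, dtype), "GB, fits:",
              fits_in_server(params, dtype))
```

Under these assumptions, 1 trillion FP16 weights take about 2,000 GB, which fits within the 2,304 GB of eight cards but not within the 288 GB of a single card.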

Lisa Su said that the MI325X brings the largest AI performance improvement in AMD's history and will deliver more than 1.3 times the performance of NVIDIA's competing H200, making it more cost-effective.

According to the AMD Instinct GPU roadmap, the MI350 series planned for 2025 will be based on the next-generation CDNA 4 architecture and will be compatible with OAM (OCP Accelerator Module). The MI350 series will be built on a 3nm process, provide up to 288 GB of HBM3E memory, the same as the MI325X, and support the FP4/FP6 data types.

The MI400 series is expected to launch in 2026 on the new CDNA Next architecture. In terms of performance, CDNA 3 is expected to offer 8 times the performance of CDNA 2, and CDNA 4 is expected to deliver about a 35-fold improvement in AI inference performance over CDNA 3. AMD did not disclose comparative performance figures for CDNA Next.

The most powerful AI PC chip: where does its strength lie?

In addition to the "medium- and long-term" products above, AMD also launched "here-and-now" products: the third-generation AI PC chip, the "Ryzen AI 300 series" (code-named "Strix Point"), and the "Ryzen 9000 series" desktop processors.

Among them, the strength of the "Ryzen AI 300 series" gives Lisa Su the confidence to look down on all competitors: its NPU computing power is as high as 50 TOPS, surpassing the 45 TOPS of Qualcomm's Snapdragon X Elite and the 40-45 TOPS of Intel's Lunar Lake. As the old line goes, "Forty thousand laid down their arms, and not one among them was a man." That said, the NPU computing power of all three companies has reached or exceeded Microsoft's NPU requirement for AI PCs (40+ TOPS).
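The comparison above can be written out as a small Python sketch. The TOPS figures are the ones quoted in this article; the dictionary layout and function name are illustrative choices, and Intel's figure is kept as a range because the article gives it as 40-45 TOPS.

```python
# NPU TOPS figures as quoted in the article, stored as (low, high) ranges.
COPILOT_PLUS_MIN_TOPS = 40  # Microsoft's "40 TOPS+" NPU requirement

npu_tops = {
    "AMD Ryzen AI 300": (50, 50),
    "Qualcomm Snapdragon X Elite": (45, 45),
    "Intel Lunar Lake": (40, 45),
}


def meets_requirement(tops_range, threshold=COPILOT_PLUS_MIN_TOPS):
    """Conservative check: even the low end of the range must reach the threshold."""
    low, _high = tops_range
    return low >= threshold


# Rank by the high end of each range, largest first.
ranking = sorted(npu_tops, key=lambda name: npu_tops[name][1], reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(name, npu_tops[name], "meets 40 TOPS:", meets_requirement(npu_tops[name]))
```

Using the low end of each range is a deliberately cautious choice: Intel's chip only just clears the bar at 40 TOPS, while AMD's leads by 5 TOPS over the next-best figure.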

The first generation of this product line was the Ryzen 7040 series (code-named Phoenix), launched in 2023. It was the world's first x86 processor with an integrated dedicated NPU AI engine, based on the then-new XDNA architecture, with about 10 TOPS of computing power. Combined with the CPU and GPU, the overall computing power was about 33 TOPS, which set the starting point for the new AI PC category.

At the end of 2023, AMD launched an iteration of the Ryzen 7040 series, the Ryzen 8040 series (code-named "Hawk Point"), which raised NPU computing power by 60% to 16 TOPS and overall computing power to 39 TOPS.

The Ryzen AI 300 series launched this time is AMD's third-generation AI PC chip: it adopts the new Zen 5 CPU architecture, upgrades the GPU to the RDNA 3.5 architecture, and updates the NPU to the XDNA2 architecture. AMD bills it as a "world-class processor for next-generation AI PC/Copilot+ PC".

At present, as a new product category, AI PCs are starting at the high end, both for upstream chips and for downstream devices.

According to the information disclosed by Lisa Su, the Ryzen AI 300 series will launch with two models in the first wave, the Ryzen AI 9 HX 370 and the Ryzen AI 9 365, both positioned in the high-end market. The former is the top flagship.

The Ryzen AI 9 HX 370 has a CPU frequency of up to 5.1 GHz, with 12 cores and 24 threads. Compared with the Ryzen 8040 series, its CPU core count is increased by at least 30%, the first such increase in many years. Total L2 cache rises to 12 MB (1 MB per core), and L3 cache grows to an unprecedented 24 MB, up from a previous maximum of 16 MB.

On the GPU side, the Ryzen AI 9 HX 370 also moves to an upgraded architecture. The number of compute units (CUs) rises from 12 to 16, and the GPU is named "Radeon 890M". NPU computing power is increased to 50 TOPS, more than three times the 16 TOPS of the Ryzen 8040 series NPU.
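The generation-over-generation NPU figures quoted so far (about 10, 16, and 50 TOPS) can be checked with a few lines of Python. This is only an arithmetic sketch; the list layout and function name are illustrative.

```python
# NPU TOPS per generation, as quoted in the article.
generations = [
    ("Ryzen 7040 (Phoenix)", 10),
    ("Ryzen 8040 (Hawk Point)", 16),
    ("Ryzen AI 300 (Strix Point)", 50),
]


def generation_growth(gens):
    """Return (name, ratio, percent_increase) for each step to a newer generation."""
    steps = []
    for (_, prev_tops), (name, tops) in zip(gens, gens[1:]):
        ratio = tops / prev_tops
        steps.append((name, ratio, round((ratio - 1) * 100, 1)))
    return steps


if __name__ == "__main__":
    for name, ratio, pct in generation_growth(generations):
        print(f"{name}: {ratio:.3f}x the previous generation (+{pct}%)")
```

The first step works out to 1.6x (the 60% increase cited for Hawk Point), and the second to 3.125x, consistent with "more than three times".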

The Ryzen AI 9 365 has a frequency of up to 5.0 GHz, 10 cores and 20 threads, and 10 MB of L2 cache; its other parameters are the same as those of the Ryzen AI 9 HX 370.

XDNA2: the first NPU to support the Block FP16 floating-point format

The NPU in the Ryzen AI 300 series uses the XDNA2 architecture, which is "oriented to next-generation AI PC/Copilot+ PC".

According to the architecture diagram Lisa Su showed on site, the structure of XDNA2 is basically unchanged from the first-generation XDNA architecture, but its scale has been expanded. The AI compute engine module was called "AIE Tile" in the first generation, which had 20 of them; in the new generation it is renamed "AI Tile", and the count rises to 32. The number of local memory modules has increased from 5 in the first generation to 8.

In addition, the crossbar used for interconnection has been upgraded from an ordinary Data Fabric to the Infinity Fabric of the Zen/RDNA family, bringing greater bandwidth and higher data transfer efficiency.

According to official AMD data, the XDNA2 NPU delivers up to 5 times the compute performance (measured by the response time of the 7-billion-parameter Llama 2 model, from startup to the first token), doubles multitasking parallelism, and improves energy efficiency by up to two times.

The XDNA2 architecture also has a technical highlight: it introduces the new Block FP16 floating-point format, a first for an NPU. (Block FP16 is a block floating-point format and is not the same thing as BFloat16, although both are sometimes abbreviated BF16.) Previously, 16-bit floating-point formats were generally used on CPUs and GPUs.

The FP8 floating-point format offers strong performance but insufficient precision; the FP16 format is the opposite, with high precision but somewhat lower performance. Block FP16 combines the advantages of both, meeting the 16-bit precision requirements of most AI applications without the need for additional conversion.
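The name "Block FP16" points to a block floating-point scheme, in which a group of values shares a single exponent while each value keeps only a short integer mantissa. The Python sketch below illustrates that general idea only. It is not AMD's actual format: the block contents, the 8-bit mantissa width, and the function names are all assumptions chosen for illustration.

```python
import math


def quantize_block(values, mantissa_bits=8):
    """Return (shared_exponent, integer_mantissas) for one block of floats."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # math.frexp writes max_abs as m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    # Value of one step of the signed integer mantissa.
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round each value to the nearest step and clamp to the signed range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return exp, mantissas


def dequantize_block(exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


block = [1.0, 0.3, -0.7, 0.01]  # hypothetical sample values
exp, mantissas = quantize_block(block)
restored = dequantize_block(exp, mantissas)

if __name__ == "__main__":
    print("shared exponent:", exp)
    print("mantissas:", mantissas)
    print("restored:", restored)
```

Because the exponent is stored once per block, each value costs roughly as much storage as an 8-bit number, while the shared exponent preserves the block's overall dynamic range. The trade-off shows up on small values that sit next to large ones: here 0.01 is restored as 0.015625, one mantissa step.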

Currently, the NPU computing power of the Ryzen AI 300 series is as high as 50 TOPS, surpassing the 45 TOPS level of both the Qualcomm Snapdragon X Elite NPU and the NPU in Intel's upcoming next-generation Core Ultra (Lunar Lake). On paper, the Ryzen AI 300 series has the strongest NPU in the industry.

According to the technical specifications AMD released for the Ryzen AI 300 series, the Ryzen AI 9 HX 370 outperforms the Qualcomm Snapdragon X Elite in video editing, multitasking, and 3D graphics by 40%, 47%, and 73%, respectively. Compared with Intel's current flagship Core Ultra 9 185H, its average performance is 36% higher; compared with the Apple M3, its graphics performance is up to 98% higher.

AI PCs (laptops) equipped with this chip are expected to reach the market quickly and in volume from vendors such as ASUS, Dell, HP, Lenovo, MSI, and Acer, with more than 100 new products launching one after another starting in July.

Lisa Su also brought the Ryzen 9000 series desktop processors (Granite Ridge), based on the Zen 5 architecture; the first batch of products will launch at the end of July 2024.

Simply put, the Ryzen 9000 series is the third family for the AM5 socket, after the Ryzen 7000 "Raphael" and Ryzen 8000 "Hawk Point" series. It is equipped with two Zen 5 chiplets of up to 8 cores each, for up to 16 cores and 32 threads.

According to official AMD test data, the IPC of the Zen 5 core on the PC platform is about 16% higher than that of Zen 4 on average. Compared with the Intel Core i9-14900K, the Ryzen 9 9950X is 4% to 23% faster in gaming tests and 7% to 56% faster in productivity tests.

Editor/Emily

The translation is provided by third-party software.

