Track the latest AI trends

AMD's Lisa Su: Daring to laugh that the industry has no real men!

wallstreetcn · Jun 8 18:58

Author: Zhou Yuan / Wallstreetcn

The strongest AI PC (laptop) chip today is made by AMD.

If China's Three Kingdoms period was essentially a civil war among relatives, then the contest between today's AI chip companies has something of the same flavor of relatives feuding, with AMD chief executive Lisa Su (Su Zifeng) facing NVIDIA's leather-jacketed CEO Jensen Huang (Huang Renxun). Su's weapons are the Ryzen 9000 series CPUs, the "Ryzen AI 300 series" AI PC chips, and data center chips and GPUs.

Less than 15 hours after NVIDIA CEO Jensen Huang delivered an AI-themed speech at COMPUTEX 2024 on June 2 and announced GPU and interconnect roadmaps, AMD CEO Lisa Su took the same stage on June 3 and updated the roadmap for AMD's Instinct GPU series of AI accelerators.

Su used a large batch of near-term and medium- to long-term products to show AMD's ambition in AI and its commitment to future technology development: AMD will launch a new AI accelerator, the Instinct MI325X, in the fourth quarter of this year, followed by the MI350 in 2025 and the MI400 in 2026.

In brief, the Instinct MI325X AI accelerator is an upgrade of the existing MI300 series and uses the CDNA 3 architecture. It will carry up to 288GB of HBM3E memory with 6TB/s of memory bandwidth and deliver 1.3 PFLOPS of FP16 and 2.6 PFLOPS of FP8 compute, enough, according to AMD, for a server built on it to handle models with up to 1 trillion parameters.
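As a rough check on the 1-trillion-parameter figure, the sketch below estimates how much memory the weights alone would need at FP16 and FP8, and how many 288GB accelerators that implies. The function names, and the choice to ignore activations, KV cache, and other runtime overhead, are assumptions made for illustration; this is not AMD's sizing method.

```python
import math

HBM_GB_PER_CARD = 288  # MI325X memory capacity quoted in the article


def weights_gb(num_params, bytes_per_param):
    """Memory for the model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def cards_needed(num_params, bytes_per_param, hbm_gb=HBM_GB_PER_CARD):
    """Fewest cards whose combined HBM holds the weights.

    Assumption: weights only, no activations or other overhead.
    """
    return math.ceil(weights_gb(num_params, bytes_per_param) / hbm_gb)


if __name__ == "__main__":
    params = 10**12  # 1 trillion parameters
    for name, nbytes in (("FP16", 2), ("FP8", 1)):
        print(name, weights_gb(params, nbytes), "GB of weights,",
              cards_needed(params, nbytes), "cards minimum")
```

Under these assumptions, a trillion FP16 weights occupy about 2,000GB, so they cannot fit on a single card but can fit across a multi-accelerator server.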

Su said the MI325X brings the largest AI performance gain in AMD's history, with an advantage of 1.3 times or more over the competing NVIDIA H200, which makes it more cost-effective.

According to the AMD Instinct GPU roadmap, the MI350 series planned for 2025 will be based on the next-generation CDNA 4 architecture and will be compatible with OAM (the OCP Accelerator Module form factor). It will be built on a 3nm process, offer the same up to 288GB of HBM3E memory as the MI325X, and support the FP4/FP6 data types.

The MI400 series is expected to launch in 2026 on the new CDNA Next architecture. On performance, AMD puts CDNA 3 at 8 times CDNA 2 and expects CDNA 4 to deliver roughly a 35-fold improvement in AI inference performance over CDNA 3. AMD did not disclose comparative performance figures for CDNA Next.

The most powerful AI PC chip: where does its strength lie?

Alongside the "medium- and long-term" products above, AMD also launched "immediate" products: the third-generation AI PC chip "Ryzen AI 300 series", code-named "Strix Point", and the AMD "Ryzen 9000 series" desktop processors.

Of these, the strength of the "Ryzen AI 300 series" gives Su the confidence to look down on the rest of the field: its NPU delivers up to 50 TOPS, surpassing the 45 TOPS of Qualcomm's Snapdragon X Elite and the 40-45 TOPS of Intel's Lunar Lake. As the old line goes: "Four hundred thousand men laid down their armor together, and not one among them was a man." That said, the NPUs from all three companies meet or exceed the NPU computing power Microsoft requires of an AI PC (40+ TOPS).

The Ryzen AI 300 series traces back to the Ryzen 7040 series (code-named Phoenix), launched in 2023. That was the world's first x86 processor to integrate a dedicated NPU AI engine, based on the then-new XDNA architecture, with about 10 TOPS of NPU compute. Combined with the CPU and GPU, total compute was about 33 TOPS, which set the starting point for the new AI PC category.

At the end of 2023, AMD launched an iteration of the Ryzen 7040 series: the Ryzen 8040 series, code-named "Hawk Point", which raised NPU compute by 60% to 16 TOPS and total compute to 39 TOPS.

The newly launched Ryzen AI 300 series is AMD's third-generation AI chip. It adopts the new Zen 5 CPU architecture, upgrades the GPU cores to the RDNA 3.5 architecture, and moves the NPU to the XDNA2 architecture. AMD bills it as the "world-class processor for next-generation AI PC/Copilot+ PC".

For now, AI PCs are a new category, and both the upstream chips and the downstream devices are entering the market from the high end.

According to the information Su disclosed, the Ryzen AI 300 series debuts with two models, the Ryzen AI 9 HX 370 and the Ryzen AI 9 365, both positioned at the high end of the market. The former is the top-of-the-line flagship.

The Ryzen AI 9 HX 370 has a CPU clock speed of up to 5.1GHz, with 12 cores and 24 threads. Compared with the Ryzen 8040 series, its CPU core count rises by at least 30%, the first such increase in many years. Total L2 cache grows to 12MB (1MB per core), and L3 cache grows to an unprecedented 24MB, up from a previous maximum of 16MB.

On the GPU side, the Ryzen AI 9 HX 370 moves to the upgraded architecture, and the number of compute units (CUs) rises from 12 to 16 under the name "Radeon 890M". NPU compute rises to 50 TOPS, more than three times the 16 TOPS of the Ryzen 8040 series.

Apart from its 5.0GHz clock speed, 10 cores and 20 threads, and 10MB of L2 cache, the Ryzen AI 9 365 matches the Ryzen AI 9 HX 370 in its other specifications.
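The generation-over-generation NPU figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the TOPS numbers and the 40 TOPS Microsoft requirement cited in this article; the variable and function names are illustrative assumptions.

```python
COPILOT_PLUS_MIN_TOPS = 40  # Microsoft's AI PC NPU requirement as quoted in the article

# (generation, NPU TOPS) as quoted in the article
NPU_TOPS = [
    ("Ryzen 7040 (Phoenix)", 10),
    ("Ryzen 8040 (Hawk Point)", 16),
    ("Ryzen AI 300 (Strix Point)", 50),
]


def percent_increase(old, new):
    """Percentage gain going from old to new."""
    return (new - old) * 100 / old


def generation_report(generations, threshold=COPILOT_PLUS_MIN_TOPS):
    """Return (name, tops, gain over previous generation in %, meets threshold) rows."""
    rows = []
    previous = None
    for name, tops in generations:
        gain = None if previous is None else percent_increase(previous, tops)
        rows.append((name, tops, gain, tops >= threshold))
        previous = tops
    return rows


if __name__ == "__main__":
    for name, tops, gain, ok in generation_report(NPU_TOPS):
        gain_text = "n/a" if gain is None else f"+{gain:.1f}%"
        print(f"{name}: {tops} TOPS ({gain_text}), meets 40 TOPS: {ok}")
```

On these numbers, the step from 10 to 16 TOPS is the 60% gain the article cites, and 50 TOPS is 3.125 times 16 TOPS, consistent with "more than three times". Only the third generation clears the 40 TOPS bar.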

XDNA2: first to introduce the BF16 floating-point precision format

The NPU in the Ryzen AI 300 series uses the XDNA2 architecture, which AMD describes as being for "next-generation AI PC/Copilot+ PC".

According to the block diagram Su showed on stage, the XDNA2 architecture keeps essentially the same structure as first-generation XDNA but scales it up. The AI compute engine module, called the "AIE Tile" in the first generation, numbered 20. In the new generation it is renamed the "AI Tile" and the count rises to 32. The number of local memory modules rises from 5 in the first generation to 8.

In addition, the interconnect crossbar has been upgraded from an ordinary Data Fabric to the Infinity Fabric used by the Zen/RDNA families, bringing higher bandwidth and more efficient data transfer.

According to AMD's official figures, the XDNA2 NPU delivers up to 5 times the compute performance (measured as response speed on the 7-billion-parameter Llama 2 model, from launch to the first token), doubles multitasking parallelism, and improves energy efficiency by up to two times.

The XDNA2 architecture has another technical highlight: it introduces a new Block FP16 floating-point precision format (abbreviated BF16 here; it is a block floating-point format and should not be confused with BFloat16), a first for an NPU. Until now, 16-bit floating-point formats have generally been used on CPUs and GPUs.

In performance terms, the FP8 floating-point format is fast but lacks precision, while FP16 is the opposite: precise but somewhat slower. The BF16 format combines the advantages of both and meets the 16-bit precision that most AI applications currently require, with no additional conversion needed.
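To make the trade-off concrete, here is a minimal sketch of the general block floating-point idea, assuming Block FP16 works along these lines: a group of values shares one exponent, and each value keeps only a small integer mantissa. The block size of 8, the 8-bit signed mantissa, and the function names are assumptions for illustration; the article does not describe the actual XDNA2 data layout.

```python
import math


def block_quantize(values, mantissa_bits=8):
    """Quantize a block of floats to signed integers that share one exponent."""
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp gives largest = m * 2**exponent with 0.5 <= m < 1
    _, exponent = math.frexp(largest)
    # Choose the step size so the largest magnitude fits the signed mantissa range
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return exponent, mantissas


def block_dequantize(exponent, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and integer mantissas."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    block = [0.1, -0.2, 0.3, 0.7, -0.05, 0.9, 0.33, -0.61]  # hypothetical weights
    exp, ints = block_quantize(block)
    restored = block_dequantize(exp, ints)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print("shared exponent:", exp, "mantissas:", ints, "max error:", worst)
```

In this sketch the per-value storage and arithmetic are 8-bit integers, while the shared exponent preserves the dynamic range of the block. This is the sense in which such a format can approach 8-bit speed with closer to 16-bit accuracy.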

At present, the Ryzen AI 300 series' NPU delivers up to 50 TOPS, surpassing the 45 TOPS class of the Qualcomm Snapdragon X Elite NPU and of the NPU in Intel's upcoming next-generation Core Ultra "Lunar Lake". On paper, the Ryzen AI 300 series has the most powerful NPU in the industry today.

According to the Ryzen AI 300 series specifications AMD released, the Ryzen AI 9 HX 370 outperforms the Qualcomm Snapdragon X Elite by 40%, 47%, and 73% in video editing, multitasking, and 3D graphics performance, respectively. Against Intel's current flagship Core Ultra 9 185H, average performance is 36% higher. Against the Apple M3, graphics performance is as much as 98% higher.

AI PCs (laptops) built on this chip look set to reach the market quickly and in volume: ASUS, Dell, HP, Lenovo, MSI, Acer, and others have more than 100 new models that will go on sale from July onward.

Su also unveiled the Ryzen 9000 series desktop processors (Granite Ridge), based on the Zen 5 architecture. The first products will launch at the end of July 2024.

In brief, the Ryzen 9000 series is the third family for the AM5 socket, after the Ryzen 7000 "Raphael" and Ryzen 8000 "Hawk Point" series. It carries up to two Zen 5 chiplets with up to 8 cores each, for a maximum of 16 cores and 32 threads.

According to AMD's official test data, the Zen 5 core delivers about 16% higher IPC than Zen 4 on average on the PC platform. Compared with the Intel Core i9-14900K, the Ryzen 9 9950X is 4% to 23% faster in gaming tests and 7% to 56% faster in productivity workloads.

Editor/Emily

The translation is provided by third-party software.

The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.
