
In-depth article | A graphics card more expensive than gold! Just how profitable is the Nvidia H100?

硅基研習社 ·  Sep 9, 2023 11:46

Source: 硅基研习社
Author: Wang Yichuan

Today's H100 supply commitment is comparable to a land approval document in the golden age of real estate.

On August 3, 2023, Wall Street and Silicon Valley jointly staged a major event that shocked the industry: letting a startup obtain 2.3 billion US dollars in debt financing, with the collateral being what is currently the hardest currency in the world, the Nvidia H100 graphics card.

The protagonist of this event is called CoreWeave. Its main business is AI private cloud services: simply put, it builds data centers packed with GPU computing power and provides computing infrastructure to AI startups and large commercial customers. CoreWeave has raised a total of 580 million US dollars in equity and is currently at Series B, with a valuation of 2 billion US dollars.

CoreWeave was founded in 2016 by three Wall Street commodity traders. At the beginning, the company had only one main business: crypto mining, buying large numbers of GPUs to build mining farms. Especially when the crypto market was at a low point, the company would stock up on graphics cards countercyclically, and in the process it forged a strong revolutionary friendship with Nvidia.

In 2019, CoreWeave began converting these mining rigs into enterprise-grade data centers to provide customers with AI cloud services. Business was tepid at first, but after the birth of ChatGPT, training and inference for large models began consuming enormous computing power every day. CoreWeave, which already had tens of thousands of graphics cards (not necessarily the latest models, of course), took off, and customers and venture capitalists came flocking to its door.

What is surprising, however, is that CoreWeave has raised only 580 million US dollars in total, the net book value of its GPUs is no more than 1 billion US dollars, and even the company's overall valuation is only 2 billion US dollars. So why can it borrow 2.3 billion US dollars against collateral? Why has Wall Street, always shrewd at calculating and zealous about collateral value, been so generous?

The most likely reason: CoreWeave does not have that many graphics cards on its books, but it has received supply commitments from Nvidia, especially for the H100.

CoreWeave's close relationship with Nvidia is already an open secret in Silicon Valley. That closeness is rooted in CoreWeave's unwavering loyalty and support for Nvidia: using only Nvidia cards, firmly refusing to build its own chips, and helping Nvidia stock up on cards when graphics cards were not selling well. For Jensen Huang, this relationship is worth far more than the fair-weather friendships with Microsoft, Google, and Tesla.

As a result, even with the Nvidia H100 in short supply, Nvidia has allocated a large number of new cards to CoreWeave, even while rationing supply to major players such as Amazon and Google. Jensen Huang praised the company on an earnings call: “A number of new GPU cloud service providers will rise, the most famous of which is CoreWeave. They have done a great job.”

And just a week before it raised the 2.3 billion US dollars, CoreWeave had already announced that it would spend 1.6 billion US dollars to build a 42,000-square-meter data center in Texas. On the strength of its relationship with Nvidia and its priority allocation alone, CoreWeave can borrow from banks to build data centers, a model reminiscent of real estate developers seeking bank loans immediately after acquiring land.

So it can be put this way: today's H100 supply commitment is comparable to a land approval document in the golden age of real estate.

An H100 is hard to come by

In an interview in April this year, Musk complained [2]: “It seems like everyone and their dog is buying GPUs at this point.”

Ironically, Tesla released its self-developed D1 chip as early as 2021, manufactured by TSMC on a 7 nm process and claimed at the time to be able to replace Nvidia's mainstream A100. Two years later, however, Nvidia launched the even more powerful H100 while Tesla's D1 saw no further iterations, so when Musk set out to build his own artificial intelligence company, he still had to go kneel before Grandpa Huang and ask for cards.

The H100 was officially launched on September 20 last year and is manufactured on TSMC's 4N process. Compared with its predecessor the A100, a single H100 card is 3.5 times faster in inference and 2.3 times faster in training; using server clusters, training speed can be increased up to 9 times, so a workload that once took a week now needs only about 20 hours.
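As a rough check on that "20 hours" figure, a minimal sketch assuming "a week" means a continuous 168-hour run and that the 9x cluster speedup applies directly:

```python
# Back-of-envelope check of the cluster-speedup claim above:
# a training run that took one week, sped up ~9x on an H100 cluster.
hours_per_week = 7 * 24          # 168 hours
cluster_speedup = 9              # article's figure for server-cluster training
print(round(hours_per_week / cluster_speedup, 1))   # ~18.7 hours, i.e. roughly 20
```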

GH100 architecture diagram

Compared to the A100, a single H100 is more expensive, about 1.5 to 2 times the price, but its efficiency in training large models has increased by 200%, so the "performance per dollar" works out higher. Paired with Nvidia's latest high-speed interconnect solution, GPU performance per dollar may be 4 to 5 times higher, which is why customers are scrambling for it.
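A minimal sketch of the arithmetic behind that claim, reading the quoted "200% increase" as a 3x throughput gain over the A100 (an interpretation for illustration, not a figure from Nvidia):

```python
# Performance per dollar, normalized to the A100 (baseline = 1.0 perf at 1.0 price).
h100_training_speedup = 3.0          # "+200%" read as 3x training throughput
for price_multiple in (1.5, 2.0):    # H100 costs roughly 1.5-2x an A100
    perf_per_dollar = h100_training_speedup / price_multiple
    print(price_multiple, round(perf_per_dollar, 2))   # 1.5 -> 2.0x, 2.0 -> 1.5x
```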

Customers who snapped up the H100 are mainly divided into three categories:

The first category is the comprehensive cloud computing giants, such as Microsoft Azure, Google GCP, and Amazon AWS. Their hallmark is deep pockets, always ready to throw money at Nvidia's production capacity, yet each harbors its own agenda: dissatisfied with Nvidia's near-monopoly, they secretly develop their own chips to cut costs.

The second category is independent GPU cloud service providers, typified by the aforementioned CoreWeave as well as Lambda, RunPod, and others. These companies have relatively modest computing power but can offer differentiated services, and Nvidia strongly supports them, even investing directly in CoreWeave and Lambda. The purpose is very clear: to needle the giants that are building their own chips.

The third category is companies large and small that are training LLMs (large language models) themselves. It includes startups such as Anthropic, Inflection, and Midjourney, as well as tech giants such as Apple, Tesla, and Meta. They typically draw on external cloud providers' computing power while also buying their own GPUs to set up in-house clusters: buy more if you have money, buy less if you don't, the watchword being thrift.

Among these three types of customers, Microsoft Azure has at least 50,000 H100 cards, Google GCP has about 30,000, Oracle about 20,000, and Tesla and Amazon at least 10,000 each. CoreWeave is said to hold a commitment for 35,000 (of which about 10,000 have actually arrived). Few other companies hold more than 10,000.

How many H100s do these three types of customers need in total? According to a forecast by the overseas outfit GPU Utils, current demand for the H100 is about 432,000 cards. Of these, OpenAI needs 50,000 to train GPT-5, Inflection needs 22,000, Meta needs 25,000 (some say 100,000), each of the four major public cloud vendors needs at least 30,000, private cloud providers need 100,000, and other smaller model vendors also need 100,000 [3].
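Tallying the itemized figures in that forecast gives a rough lower bound; several components are stated as minimums, so the sum is consistent with the ~432,000-card headline number:

```python
# Itemized H100 demand estimates cited above (GPU Utils [3]).
demand = {
    "OpenAI (GPT-5 training)": 50_000,
    "Inflection": 22_000,
    "Meta": 25_000,                           # some estimates put this at 100,000
    "Four major public clouds": 4 * 30_000,   # "at least 30,000" each
    "Private cloud providers": 100_000,
    "Other small model vendors": 100_000,
}
print(sum(demand.values()))                   # 417,000 -- a lower bound near ~432,000
```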

Nvidia's H100 shipments in 2023 will probably be around 500,000 units. TSMC's production capacity is still ramping, and by the end of the year the scramble to find H100 cards should ease.

In the long run, however, the gap between H100 supply and demand will keep widening as AIGC applications explode. According to the Financial Times, H100 shipments in 2024 will reach 1.5 million to 2 million units, a 3- to 4-fold increase over this year's 500,000 [4].

Wall Street's forecasts are even more aggressive: US investment bank Piper Sandler believes Nvidia's data center revenue will exceed 60 billion US dollars next year (versus 10.32 billion US dollars in FY24 Q2). Working backwards from that figure, combined shipments of A-series and H-series cards would be close to 3 million.
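That inversion only works given an assumed blended selling price; a hypothetical sketch, where the 20,000-dollar average is our illustrative assumption rather than Piper Sandler's stated input:

```python
# Reverse-engineering the "close to 3 million A+H cards" figure above.
data_center_revenue_2024 = 60e9          # Piper Sandler's estimate, in USD
assumed_blended_asp = 20_000             # hypothetical average price across A100/H100 mixes
print(int(data_center_revenue_2024 / assumed_blended_asp))   # 3,000,000 cards
```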

There are even more exaggerated estimates. One of the largest H100 server contract manufacturers (with a 70%-80% market share) has been shipping H100 servers since June of this year, and its capacity ramped further in July. A recent survey suggests this manufacturer expects shipments of A-series and H-series cards in 2024 to total between 4.5 million and 5 million.

For Nvidia this means "extraordinary wealth," because the H100's profitability is unimaginable to people in other industries.

A graphics card more expensive than gold

To find out how profitable the H100 is, we might as well completely disassemble its Bill of Materials (BOM).

As shown in the figure, the H100 SXM, the most common version of the H100, uses TSMC's CoWoS 7-die packaging: six 16 GB HBM3 stacks are arranged in two rows, tightly surrounding the logic die in the middle.

These are also the three most important parts of the H100: the logic die, the HBM memory, and the CoWoS package. Beyond them there are auxiliary components such as the PCB, but their value is not high.

H100 teardown diagram

The core logic die measures 814 mm². It is produced at Fab 18, TSMC's most advanced fab, in Tainan, on the "4N" process node; although the name starts with a 4, it is really an enhanced 5 nm process. Because downstream 5 nm demand from smartphones and similar markets is sluggish, TSMC has no trouble guaranteeing supply of the logic die.

This logic die is cut from a 12-inch wafer (area 70,695 mm²). Under ideal conditions 86 dies can be cut from one wafer, but once the roughly 80% yield of the "4N" line and cutting losses are factored in, a single 12-inch wafer ends up yielding only about 65 good logic dies.

How much does this core logic die cost? TSMC's list price for such a 12-inch wafer in 2023 is 13,400 US dollars, which works out to roughly 200 US dollars per die.
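A quick reconstruction of those per-die numbers from the figures just quoted (the 80% yield is the article's own approximation):

```python
# Cost of one good GH100 logic die, from the article's wafer figures.
wafer_price = 13_400                     # USD, TSMC 12-inch "4N" wafer in 2023
wafer_area = 70_695                      # mm^2
die_area = 814                           # mm^2
gross_dies = wafer_area // die_area      # ~86 candidate dies per wafer
good_dies = 65                           # after ~80% yield and cutting losses
print(gross_dies, round(wafer_price / good_dies))   # 86, ~206 USD per good die
```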

Next come the six HBM3 stacks, currently supplied exclusively by SK Hynix. This company, which grew out of Hyundai Electronics, nearly had to sell itself to Micron in 2002. Thanks to government lifelines and a countercyclical capacity strategy, it is now at least three years ahead of Micron in HBM mass production (Micron is stuck at HBM2e, a generation Hynix was already mass-producing in mid-2020).

Everyone is tight-lipped about the exact price of HBM, but according to Korean media, HBM currently sells for 5 to 6 times the price of existing DRAM products. Current GDDR6 VRAM costs about 3 US dollars per GB, which puts HBM at around 15 US dollars per GB. On that basis, an H100 SXM carries roughly 1,500 US dollars' worth of HBM.
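Checking that figure against the stated inputs, six 16 GB stacks at roughly 15 US dollars per GB:

```python
# HBM3 cost per H100 SXM, using the article's rough price estimate.
price_per_gb = 15                        # USD, ~5x the ~3 USD/GB of GDDR6
stacks, gb_per_stack = 6, 16
print(price_per_gb * stacks * gb_per_stack)   # 1440 USD, i.e. about 1,500
```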

Although HBM prices have kept rising this year, and Nvidia and Meta executives have personally gone to Hynix to "supervise the work," Samsung's HBM3 will gradually reach mass production and shipment in the second half of the year. With both Korean giants expanding capacity, HBM should no longer be a bottleneck by next year.

The real bottleneck is TSMC's CoWoS packaging, a 2.5D packaging process. Compared with 3D packaging, which drills vias (TSV) and routes wiring (RDL) directly on the chip, CoWoS offers better cost, heat dissipation, and throughput bandwidth; the first two matter most for HBM, while bandwidth is the key for the GPU.

Therefore, for a chip that needs both large memory and high computing power, CoWoS is effectively the only packaging answer. The best proof is that the four GPUs from Nvidia and AMD all use CoWoS.

How much does CoWoS cost? TSMC's 2022 annual report revealed that the CoWoS process accounted for 7% of total revenue. Based on capacity and die size, overseas analyst Robert Castellano estimated that packaging one AI chip brings TSMC about 723 US dollars in revenue [6].

Adding up the three biggest cost items above gives a total of roughly 2,500 US dollars, of which TSMC takes about 1,000 US dollars (logic die plus CoWoS) and SK Hynix about 1,500 US dollars (Samsung will certainly get a share later). Counting other materials such as the PCB, the overall bill of materials does not exceed 3,000 US dollars.

So how much does the H100 sell for? 35,000 US dollars, essentially the bill of materials with a zero tacked on, for a gross margin above 90%. Nvidia's gross margin hovered around 60% over the past 10 years; now, driven by the high-margin A100/A800/H100, it climbed to 70% in Q2 of this year.
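Pulling the three cost items together and checking the implied margin; the 500-dollar allowance for the PCB and other parts is an assumption added for illustration:

```python
# Total bill of materials and implied gross margin for one H100 SXM.
logic_die = 206                          # from the wafer math above
hbm = 1_500
cowos_packaging = 723                    # Castellano's estimate [6]
other_parts = 500                        # assumed allowance for PCB and misc. components
bom = logic_die + hbm + cowos_packaging + other_parts
selling_price = 35_000
print(bom)                                             # ~2,929 USD, under 3,000
print(round((selling_price - bom) / selling_price, 3)) # ~0.916, i.e. a >90% gross margin
```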

This is somewhat counterintuitive. Nvidia depends heavily on TSMC for manufacturing; TSMC's position is unshakable, and it is in fact the only link in the chain with a chokehold on Nvidia. Yet from a card selling for 35,000 US dollars, TSMC takes only about 1,000 US dollars, and that is revenue, not profit.

However, using gross margin to define windfall profits means little for chip companies; if you start counting from sand, the margin looks even higher. TSMC sells a 4N-process 12-inch wafer to practically anyone for roughly 15,000 US dollars apiece, yet Nvidia can turn it into a retail product with a zero added for its customers. It naturally has its own knack.

The secret to this trick is: Nvidia is essentially a software company masquerading as a hardware manufacturer.

A moat of integrated hardware and software

Nvidia's most powerful weapon is hidden in the gap between its gross margin and its net margin.

Before the current AI boom, Nvidia's gross margin held at around 65% year-round, while its net margin was usually only about 30%. In Q2 of this year, however, driven by the high-margin A100/A800/H100, the gross margin reached 70% and the net margin hit as high as 45.81%.

NVIDIA quarterly gross margin and net margin over the past three fiscal years

Nvidia currently has more than 20,000 employees worldwide, most of them highly paid software and hardware engineers. According to US Glassdoor data, the average annual salary for these positions is generally above 200,000 US dollars.

Nvidia's R&D expense ratio over the past ten fiscal years

Over the past ten years, Nvidia's R&D spending has kept growing rapidly in absolute terms, while its R&D expense ratio has held steady above 20%. Of course, when downstream demand explodes in a given year (deep learning in 2017, mining in 2021, large language models this year), the revenue denominator suddenly jumps, the R&D expense ratio briefly dips toward 20%, and profits skyrocket nonlinearly as a result.

And the most critical of all the projects Nvidia has developed is certainly CUDA.

In 2003, to solve the problem that the barrier to DirectX programming was too high, Ian Buck's team introduced a programming model called Brook, the prototype of what people would later know as CUDA. Buck joined Nvidia in 2006 and persuaded Jensen Huang to develop CUDA [8].

Because it supports parallel computing in a C-language environment, CUDA quickly became engineers' first choice and set GPUs on the path to becoming general-purpose processors (GPGPU).

Once CUDA had matured, Buck again persuaded Jensen Huang that all future Nvidia GPUs must support CUDA. The CUDA project was established in 2006 and the product launched in 2007. At the time Nvidia's annual revenue was only 3 billion US dollars, yet it was spending 500 million US dollars on CUDA. By 2017, cumulative R&D spending on CUDA alone had exceeded 10 billion US dollars.

The CEO of one private cloud company said in an interview that it is not that they have never considered switching to AMD cards, but getting those cards to run properly would take at least two months [3]. And to erase those two months, Nvidia has invested for 20 years and spent tens of billions of dollars.

The chip industry has seen upheaval for more than half a century, yet there has never been a company like Nvidia that sells both hardware and an ecosystem, or in Jensen Huang's words, "sells quasi-systems." So Nvidia's real benchmark is not the fallen pioneers of the chip field but Apple, another company that sells systems.

From launching CUDA in 2007 to becoming the world's largest money printer, Nvidia has not been without rivals.

In 2008, Intel, then the king of chips, broke off its collaboration with Nvidia on integrated graphics and launched its own general-purpose graphics processor project, intending to divide up the PC field. But over the next few years of product iteration, Nvidia simply pushed its processors into fields that needed even more computing power, such as space, finance, and biomedicine, and by 2010 Intel, seeing no hope of containing it, was forced to cancel its discrete graphics card plan.

In 2009, a development team led by Apple launched OpenCL, hoping to carve out a share of CUDA's pie on the strength of its generality. However, OpenCL's deep learning ecosystem lags far behind CUDA's: many frameworks added OpenCL support only after their CUDA releases, or never supported it at all. Having fallen behind in deep learning, OpenCL never reached the higher value-added businesses.

In 2015, AlphaGo swept the field of Go, announcing that the era of artificial intelligence had arrived. To catch this last train, Intel put an AMD GPU into one of its own system chips, the first collaboration between the two companies since the 1980s. But today, the combined market value of the No. 1 CPU maker and the No. 2 GPU maker is only a quarter of that of Nvidia, the No. 1 GPU maker.

As it stands, Nvidia's moat is almost impenetrable. Even though quite a few big customers are quietly developing their own GPUs, with its huge ecosystem and rapid iteration they have been unable to pry open a crack in the empire; Tesla is clear proof of this. Nvidia's money-printing business will continue for the foreseeable future.

Perhaps the only dark cloud hanging over Jensen Huang is the one place with many customers and strong demand where the H100 cannot be sold, and where people can only grit their teeth and push ahead on their own. There is only one such place in the world.

Reference materials

[1] Crunchbase

[2] "Everyone and Their Dog is Buying GPUs," Musk Says as AI Startup Details Emerge, Tom's Hardware

[3] Nvidia H100 GPUs: Supply and Demand, GPU Utils

[4] Supply chain shortages delay tech sector's AI bonanza, Financial Times

[5] AI Capacity Constraints: CoWoS and HBM Supply Chain, Dylan Patel, Myron Xie, and Gerald Wong, SemiAnalysis

[6] Taiwan Semiconductor: Understandings As Chip And Package Supplier To Nvidia, Robert Castellano, Seeking Alpha

[7] Chip Wars, Yu Sheng

[8] What Is CUDA? Parallel Programming for GPUs, Martin Heller, InfoWorld

[9] NVIDIA DGX H100 User Guide

Editor: Somer
