Source: Silicon Labs
Author: Wang Yichuan
The current H100 supply promise is comparable to a land approval document in the golden age of real estate.
On August 3, 2023, Wall Street and Silicon Valley jointly staged an event that shocked the industry: a startup obtained $2.3 billion in debt financing, collateralized by what is currently the hardest currency in the world, the Nvidia H100 GPU.
The protagonist of this event is called CoreWeave. Its main business is AI private cloud services: simply put, it builds data centers packed with GPU computing power and provides computing infrastructure to AI startups and large commercial customers. CoreWeave has raised a total of $580 million and, as of its Series B, is valued at $2 billion.
CoreWeave was founded in 2016 by three Wall Street commodity traders. At first, the company had only one main business: crypto mining. It bought large numbers of GPUs to build mining farms, and whenever the crypto market hit a low it would stock up on graphics cards countercyclically, forging a strong comradeship with Nvidia in the process.
In 2019, CoreWeave began converting these mining rigs into enterprise-grade data centers to offer customers AI cloud services. Business was tepid at first, but after the birth of ChatGPT, training and inference for large models started consuming enormous computing power every day. CoreWeave, which already had tens of thousands of graphics cards (not necessarily the latest models, of course), took off, and customers and venture capital crowded at its door.
What is surprising, however, is that CoreWeave has raised only $580 million in total, the book value of its GPUs should not exceed $1 billion, and even the company's overall valuation is only $2 billion. Why, then, can it borrow $2.3 billion against collateral? Why is Wall Street, always sharp at arithmetic and keen on haircutting collateral, suddenly so generous?
The most likely reason: CoreWeave doesn't have that many GPUs on its books, but it has received Nvidia's supply commitments, especially for the H100.
CoreWeave's tight relationship with Nvidia is already an open secret in Silicon Valley. That closeness is rooted in CoreWeave's unwavering loyalty and support for Nvidia: using only Nvidia cards, resolutely not building its own chips, and helping Nvidia stockpile cards whenever graphics cards were not selling. For Jensen Huang, this relationship is worth far more than the flimsy friendships with Microsoft, Google, and Tesla.
As a result, even though the Nvidia H100 is in short supply, Nvidia has allocated plenty of new cards to CoreWeave, even while rationing supply to majors such as Amazon and Google. Jensen Huang praised it on an earnings call: "A number of new GPU cloud service providers will rise, the most famous of which is CoreWeave. They have done a great job."
And just a week before it raised the $2.3 billion, CoreWeave had announced it would spend $1.6 billion to build a 42,000-square-meter data center in Texas. On the strength of its relationship with Nvidia and its priority allocation alone, CoreWeave can borrow money from banks to build data centers, a model reminiscent of real-estate developers going to the bank for a loan the moment they acquire land.
So it can be put this way: today, an H100 supply promise is comparable to a land approval document in the golden age of real estate.
An H100 card is hard to find
In an interview in April this year, Musk complained [2]: "At this point, it seems like everyone and their dog is buying GPUs."
Ironically, Tesla released its self-developed D1 chip back in 2021, manufactured by TSMC on a 7nm process and claimed to be able to replace Nvidia's then-mainstream A100. Yet two years on, Nvidia has launched the far more powerful H100 while Tesla's D1 has seen no follow-up iteration, so when Musk set about building his own AI company, he still had to go begging for cards at Jensen Huang's door.
The H100 officially launched on September 20 last year, manufactured on TSMC's 4N process. Compared with its predecessor, the A100, a single H100 is 3.5 times faster at inference and 2.3 times faster at training; with server-cluster computing, training speed can rise ninefold, so a job that once took a week now needs only about 20 hours (168 / 9 ≈ 19).
Compared with the A100, a single H100 is more expensive, about 1.5 to 2 times the price, but the efficiency of training large models is up 200%, so the "performance per dollar" works out higher. Paired with Nvidia's latest high-speed interconnect system, GPU performance per dollar may be 4 to 5 times higher, which is why customers chase it so frantically.
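As a quick sanity check on that "performance per dollar" claim, here is a minimal back-of-the-envelope sketch. The A100 unit price is an assumed placeholder, and "efficiency up 200%" is read as 3x throughput:

```python
# Back-of-the-envelope "performance per dollar" check.
# ASSUMPTION: a100_price is a placeholder; real street prices vary.
a100_price = 10_000                 # assumed A100 unit price, USD
h100_price = 2.0 * a100_price       # H100 at roughly 1.5-2x the A100
h100_training_speedup = 3.0         # "+200% efficiency" read as 3x throughput

a100_perf_per_dollar = 1.0 / a100_price
h100_perf_per_dollar = h100_training_speedup / h100_price
print(f"H100 vs A100 perf per dollar: "
      f"{h100_perf_per_dollar / a100_perf_per_dollar:.1f}x")
# -> 1.5x per card; the 4-5x figure relies on cluster-level interconnect gains
```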
Customers snapping up the H100 fall mainly into three categories:
The first category is the integrated cloud computing giants, such as Microsoft Azure, Google GCP, and Amazon AWS. Their hallmark is deep pockets: they are always ready to buy out Nvidia's production capacity outright, yet each also keeps its own counsel, resents Nvidia's near-monopoly, and quietly develops in-house chips to cut costs.
The second category is the independent GPU cloud providers, typified by the aforementioned CoreWeave, plus Lambda, RunPod, and others. These companies command relatively modest computing power but can offer differentiated services, and Nvidia backs them strongly, even investing directly in CoreWeave and Lambda. The purpose is plain: to needle the giants that are building their own chips.
The third category is companies large and small that are training LLMs (large language models) themselves. It includes startups such as Anthropic, Inflection, and Midjourney, as well as tech giants such as Apple, Tesla, and Meta. They usually tap external cloud computing power while buying their own GPUs to set up shop: buy more when money is plentiful, buy less when it is tight, with thrift as the watchword.
Among these three types of customers, Microsoft Azure has at least 50,000 H100s, Google GCP about 30,000, Oracle about 20,000, and Tesla and Amazon at least 10,000 each; CoreWeave is said to hold a commitment for 35,000 (about 10,000 actually delivered). Few other companies hold more than 10,000.
How many H100s do these three types of customers need in total? The overseas outfit GPU Utils estimates current H100 demand at roughly 432,000 units. Of these, OpenAI needs 50,000 cards to train GPT-5, Inflection needs 22,000, and Meta needs 25,000 (some say 100,000); each of the four major public cloud vendors needs at least 30,000, the private cloud sector needs 100,000, and other, smaller model vendors together need another 100,000 [3].
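Tallying the itemized figures is a useful cross-check; a minimal sketch using only the numbers just cited:

```python
# Summing the GPU Utils demand estimates cited above (units) [3].
demand = {
    "OpenAI (GPT-5 training)": 50_000,
    "Inflection": 22_000,
    "Meta": 25_000,                           # some reports say 100,000
    "four major public clouds": 4 * 30_000,   # "at least 30,000 each"
    "private cloud sector": 100_000,
    "other model vendors": 100_000,
}
total = sum(demand.values())
print(f"Itemized total: ~{total:,} units")    # ~417,000
# Slightly below the ~432,000 headline, plausibly because several of the
# line items are "at least" floors rather than point estimates.
```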
Nvidia's H100 shipments in 2023 will probably land around 500,000 units. TSMC's production capacity is still ramping, and by year-end the scramble for H100 cards should ease.
In the long run, however, the H100 supply-demand gap will keep widening as AIGC applications explode. According to the Financial Times, H100 shipments in 2024 will reach 1.5 to 2 million units, a 3- to 4-fold increase on this year's 500,000 [4].
Wall Street's forecasts are more aggressive still: US investment bank Piper Sandler believes Nvidia's data-center revenue will exceed $60 billion next year (versus $10.32 billion in FY24 Q2). Working backwards from that figure, combined shipments of A- and H-series cards would approach 3 million.
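That inversion implicitly assumes a blended average selling price; in this minimal sketch, the $20,000 ASP is an assumption chosen to make the arithmetic visible, not a figure from the report:

```python
# Inverting Piper Sandler's revenue forecast into unit volume.
# ASSUMPTION: blended_asp is illustrative, not from the report.
dc_revenue_forecast = 60e9      # projected data-center revenue, USD
blended_asp = 20_000            # assumed average A+H selling price, USD
units = dc_revenue_forecast / blended_asp
print(f"Implied A+H shipments: ~{units / 1e6:.0f} million")   # ~3 million
```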
There are even more aggressive estimates. One of the largest H100 server foundries (with 70-80% market share) has been shipping H100 servers since June this year and ramped capacity through July. A recent survey indicates this foundry expects combined A+H card shipments of 4.5 to 5 million units in 2024.
This means "extraordinary wealth" for Nvidia, because the H100's level of profit is unimaginable to people in other industries.
A graphics card more expensive than gold
To see just how profitable the H100 is, we may as well fully disassemble its bill of materials (BOM).
As shown in the figure, the most common H100 variant, the H100 SXM, uses TSMC's CoWoS seven-die package: six 16GB HBM3 stacks arranged in two rows, tightly surrounding the logic die in the middle.
This also defines the H100's three most important components: the logic die, the HBM memory, and the CoWoS packaging. Beyond these there are auxiliary parts such as the PCB, but their value is modest.
The core logic die measures 814 mm² and is produced at TSMC's most advanced Fab 18 in Tainan on the "4N" process node; despite the name, it is really 5nm+. With the downstream 5nm market (phones and the like) in the doldrums, TSMC has no trouble guaranteeing supply of the logic die.
This logic die is cut from a 12-inch wafer (area 70,695 mm²). Under ideal conditions 86 dies can be cut, but after accounting for the "4N" line's roughly 80% yield and cutting losses, a 12-inch wafer ends up yielding only about 65 good logic dies.
How much does this core logic die cost? TSMC's quoted 2023 price for a 12-inch wafer is $13,400, so each die works out to roughly $200.
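A minimal sketch reproducing that arithmetic from the cited figures:

```python
# Per-die cost of the H100 logic die, from the figures cited above.
wafer_area_mm2 = 70_695     # 12-inch (300 mm) wafer
die_area_mm2 = 814          # H100 logic die
wafer_price_usd = 13_400    # TSMC's quoted 2023 wafer price

ideal_dies = wafer_area_mm2 // die_area_mm2   # 86, before yield/edge loss
good_dies = 65                                # after ~80% yield + cutting loss
print(f"Ideal dies per wafer: {ideal_dies}")
print(f"Cost per good logic die: ~${wafer_price_usd / good_dies:.0f}")  # ~$206
```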
Next come the six HBM3 stacks, currently supplied exclusively by SK Hynix. This company, which grew out of Hyundai Electronics, very nearly sold itself to Micron in 2002. Thanks to government transfusions and a countercyclical capacity strategy, it now leads Micron by at least three years in HBM mass-production technology (Micron is still stuck at HBM2e, which Hynix was already mass-producing by mid-2020).
Everyone is cagey about the exact price of HBM, but according to Korean media, HBM currently sells for 5 to 6 times the price of existing DRAM products. GDDR6 VRAM currently goes for about $3 per GB, which puts HBM at around $15 per GB. On that estimate, a single H100 SXM spends about $1,500 on HBM.
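A minimal sketch of that estimate; the 5x multiplier is taken from the low end of the cited range:

```python
# HBM cost per H100 SXM, from the rough prices cited above.
gddr6_usd_per_gb = 3.0               # current GDDR6 street price
hbm_multiplier = 5.0                 # HBM at "5-6x existing DRAM"; low end
hbm_usd_per_gb = gddr6_usd_per_gb * hbm_multiplier    # ~$15/GB

stacks, gb_per_stack = 6, 16         # six 16GB HBM3 stacks on the H100 SXM
hbm_cost = stacks * gb_per_stack * hbm_usd_per_gb
print(f"HBM cost: ~${hbm_cost:,.0f}")   # ~$1,440, i.e. roughly $1,500
```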
Although HBM prices have kept climbing this year, and Nvidia and Meta executives have personally visited Hynix to "supervise" progress, Samsung's HBM3 will gradually reach mass production and shipment in the second half of the year. With both Korean memory giants expanding capacity, HBM will certainly cease to be a bottleneck by next year.
The real bottleneck is TSMC's CoWoS packaging, a 2.5D packaging process. Compared with 3D packaging, which drills straight into the chip (TSV) and wires it up (RDL), CoWoS offers better cost, heat dissipation, and throughput bandwidth; the first two matter for HBM, while the last is the key for GPUs.
So for a chip that wants both high memory capacity and high computing power, CoWoS is the only practical packaging answer. The best proof: the four flagship GPUs from Nvidia and AMD all use CoWoS.
How much does CoWoS cost? TSMC's 2022 financial report disclosed that the CoWoS process accounted for 7% of total revenue. From capacity and die sizes, overseas analyst Robert Castellano calculated that packaging one AI chip brings TSMC about $723 in revenue [6].
Adding up the three biggest cost items above gives a total of about $2,500, of which TSMC takes roughly $1,000 (logic die plus CoWoS) and SK Hynix $1,500 (Samsung will surely claim a share later). Throw in the PCB and other materials, and the overall bill of materials stays under $3,000.
So what does the H100 sell for? $35,000, a zero tacked straight onto the cost, for a gross margin above 90%. Over the past decade Nvidia's gross margin hovered around 60%; now, lifted by the high-margin A100/A800/H100, it climbed to 70% in Q2 this year.
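Pulling the BOM back together confirms the margin arithmetic. In this minimal sketch, the PCB line item is an assumed filler that tops the total out at the article's $3,000 bound:

```python
# BOM roll-up and gross-margin check for the H100 SXM.
bom = {
    "logic die (TSMC 4N)": 200,
    "HBM3 (SK Hynix, 6 x 16GB)": 1_500,
    "CoWoS packaging (TSMC)": 723,     # Castellano's estimate [6]
    "PCB and other materials": 577,    # ASSUMED filler up to the $3,000 cap
}
cost = sum(bom.values())               # $3,000
price = 35_000                         # quoted H100 selling price
print(f"BOM: ${cost:,}; gross margin: {(price - cost) / price:.0%}")  # 91%
```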
This is somewhat counterintuitive: Nvidia leans heavily on TSMC for manufacturing, TSMC's position is unshakable, and it is the one supplier with a genuine chokehold on Nvidia. Yet of a $35,000 card, TSMC collects only about $1,000, and that is revenue, not profit.
Still, defining windfall profits by gross margin means little for chip companies; start the accounting from sand and the margin looks higher still. TSMC sells a 12-inch 4N wafer to practically anyone for about $15,000 apiece, while Nvidia manages to add a zero when selling to customers. Naturally, there is a trick to that.
The secret to this trick is: Nvidia is essentially a software company masquerading as a hardware manufacturer.
A moat fusing hardware and software
Nvidia's most powerful weapon hides in the gap between its gross margin and its net margin.
Before the current AI boom, Nvidia's gross margin held around 65% year-round while its net margin usually sat at just 30%. In Q2 this year, lifted by the high-margin A100/A800/H100, the gross margin reached 70% and the net margin surged to 45.81%.
Nvidia currently employs more than 20,000 people worldwide, mostly highly paid software and hardware engineers. Glassdoor data for the US puts average annual pay for these roles above $200,000.
Over the past decade, the absolute amount of Nvidia's R&D spending has grown rapidly while the R&D expense ratio has held steady above 20%. Of course, whenever terminal demand explodes in a given year, as with deep learning in 2017, mining in 2021, and large language models this year, the revenue denominator suddenly swells, the R&D expense ratio briefly dips below 20%, and profits spike nonlinearly in step.
And the most critical of all the projects Nvidia has developed is certainly CUDA.
In 2003, to solve the problem of DirectX's steep programming threshold, Ian Buck's team introduced a programming model called Brook, the prototype of what people would later call CUDA. Buck joined Nvidia in 2006 and persuaded Jensen Huang to develop CUDA [8].
Because it supported parallel computing in a C-language environment, CUDA quickly became engineers' first choice and set GPUs on the path to general-purpose computing (GPGPU).
Once CUDA matured, Buck again persuaded Jensen Huang that every future Nvidia GPU must support CUDA. The CUDA project started in 2006 and the product launched in 2007. At the time Nvidia's annual revenue was only $3 billion, yet it poured $500 million into CUDA; by 2017, cumulative R&D spending on CUDA alone had passed $10 billion.
The CEO of one private cloud company said in an interview that it is not that they have never considered switching to AMD cards, but that getting those cards running properly would take at least two months [3]. And to shorten those two months, Nvidia has spent twenty years and tens of billions of dollars.
In more than half a century of upheaval in the chip industry, there has never been a company like Nvidia that sells hardware and an ecosystem together, or in Jensen Huang's words, "sells barebone systems." Fittingly, Nvidia's true benchmark is not the old guard of the chip world but Apple, another company that sells systems.
From launching CUDA in 2007 to becoming the world's largest money printer, Nvidia has not been without challengers.
In 2008, Intel, then king of chips, broke off its collaboration with Nvidia on integrated graphics and launched its own general-purpose processor, intending to split the PC market with Nvidia. Yet over the following years of iteration, Nvidia simply pushed its processors into fields demanding ever more computing power, such as aerospace, finance, and biomedicine, and by 2010 Intel, seeing no hope of containment, was forced to cancel its discrete graphics plan.
In 2009, Apple's development team launched OpenCL, hoping its generality would let it carve a slice from CUDA's pie. But OpenCL's deep-learning ecosystem lags far behind CUDA's: many frameworks added OpenCL support only after their CUDA releases, or never supported it at all. Having fallen behind on deep learning, OpenCL never reached the higher-value businesses.
In 2015, AlphaGo began to dominate the game of Go, heralding the arrival of the artificial intelligence era. To catch this last train, Intel bundled AMD's GPU into its own system chip, the first collaboration between the two companies since the 1980s. Yet today, the combined market value of the CPU leader and the GPU runner-up is only a quarter of that of Nvidia, the GPU leader.
As it stands, Nvidia's moat is all but unbreachable. Even though quite a few big customers are quietly developing GPUs of their own, none of them can pry a crack into the empire, given its vast ecosystem and rapid iteration; Tesla is clear proof. Nvidia's money-printer business will keep running for the foreseeable future.
Perhaps the only dark cloud on Jensen Huang's horizon is that one place with abundant customers and roaring demand where the H100 cannot be sold, yet where people will grit their teeth to solve the problem anyway. There is only one such place in the world.
References
[1] Crunchbase
[2] "Everyone and Their Dog Is Buying GPUs," Musk Says as AI Startup Details Emerge - Tom's Hardware
[3] Nvidia H100 GPUs: Supply and Demand - GPU Utils
[4] Supply chain shortages delay tech sector's AI bonanza - Financial Times
[5] AI Capacity Constraints: CoWoS and HBM Supply Chain - Dylan Patel, Myron Xie, and Gerald Wong, SemiAnalysis
[6] Taiwan Semiconductor: Understandings As Chip And Package Supplier To Nvidia - Robert Castellano, Seeking Alpha
[7] Chip Wars - Yu Sheng
[8] What Is CUDA? Parallel Programming for GPUs - Martin Heller, InfoWorld
[9] NVIDIA DGX H100 User Guide
Editor/Somer