Huafu Securities: How to calculate the computing power required for large-scale AI text model training?

Zhitong Finance ·  Jun 4 19:36

According to a research report from Huafu Securities, starting from a computing power supply-and-demand formula and assuming the industry continues to develop along the Scaling Law, GPU demand can be derived on the demand side from assumptions about factors such as Nvidia GPUs' FP16 computing power, training duration, and computing power utilization rate. The firm estimates that from 2024 to 2026, global GPU demand for large-scale AI text model training, calculated in terms of the FP16 computing power of Nvidia's Hopper/Blackwell/next-generation GPUs, will be 27.1/59.2/124.4 million units. The firm recommends paying attention to the computing chip and server industry chains.

Huafu Securities' main points are as follows:

Demand side: Scaling Law drives the increasing demand for large-scale computing

Scaling Law remains an important guideline driving the industry's development. Its basic principle is that a model's final performance is mainly correlated with the amount of training compute, the number of model parameters, and the size of the training data: when the other two factors are unconstrained, model performance follows a power-law relationship with each individual factor. To improve model performance, parameter count and data size therefore need to be scaled up in tandem. The number of large models has grown markedly in recent years, and because advanced AI models demand substantial resources, industry's influence on large-model development has steadily deepened. The firm has compiled publicly disclosed training data for many large models. From the perspective of computing power demand, parameter count rose from 175B for GPT-3 to 1.8T for GPT-4 (an increase of roughly 9 times), while training data volume (token count) rose from about 0.3T to 13T tokens (an increase of roughly 42 times). In absolute terms, according to the firm's incomplete statistics, the parameter counts of mainstream large models have already reached the billions or even tens of billions, while pre-training data volumes are at the terabyte (TB) level.
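The compute arithmetic behind these comparisons can be reproduced with the commonly used approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameter count × token count). The report summary does not state the exact formula it uses, so the following is only a minimal sketch under that assumption, plugging in the GPT-3/GPT-4 figures quoted above:

```python
# Minimal sketch, assuming the common rule of thumb C ≈ 6 * N * D (FLOPs) for
# dense Transformer training; the report's own formula is not given in this summary.
# The GPT-4 parameter/token figures are the estimates quoted in the report.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs: C ≈ 6 * N * D."""
    return 6.0 * params * tokens

gpt3 = training_flops(params=175e9, tokens=0.3e12)   # ≈ 3.2e23 FLOPs
gpt4 = training_flops(params=1.8e12, tokens=13e12)   # ≈ 1.4e26 FLOPs

print(f"GPT-3 training compute ≈ {gpt3:.2e} FLOPs")
print(f"GPT-4 training compute ≈ {gpt4:.2e} FLOPs")
print(f"GPT-4 / GPT-3 compute ratio ≈ {gpt4 / gpt3:.0f}x")
```

Under these assumptions, GPT-4's training compute works out to roughly 450 times that of GPT-3, which illustrates how scaling parameters and data together translates directly into computing power demand.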

Supply side: Huang's Law propels Nvidia GPUs upward

Nvidia GPUs remain at the forefront of global AI computing power development. Although Moore's Law has gradually slowed, Huang's Law continues to support rapid increases in Nvidia GPU computing power. On the one hand, Nvidia relies on process-node iteration, larger HBM capacity and bandwidth, dual-die designs, and similar approaches; on the other hand, reducing numerical precision plays a crucial role: Blackwell supports the new FP4 format, and although its low precision may limit the range of applications, it represents another route to raising computing power. Looking only at FP16, the FP16 computing power of the A100/H100/GB200 is 2.5/6.3/2.5 times that of the respective previous-generation product, and it has kept growing rapidly in recent years. By comparison, the parameters of large AI models have grown even faster: from 2018 to 2023, the GPT series expanded its parameter count from roughly 1 billion to 1.8 trillion. Relative to this large-model-driven parameter explosion, the growth rate of GPU computing power still has room to improve.
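To make the gap concrete, the following sketch chains the generational FP16 multiples quoted in the report (2.5/6.3/2.5x) on top of an assumed V100 baseline of about 125 FP16 Tensor TFLOPS and compares the cumulative result with the parameter growth of the GPT series cited above; the baseline is an illustrative assumption, not a figure from the report.

```python
# Minimal sketch comparing cumulative FP16 growth (from the generational multiples
# quoted in the report) with the GPT-series parameter growth it cites.
# The V100 baseline of ~125 FP16 Tensor TFLOPS is an illustrative assumption.

v100_fp16_tflops = 125.0                                   # assumed baseline
gen_multiples = {"A100": 2.5, "H100": 6.3, "GB200": 2.5}   # ratios quoted in the report

tflops = v100_fp16_tflops
for gpu, mult in gen_multiples.items():
    tflops *= mult
    print(f"{gpu}: ~{tflops:,.0f} FP16 TFLOPS ({tflops / v100_fp16_tflops:.1f}x vs baseline)")

param_growth = 1.8e12 / 1e9   # GPT series: ~1B (2018) -> ~1.8T (2023)
print(f"Cumulative GPU FP16 growth: ~{tflops / v100_fp16_tflops:.0f}x")
print(f"GPT-series parameter growth: ~{param_growth:,.0f}x")
```

Chaining the three multiples gives roughly a 39-fold increase in per-GPU FP16 computing power over the same period in which GPT-series parameters grew roughly 1,800-fold, which is the mismatch the report points to.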

Conclusion: Global demand for GPU training cards for large text models is expected to reach 27.1/59.2/124.4 million units in 2024, 2025, and 2026, respectively.

On the demand side, the firm derived GPU demand from assumptions about factors such as Nvidia GPUs' FP16 computing power, training duration, and computing power utilization rate; on the supply side, it applied a computing power supply-and-demand formula. On this basis, the firm estimates that from 2024 to 2026, global GPU demand for large-scale AI text model training, calculated in terms of the FP16 computing power of Nvidia's Hopper/Blackwell/next-generation GPUs, will be 27.1/59.2/124.4 million units.
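The supply-demand balance described here can be written as a simple relation: GPUs required ≈ total training FLOPs ÷ (per-GPU FP16 FLOPS × utilization × training time). The report's actual inputs (how many models are trained, for how long, and at what utilization) are not disclosed in this summary, so the sketch below uses hypothetical placeholder values purely to show the mechanics:

```python
# Minimal sketch of the supply-demand balance the report describes:
#   GPUs required ≈ total training FLOPs / (per-GPU FP16 FLOPS * utilization * training seconds)
# All inputs below are hypothetical placeholders; the report's actual assumptions
# (models trained, training duration, utilization) are not disclosed in this summary.

def gpus_required(total_train_flops: float,
                  gpu_fp16_flops: float,
                  utilization: float,
                  training_days: float) -> float:
    """GPUs needed to finish the given training compute within the given time window."""
    seconds = training_days * 24 * 3600
    return total_train_flops / (gpu_fp16_flops * utilization * seconds)

# Example: one GPT-4-scale run (~1.4e26 FLOPs) on H100-class GPUs
# (assumed ~1e15 FP16 FLOPS each), 40% utilization, 90-day training window.
n = gpus_required(total_train_flops=1.4e26,
                  gpu_fp16_flops=1e15,
                  utilization=0.40,
                  training_days=90)
print(f"GPUs required ≈ {n:,.0f}")
```

Summing such per-model estimates across the models expected to be trained in a given year, and dividing by the FP16 throughput of that year's flagship GPU, presumably yields annual GPU-unit figures of the kind quoted above.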

Recommendations:

Computing chip: Cambricon Technologies (688256.SH), Hygon Information (688041.SH), and Loongson Technology (688047.SH).

Server industry chain: Foxconn Industrial Internet (601138.SH), Wus Printed Circuit (002463.SZ), Shennan Circuits (002916.SZ), and Victory Giant Technology (300476.SZ).

Risk warning: risks that AI demand falls short of expectations, that Scaling Law breaks down, that GPU technology upgrades fall short of expectations, and that the estimation model deviates from reality.
