
Tackling the Large-Model Computing Power Shortage: Beijing Builds a Digital Economy Computing Center


cls.cn ·  Jun 6 16:03

Academician Zheng Weimin of the Chinese Academy of Engineering previously calculated that in large-model training, 70% of the cost goes to computing power, and in inference, 95% of the cost goes to computing power as well. Since the start of this year, several large-model vendors have completed architecture upgrades and released models based on the MoE architecture. Test data show that model performance under the new architecture has improved significantly.

Computing power is the infrastructure of AI development. According to Cailian Press, construction of the Beijing Digital Economy Computing Center, built around the concept of an "AI factory", has officially started, and its infrastructure is expected to be completed later this year.

The project has completed the deployment, installation, and commissioning of 400 PFlops of domestic and foreign computing equipment in the Jiuxianqiao area. Infrastructure construction of the Beijing Digital Economy Computing Center is expected to be completed by the end of 2024. Once fully operational, the center will gradually build up 2,000 PFlops of intelligent computing power, providing inclusive computing services for the AI industry and supporting the development of the digital economy in Chaoyang District and Beijing as a whole.

Gao Zelong, Vice Chairman of the Digital Economy Platform Branch of the China Communications Industry Association, told Cailian Press that computing power centers provide the powerful computing resources essential to the development of the digital economy, because computing power underpins digital technologies such as big data, cloud computing, and artificial intelligence. In addition, inclusive computing services lower the technical threshold and cost for enterprises, especially small and medium-sized ones, allowing them to access and use advanced digital technologies more easily and thereby promoting industrial upgrading and transformation.

Academician Zheng Weimin previously calculated that 70% of the cost of large-model training goes to computing power, as does 95% of the cost of inference.

Zheng Weimin said that the 14 existing national supercomputer systems were expensive to build, each costing 1 billion to 2 billion yuan or even more. These systems have contributed enormously to the development of China's national economy, and some of them have spare computing capacity that can be used for large-model training; with optimization, this can even reduce the cost of training.

The development of computing power is inseparable from the government's strong focus on the AI industry. In October 2023, six departments, including the Ministry of Industry and Information Technology, jointly issued the Action Plan for the High-Quality Development of Computing Power Infrastructure, proposing that by 2025 China's total computing power exceed 300 EFLOPS and intelligent computing power account for 35% of the total.

At the city level, Cailian Press learned that Beijing plans to add 8,000P of public intelligent computing power in 2024. At present, the Jingnengshangzhuang node of the Beijing Artificial Intelligence Public Computing Platform has installed 2,000P of high-performance computing power, bringing its total supply to 3,500P; this will be expanded further as industrial demand grows, with plans to build out 10,000P and form a large-scale computing power cluster.

Currently, as large models continue to improve in performance, their computing power consumption is rising sharply, which makes it difficult and costly for enterprises to put them into production. Substantially improving the efficiency of model algorithms offers enterprises a high-efficiency path to developing high-performance generative AI applications with a lower computing power threshold.

Since the beginning of this year, many large-model manufacturers have completed architecture upgrades and released large models based on the MoE architecture. Based on test data, the performance of large models under the new architecture has been significantly improved.

From the enterprise side, Inspur Electronic Information Industry Co., Ltd. recently released the open-source large model "Source 2.0-M32". Building on the "Source 2.0" series, it proposes and adopts an "attention-based gate network" and constructs a mixture-of-experts (MoE) model with 32 experts, significantly improving computing power efficiency. The model has 3.7 billion active parameters at runtime, yet its performance on mainstream benchmarks is comparable to that of the 70-billion-parameter LLaMA3 open-source large model.
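The efficiency gain described above comes from sparse activation: a gate network selects only a few of the 32 experts per token, so the active parameters are a small fraction of the total. The following is a minimal sketch of generic top-k MoE routing in NumPy; the expert count matches Source 2.0-M32, but the gate shown here is a plain softmax gate and the layer sizes are toy values, not Inspur's "attention-based gate network" implementation.

```python
# Minimal sketch of mixture-of-experts (MoE) top-k routing.
# Assumptions (not from the article): a plain linear softmax gate,
# top-2 routing, and toy dimensions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 32   # expert count, as in Source 2.0-M32
TOP_K = 2          # hypothetical: experts activated per token
D_MODEL = 8        # toy hidden size

# Each expert is a toy linear layer (one weight matrix).
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route a token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                     # one gate score per expert
    top = np.argsort(logits)[-TOP_K:]       # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the selected experts only
    # Only TOP_K of NUM_EXPERTS experts actually run, so the compute per
    # token scales with active parameters, not total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The same mechanism explains the article's numbers: a model can hold many experts' worth of total parameters while each token touches only the selected experts, which is why an MoE model with 3.7 billion active parameters can compete with a much larger dense model.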

Wu Shaohua, Chief Scientist of Artificial Intelligence at Inspur, told Cailian Press that as large models continue to improve in performance, their sharply rising computing power consumption has become a problem, creating real difficulty for enterprises trying to deploy them. "We have been thinking about how to improve the overall effectiveness of large models at lower computing power consumption, so that enterprises and institutions can obtain stronger model capabilities at a smaller computing power cost."

Wu Shaohua further told the reporter that although model capabilities are improving very quickly, attention used to focus on a single dimension: average accuracy. Now that large models are entering an era of rapid deployment, more dimensions must be considered, including algorithmic efficiency, accuracy, and computing power cost.


