
Musk launches 'the world's most powerful AI training cluster,' but troubles at home and abroad lie behind it

cls.cn ·  Jul 24 10:27

Musk announced on X that xAI began training on the Memphis Supercluster at 4:20 am local time on July 22. Other technology giants are also expanding their data centers to train and run their AI models, so Musk's claim to the 'most powerful supercluster' will be hard to sustain.

Ka Chuang Ban Daily, July 24 (reporter Zhang Yangyang, special reporter Chen Junqing): Musk announced on the X social platform that the xAI team, the X team, NVIDIA (NVDA.US) and supporting companies began training on the Memphis Supercluster at 4:20 am local time on July 22.

According to Musk, the cluster uses 100,000 liquid-cooled H100s on a single RDMA fabric, making it 'the most powerful AI training cluster in the world' by every metric. Its goal is to train 'the world's most powerful artificial intelligence' by December of this year.

Musk previously stated that xAI plans to release Grok 2 in August, but he has not said whether the new supercomputing cluster will be used to train Grok 2. What is certain is that Grok 3, scheduled for release by the end of 2024, will be trained on the Memphis Supercluster. Earlier this month, Musk said in a post on X that xAI's Grok 3 will be trained on 100,000 H100 GPUs, so 'it should be very special.'

In terms of scale, the new xAI Memphis Supercluster does surpass every supercomputer on the latest Top500 list in GPU computing power. Frontier (37,888 AMD GPUs), Aurora (60,000 Intel GPUs) and Microsoft Eagle (14,400 Nvidia H100 GPUs) all appear to trail xAI's machine by a wide margin.

Massive as the cluster is, the title of 'the most powerful AI training cluster in the world' will be hard to hold in the long run.

Currently, Cisco, Fujitsu, LM Ericsson Telephone, Nokia Oyj, HPE, NEC, Juniper Networks, Ciena, Volkswagen, and Western Digital are in the project. In addition, there are several larger projects underway: 1) Amazon Inferentia ASIC, expected to launch in 2025; 2) Microsoft Maia, expected to launch in 2026.

Microsoft (MSFT.US), Alphabet (GOOG.US / GOOGL.US), Meta Platforms (META.US) and other technology giants are also expanding their data centers to train and run their AI models. Reuters previously reported that Microsoft and OpenAI are planning a data center project that may cost up to $115 billion. It would include an AI supercomputer named Stargate, equipped with millions of dedicated server chips and expected to launch in 2028.

In addition, Meta CEO Zuckerberg said in January of this year that the company's computing infrastructure would include 350,000 H100 graphics cards by the end of 2024. He added, 'If other GPUs are included, that is approximately 600,000 H100 equivalents of compute.'

In addition to the external competition for computing power, xAI's construction of computing centers also faces internal problems.

According to local media in Memphis, xAI will build the supercomputer cluster in the former Electrolux factory in Memphis, which covers 785,000 square feet, and the project 'will be the largest capital investment by a new-to-market company in the history of the city.'

Ted Townsend, CEO of the Greater Memphis Chamber, who handled the deal, said that Musk and his team, including representatives from several of his companies, chose Memphis, Tennessee after several days of intense negotiations in March because of its abundant electricity supply and the speed at which the facility could be built.

However, xAI has yet to reach an agreement with the utility Tennessee Valley Authority (TVA). 'TVA has not yet signed a contract with xAI. We are working with xAI and our partner MLGW on the proposal and details of power demand,' TVA said, noting that any project over 100 MW requires its approval.

Although the Greater Memphis Chamber has praised xAI's decision to open a facility in the area, some locals are concerned about the facility's energy and water consumption. Memphis Community Against Pollution and two other environmental groups warned that the facility would impose a serious 'energy burden.' They said, 'xAI is expected to need at least one million gallons of water a day for its cooling tower.'

Several members of the Memphis City Council are urging the government to stop Musk's computing facility from being built in Memphis, citing community concerns about the secretive nature of the deal and the data center's power and water requirements.

Editor / jayden

The translation is provided by third-party software.

