The US Launches a National Security Investigation into DeepSeek! Alibaba Makes a Surprise Move, and Domestic Large Models Deliver Another Blockbuster

Securities Times · Jan 29 10:06

Source: Securities Times
Author: Zhou Chunmei

On the first day of the Lunar New Year, Alibaba launched its first large model of the year.

At 1:30 AM on January 29, Alibaba Cloud officially released an upgraded version of its flagship Tongyi model, Qwen2.5-Max. According to the announcement, Qwen2.5-Max is the Alibaba Cloud Tongyi team's latest exploration of MoE models, pre-trained on more than 20 trillion tokens. The model shows very strong overall performance, scoring highly on multiple mainstream benchmarks and comprehensively surpassing the leading open-source MoE models and the largest open-source dense models worldwide.

The models Qwen2.5-Max was benchmarked against include DeepSeek's V3 model, which has recently drawn wide attention both at home and abroad. On the news, Alibaba's US-listed shares (BABA.US) surged, rising more than 7% at one point and closing up 6.71% at $96.03 per share.

In recent days, DeepSeek has shaken the investment logic of the US stock market, sending the share prices of NVIDIA (NVDA.US) and other giants swinging sharply.

According to CCTV, on January 28 local time, several US officials responded to DeepSeek's impact on the country, calling DeepSeek's approach "theft" and saying a national security investigation into its impact is under way.

Just the day before, US President Trump had described DeepSeek as a very positive technological achievement.

Whether it is DeepSeek, which has caused a huge stir in Silicon Valley, on Wall Street, and at the White House, or Alibaba's newly released large model, the recent burst of innovation from domestic large models shows that China's progress and catch-up in AI have significantly changed the global AI industry landscape.

Alibaba's new model is world-leading.

The Alibaba Tongyi Qianwen team stated that Qwen2.5-Max uses a very large-scale MoE (Mixture of Experts) architecture, trained on more than 20 trillion tokens of pre-training data with a carefully designed post-training scheme.
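
For readers unfamiliar with the term, the core idea of an MoE layer is that a lightweight router sends each token to only a few of many expert sub-networks, so per-token compute stays modest even as total parameter count grows. The sketch below illustrates that general top-k routing idea in PyTorch; the layer sizes, expert count, and names are illustrative assumptions only, not Qwen2.5-Max's actual design.

```python
# Minimal sketch of top-k expert routing, the general idea behind MoE layers.
# All sizes here are toy values, not any real model's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run per token, so per-token compute
        # stays small even when the total parameter count is huge.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(10, 64)         # a batch of 10 token embeddings
print(TinyMoELayer()(x).shape)  # torch.Size([10, 64])
```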

Reportedly, Qwen2.5-Max delivers world-leading performance on mainstream authoritative benchmarks covering knowledge, programming, general capability, and alignment with human preferences. The instruct model, the version users can experience directly in conversation, is on par with Claude-3.5-Sonnet on benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, GPQA-Diamond, and MMLU-Pro, and almost entirely surpasses GPT-4o, DeepSeek-V3, and Llama-3.1-405B.

Base-model comparisons, by contrast, reflect a model's raw capability. Because the base models behind closed-source systems such as GPT-4o and Claude-3.5-Sonnet are not accessible, the Tongyi team compared Qwen2.5-Max against the leading open-source MoE model DeepSeek-V3, the largest open-source dense model Llama-3.1-405B, and Qwen2.5-72B, itself among the top open-source dense models. Qwen2.5-Max surpassed all of these models on all 11 benchmarks.

The reporter also noted that, in addition to releasing Qwen2.5-Max, Alibaba on January 28 open-sourced a brand-new visual understanding model, Qwen2.5-VL, in three sizes: 3B, 7B, and 72B. The flagship Qwen2.5-VL-72B topped 13 authoritative visual-understanding evaluations, comprehensively surpassing GPT-4o and Claude-3.5.

The release of Qwen2.5-Max sparked discussion in capital markets about re-evaluating Chinese AI assets. Looking back over Alibaba's share price since its US listing, the stock entered a downward channel after peaking at $311.046 in 2020. Industry insiders observed that Alibaba Cloud has not only released a model on par with, or even superior to, the world's top models, but also possesses a complete cloud ecosystem, which may give rise to an investment logic similar to that of North American cloud-computing companies last year.

Beyond DeepSeek, the large models of the major manufacturers also deserve attention.

In recent days, attention has centered on DeepSeek, but a core technical lead at a leading domestic model maker told the Securities Times that the large-model capabilities of major internet companies, including Alibaba's Tongyi Qianwen, ByteDance's Doubao, and Tencent's Hunyuan, are by no means inferior; as a startup, DeepSeek simply follows a different development strategy from the internet giants. Being a purely technology-driven company, DeepSeek has fully open-sourced its code and training methods, whereas the internet giants, for commercial reasons, often do not open-source everything.

"DeepSeek's breakout is mainly tied to the financial market. In terms of base capability it is actually not that strong, and its impact on us is not that great. The logic behind the US stock rally has mainly revolved around AI and NVIDIA chips, but DeepSeek made people realize that models of similar performance can be built without so many NVIDIA cards. On top of that, it was open-sourced. That is why DeepSeek has drawn so much attention," the person said.

At the same time, DeepSeek's strengths lie mainly in text generation and understanding, particularly long texts and complex contexts in Chinese; DeepSeek-V3 and R1 currently have no multimodal generation capabilities. An industry practitioner told the reporter that models such as Doubao belong to the multimodal category, which integrates modalities like images, audio, and video on top of a large language model and therefore demands a stronger compute foundation: it must support large-scale training tasks while also ensuring the timeliness and efficiency of edge-side applications.

Thus, beyond cutting training costs through architectural innovation and algorithmic optimization, DeepSeek can also concentrate on the large-language-model field itself. A senior executive at a domestic large-model company, analyzing DeepSeek's success, pointed out that relatively ample cards (compute resources), no financing pressure, and years spent focusing solely on models rather than products have made DeepSeek purer and more focused, enabling its breakthroughs in engineering and algorithms.

The aforementioned technical lead also revealed that on January 22, ByteDance released Doubao Large Model 1.5 Pro, which outperforms many leading models on multiple evaluation benchmarks. "Our pressure comes not from DeepSeek but from Doubao; it's just that Doubao 1.5 Pro hasn't drawn much outside attention, so people haven't noticed it," the technical lead said.

DeepSeek faces controversy over 'distillation'.

The reporter noted that ByteDance's research team has likewise stated that Doubao 1.5 Pro continuously improves data quality by combining an efficient labeling team with model self-improvement, strictly adheres to internal standards, and uses no data from any other model, ensuring the independence and reliability of its data sources; in other words, it took no shortcut of "distilling" other models.

"Distillation" is a method developers use to optimize smaller models, and it is widely applied in deep learning and machine learning. Simply put, the outputs of a pre-trained, complex model serve as supervisory signals for training another, simpler model. This sharply reduces compute consumption, allowing the smaller model to achieve comparable results on specific tasks at low cost.
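
As a concrete illustration, the sketch below shows the classic knowledge-distillation recipe (temperature-softened teacher outputs used as the training signal for a smaller student), in PyTorch. The tiny `teacher` and `student` networks and all hyperparameters here are placeholder assumptions for demonstration, not any specific company's pipeline.

```python
# Minimal sketch of classic knowledge distillation (Hinton et al., 2015):
# the student is trained to match the teacher's softened output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both distributions with temperature T; the KL divergence pushes
    # the student toward the teacher's full output distribution, which
    # carries more signal than hard labels alone.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * T * T

# Toy example: a "teacher" model supervises the "student" being trained.
teacher = nn.Linear(32, 10)   # stand-in for a large pre-trained model
student = nn.Linear(32, 10)   # stand-in for the smaller model
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(16, 32)              # a batch of inputs
with torch.no_grad():
    t_logits = teacher(x)            # teacher outputs act as supervision
loss = distillation_loss(student(x), t_logits)
loss.backward()
opt.step()
print(float(loss))
```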

DeepSeek's technical documentation states that the R1 model used high-quality data generated via distillation to improve training efficiency. On Tuesday, David Sacks, the White House official in charge of AI and crypto affairs, claimed in a media interview that DeepSeek "may have" stolen US intellectual property in its rise. He also said leading US AI companies would take measures in the coming months to try to prevent "distillation". According to the Financial Times, OpenAI said it had found evidence that DeepSeek used OpenAI's proprietary models to train its own open-source models, but declined to disclose further details of that evidence.

Still, many industry insiders say that, controversial as it is, "distillation" is a common method in large-model training. Training a complex model requires heavy resource investment and professionals to teach the model to generate responses that match human expression, which costs both money and time; "distillation" sidesteps that problem. In both China and the US, it is therefore widely regarded as an open secret that startups and academic institutions use the outputs of commercial, human-feedback-optimized large language models such as ChatGPT to train their own models.

In a paper titled "Distillation of Large Language Models", jointly published by the Shenzhen Institute of Advanced Technology of the Chinese Academy of Sciences and Peking University, researchers noted that, apart from Claude, Doubao, and Gemini, most known open- and closed-source large language models exhibit a high degree of "distillation". Researchers generally believe that "distillation" improves training efficiency and reduces cost, but it may diminish a model's distinctiveness, and excessive "distillation" can also degrade performance.

Editor/Jeffy
