
After DeepSeek-V3's low-key release, a stunned industry once again questions the Silicon Valley model

cls.cn ·  Mar 25 05:44

① DeepSeek released its V3-0324 model on Monday night. Initial tests show that it can run on consumer-grade hardware, challenging the conventional view that large models require a data center. ② DeepSeek's models continue to cut the energy and compute costs of large models while remaining open source, which keeps driving technical innovation. This has fueled rapid growth in China's AI industry and raised questions about Silicon Valley's closed, paid model.

Chinese AI startup DeepSeek quietly launched a new model on Monday, known as DeepSeek-V3-0324, which has 685 billion parameters. It went live on the AI repository Hugging Face without any announcement, yet it still caused a stir in the industry.

The model is released under the MIT license, which permits free commercial use, and early industry tests have confirmed that it can run directly on consumer-grade hardware such as the high-end Apple Mac Studio.

AI researcher Awni Hannun stated that the new DeepSeek-V3 model can run on Apple computers equipped with the M3 Ultra chip at a speed of 20 tokens per second. This upends the earlier industry consensus about what AI models can do when run locally, and suggests that a data center is not a prerequisite for large models.
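
A rough back-of-the-envelope check shows why precision matters for running a model of this size locally. The sketch below estimates the memory needed for the weights alone of a 685-billion-parameter model at several precisions. The bit widths and the helper name weight_memory_gb are illustrative assumptions, not figures from the report, and the estimate ignores the KV cache and activations.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 685e9  # parameter count reported in the article

# Bit widths below are assumed for illustration only.
for bits in (16, 8, 4):
    size_gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size_gb:,.1f} GB")
```

Under these assumptions, only a heavily quantized copy of the weights would come close to fitting in the memory of a single high-end desktop machine.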

Another AI researcher, Xeophon, said on X that after testing the new DeepSeek-V3 on an internal benchmark, it showed a huge leap in every metric tested. It is now the best non-reasoning model, surpassing Anthropic's Claude Sonnet 3.5.

Low-key but sensational.

DeepSeek-V3-0324 was released without an accompanying white paper or any promotional material, only an empty README file. This bare-bones release contrasts sharply with Silicon Valley's meticulously crafted product launches.

Meanwhile, all of DeepSeek's models are open source and free for anyone to download and use, in stark contrast to Claude Sonnet, one of the best commercial models, whose subscription costs $20 per month.

In addition, DeepSeek has fundamentally rethought how large language models operate. Rather than using all of its parameters, the model activates only about 37 billion of them for any given task, a design known as mixture-of-experts (MoE), which greatly reduces computational demands.
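
The idea of activating only a few experts per input can be sketched with a toy router. The example below is a generic top-k routing illustration, not DeepSeek's actual implementation: the eight experts, the random gate scores, and the names top_k_route and moe_forward are all hypothetical.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    top = order[:k]
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_scores, k):
    """Combine the outputs of only the k selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))


# Toy setup: 8 hypothetical experts, each a simple scaling function.
NUM_EXPERTS, K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
# Stand-in for the scores a learned router would produce for one input.
gate_scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

routes = top_k_route(gate_scores, K)
print("experts used:", [i for i, _ in routes])
print("output:", moe_forward(1.0, experts, gate_scores, K))
print(f"active share using the article's figures: {37e9 / 685e9:.1%}")
```

Only the selected experts are evaluated, so compute per input scales with the number of active parameters rather than with the total parameter count.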

The model also features two other breakthrough technologies: multi-head latent attention (MLA) and multi-token prediction (MTP). MLA improves the model's ability to maintain context across long texts, while MTP generates multiple tokens at each step instead of the usual one token at a time. Together, these innovations improved output speed by nearly 80%.
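
One way to see where a speedup of this size could come from: if the additional predictions made at each step are treated as drafts that the model then verifies, each decoding step yields one guaranteed output plus any drafts that are accepted. The sketch below computes the expected output per step under the simplifying assumption that each draft is accepted independently with probability accept_rate and that drafting stops at the first rejection. The function name and the rates are illustrative, not reported measurements.

```python
def expected_tokens_per_step(accept_rate: float, num_draft_tokens: int = 1) -> float:
    """Expected tokens emitted per decoding step.

    One token is always produced. Draft token j counts only if drafts 1..j
    were all accepted, which happens with probability accept_rate ** j.
    """
    return 1.0 + sum(accept_rate ** j for j in range(1, num_draft_tokens + 1))


# Acceptance rates below are assumed values for illustration.
for rate in (0.5, 0.8, 0.9):
    gain = expected_tokens_per_step(rate) - 1.0
    print(f"acceptance {rate:.0%}: ~{gain:.0%} more tokens per step")
```

With a single draft per step, an acceptance rate of about 80% would correspond to roughly 80% more output per step, ignoring the cost of verification.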

To some extent, DeepSeek embodies Chinese companies' relentless pursuit of efficiency: achieving equal or even better performance with limited computational resources. This necessity-driven innovation has allowed China's AI to astonish the world in a matter of months.

The changes in DeepSeek's new model also have significant implications for the industry. On one hand, the model greatly reduces the energy consumption and computational costs of large models, further shaking Wall Street's assumptions about how much investment top-tier model infrastructure requires. On the other hand, the broad consensus around open source within China's AI industry has accelerated the development of the domestic AI sector, steadily narrowing the gap with the world's top competitors.

Some believe that, given DeepSeek's rapid catch-up, its R2 model, planned for release in April, may directly challenge OpenAI's long-touted GPT-5. If that happens, the differing approaches to AI development in China and the US may come into direct confrontation.

Editor/lambor

The translation is provided by third-party software.

