① DeepSeek released the V3-0324 model on Monday night; initial tests show it can run on consumer-grade hardware, upending the conventional wisdom that large models require a data center. ② On one hand, DeepSeek's model continues to reduce the energy consumption and computing costs of large models; on the other, it remains open source, continuously driving technological innovation. This has fueled rapid development in China's domestic AI industry while also raising questions about Silicon Valley's closed, paid model.
Chinese AI startup DeepSeek quietly launched a new model, DeepSeek-V3-0324, with 685 billion parameters, on Monday. It appeared on the AI repository Hugging Face without any announcement, yet still caused a stir in the industry.
The model is released under the MIT license, meaning it can be freely used for commercial purposes, and early industry tests have confirmed that it can run directly on consumer-grade hardware, such as the high-end Apple Mac Studio.
AI researcher Awni Hannun reported that the new DeepSeek-V3 runs at about 20 tokens per second on an Apple computer equipped with the M3 Ultra chip. This overturns the earlier industry consensus that models of this scale cannot run locally, and shows that a data center is not a necessary companion to a large model.
Another AI researcher, Xeophon, said on X that after testing the new DeepSeek-V3 on an internal benchmark, it showed a huge leap across all tested metrics. It is now the best non-reasoning model, surpassing Anthropic's Claude Sonnet 3.5.
Low-key but sensational.
DeepSeek-V3-0324 was released without an accompanying white paper or any promotional material, only an empty README file. This almost austere release style contrasts sharply with Silicon Valley's meticulously choreographed product launches.
Meanwhile, all of DeepSeek's models are open source and free for anyone to download and use, in stark contrast to leading commercial offerings such as Claude Sonnet, which charges $20 per month.
In addition, DeepSeek has fundamentally rethought how large language models operate: using a "mixture-of-experts" architecture, the model activates only about 37 billion of its parameters for a given task rather than all of them, greatly reducing computational demands.
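The mixture-of-experts idea above can be sketched in a few lines: a small router scores a set of expert sub-networks for each token and only the top-k are actually run, so most parameters stay idle on any given step. This is an illustrative toy, not DeepSeek's implementation; all names, sizes, and the routing details here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total expert sub-networks (real MoE models use far more)
TOP_K = 2       # experts actually run per token
DIM = 4         # toy hidden dimension

# In this sketch each "expert" is just a weight matrix.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((DIM, N_EXPERTS))

def moe_forward(x):
    """Route a single token vector through only the top-k experts."""
    scores = x @ router_w                 # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the k highest-scoring experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                  # softmax over the chosen experts only
    # Weighted sum of the selected experts' outputs; the other experts'
    # parameters are never touched for this token.
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out, top

x = rng.standard_normal(DIM)
out, used = moe_forward(x)
print(f"experts used: {sorted(used.tolist())}, "
      f"active expert parameters: {TOP_K / N_EXPERTS:.0%} of the total")
```

With 2 of 8 experts active, only 25% of the expert parameters participate in this forward pass, which is the same proportional effect as activating ~37B of 685B parameters.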
The model also features two other notable technologies: multi-head latent attention (MLA) and multi-token prediction (MTP). MLA improves the model's ability to maintain context over long texts, while MTP generates several tokens per step instead of the usual one at a time. Together, these innovations improve output speed by nearly 80%.
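The speedup from multi-token prediction comes from cutting the number of sequential model calls: if each forward pass proposes k tokens instead of one, decoding needs roughly 1/k as many passes. The toy loop below counts model calls under both schemes; the dummy predictor and the decode helper are illustrative stand-ins, not DeepSeek's actual MTP heads, which also involve verification of the proposed tokens.

```python
def decode(predict_fn, n_tokens, tokens_per_step):
    """Generic decoding loop: call the model until n_tokens are produced."""
    out, calls = [], 0
    while len(out) < n_tokens:
        out.extend(predict_fn(out, tokens_per_step))
        calls += 1
    return out[:n_tokens], calls

def dummy_predict(context, k):
    """Stand-in 'model' that just emits k placeholder token ids per call."""
    start = len(context)
    return list(range(start, start + k))

# Classic autoregressive decoding: one token per forward pass.
_, single_calls = decode(dummy_predict, 1000, 1)
# Multi-token prediction: four tokens per forward pass.
_, mtp_calls = decode(dummy_predict, 1000, 4)
print(single_calls, mtp_calls)  # 1000 vs 250 sequential model calls
```

Fewer sequential calls matters because decoding latency is dominated by the per-step cost of reading the model's weights, not by arithmetic on the tokens themselves.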
To some extent, DeepSeek embodies Chinese companies' relentless pursuit of efficiency: achieving equal or even better performance with limited computational resources. This demand-driven innovation has allowed Chinese AI to astonish the global community within a matter of months.
DeepSeek's new model also has significant implications for the industry. On one hand, it greatly reduces the energy and compute costs of large models, further shaking Wall Street's assumptions about the scale of investment that top-model infrastructure requires; on the other, the broad consensus around open source within China's AI industry has accelerated the domestic sector's development, steadily narrowing the gap with the world's top competitors.
Some believe that, given DeepSeek's rapid catch-up, its R2 model planned for release in April may directly challenge OpenAI's long-heralded GPT-5. If that prospect materializes, the differing Chinese and American approaches to AI development may face a direct confrontation.
Editor: lambor