DeepSeek news tracker

After DeepSeek-V3's low-key release, a stunned industry once again questions the Silicon Valley model

cls.cn · Mar 25 05:44

① DeepSeek released its V3-0324 model on Monday night. Initial tests show it can run on consumer-grade hardware, challenging the conventional view that large models require a data center. ② DeepSeek's models continue to cut the energy consumption and computing costs of large models while remaining open source, which drives further technical innovation, accelerates the development of China's AI industry, and raises questions about Silicon Valley's closed, paid model.

Chinese AI startup DeepSeek quietly released a new model on Monday. Known as DeepSeek-V3-0324, the 685-billion-parameter model went live on the AI repository Hugging Face without any announcement, yet it still caused a stir in the industry.

The model is released under the MIT license, meaning it can be freely used for commercial purposes. Early industry tests have confirmed that it can run directly on consumer-grade hardware, such as Apple's high-end Mac Studio.

AI researcher Awni Hannun said the new DeepSeek-V3 model can run on an Apple computer equipped with the M3 Ultra chip at 20 tokens per second. This upends the earlier industry consensus that model capability and local execution are at odds, and it suggests that a data center is not a prerequisite for running large models.
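To put the reported figures in perspective, the back-of-the-envelope sketch below estimates how much memory the weights alone would need at several precisions, and how long a reply would take at 20 tokens per second. The article does not say which precision was used, so the 16-, 8-, and 4-bit cases are assumptions, as is the 500-token reply length. The estimate ignores activations, the KV cache, and runtime overhead.

```python
PARAMS = 685e9  # parameter count reported in the article

def weight_footprint_gb(params, bits_per_param):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

def seconds_to_generate(num_tokens, tokens_per_second=20.0):
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second

# Assumed precisions; the article does not state which one was used.
footprints = {bits: weight_footprint_gb(PARAMS, bits) for bits in (16, 8, 4)}

# Hypothetical 500-token reply at the reported 20 tokens per second.
reply_time = seconds_to_generate(500)

for bits, gb in footprints.items():
    print(f"{bits}-bit weights: about {gb:.1f} GB")
print(f"500 tokens at 20 tok/s: {reply_time:.0f} s")
```

Under these assumptions, only a heavily quantized copy of the weights would plausibly fit in the memory of a single desktop machine.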

Another AI researcher, Xeophon, wrote on X that testing the new DeepSeek-V3 on an internal benchmark showed a huge leap on every metric tested. According to Xeophon, it is now the best non-reasoning model, surpassing Anthropic's Claude Sonnet 3.5.

Low-key but sensational

DeepSeek-V3-0324 arrived with no accompanying white paper and no publicity, only an empty README file. This almost austere release stands in sharp contrast to Silicon Valley's carefully orchestrated product launches.

Meanwhile, all of DeepSeek's models are open source and free for anyone to download and use. That is the opposite of Claude Sonnet, one of the best commercial models, which is sold by subscription at $20 per month.

In addition, DeepSeek has fundamentally rethought how large language models operate. For any given task, the model activates only about 37 billion of its parameters, in the form of so-called "expert" modules, rather than all 685 billion, which greatly reduces computational demands.
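The idea of activating only a few "expert" modules can be illustrated with a toy routing sketch. This is not DeepSeek's implementation: the four experts, the gate scores, and the choice of two active experts are all made up for illustration. Only the 37 billion and 685 billion figures come from the article.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, 0.3, 1.5]  # made-up router outputs for one token

selected = route_top_k(gate_scores, k=2)
output = moe_forward(1.0, experts, gate_scores, k=2)

# Share of parameters active at once, using the article's figures.
active_fraction = 37e9 / 685e9
print(f"active share: {active_fraction:.1%}")
```

In this sketch the two unselected experts are never called, which is where the compute saving comes from. With the article's figures, roughly 5% of the parameters are active at a time.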

The model also features two other breakthrough techniques: multi-head latent attention (MLA) and multi-token prediction (MTP). MLA strengthens the model's ability to maintain context across long texts, while MTP generates multiple tokens per step instead of the usual one token at a time. Together, these innovations raise output speed by nearly 80%.
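One simplified way to see how predicting several tokens per step could raise throughput is to count the expected number of tokens kept per decoding step. The model below is an assumption for illustration only: it supposes each extra predicted token is kept with some acceptance probability, and the 0.8 value is hypothetical rather than a figure from the article. The article credits the speed-up to MLA and MTP together, so this does not reproduce DeepSeek's measurement.

```python
def expected_tokens_per_step(accept_probs):
    """Expected tokens emitted per decoding step.

    The first token is always kept. Each extra predicted token counts only
    if it and every earlier extra token are accepted.
    """
    expected = 1.0
    survive = 1.0
    for p in accept_probs:
        survive *= p
        expected += survive
    return expected

baseline = expected_tokens_per_step([])     # ordinary one-token-per-step decoding
with_mtp = expected_tokens_per_step([0.8])  # one extra token, assumed 80% acceptance
speedup = with_mtp / baseline

print(f"tokens per step: {with_mtp:.2f}, relative speed-up: {speedup:.2f}x")
```

Under this assumed acceptance rate, one extra token per step yields about 1.8 tokens per step on average, which is the same order as the speed-up the article reports.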

To some extent, DeepSeek embodies Chinese companies' relentless pursuit of efficiency and resourcefulness: achieving equal or better performance with limited computing resources. This necessity-driven innovation has allowed China's AI sector to astonish the world within a matter of months.

The changes in DeepSeek's new model also carry major implications for the industry. On one hand, the model sharply reduces the energy consumption and computing costs of large models, further shaking Wall Street's assumptions about how much investment top-tier model infrastructure requires. On the other hand, the broad consensus on open source within China's AI industry is rapidly advancing the domestic sector and steadily narrowing the gap with the world's leading competitors.

Some believe that, given DeepSeek's rapid catch-up, its R2 model, planned for release in April, could directly challenge OpenAI's long-promoted GPT-5. If that happens, the differing Chinese and American approaches to AI development may come into direct confrontation.

Editor/lambor

The translation is provided by third-party software.
