
Is a new DeepSeek model coming?

wallstreetcn ·  Feb 11 19:38

DeepSeek is conducting a phased rollout test of its next-generation model. Some users received prompts to update the app upon opening it. The new version extends the context length from 128K to 1M and updates the knowledge base to May 2025. The official app indicates that this might be the final pre-release version before the formal debut of V4. A Nomura Securities report highlights that the core value of V4 lies in driving the commercial implementation of AI applications through foundational architectural innovations, rather than disrupting the existing AI value chain.

DeepSeek is running a staged (gray-release) rollout of its new model, which may be the final pre-release build before the official launch of V4.

On February 11, some users received an update prompt upon opening the DeepSeek app. After updating to version 1.7.4, users can try DeepSeek's latest model. With this upgrade, the model's context length expands from 128K to 1M tokens, roughly an eightfold increase; the knowledge cutoff moves to May 2025; and several core capabilities are substantially improved.

In the author's tests, DeepSeek itself stated in Q&A that the current version is likely not V4, but rather the final evolution of the V3 series, or the last staged-rollout build before V4's official release.

Nomura Securities released a report on February 10 stating that the DeepSeek V4 model, expected to launch in mid-February 2026, will not recreate the global AI computing power demand panic caused by last year’s V3 release. The firm believes that the core value of V4 lies in promoting the commercialization of AI applications through fundamental architectural innovation, rather than disrupting the existing AI value chain.

According to evaluations, the new version's handling of complex tasks is on par with mainstream closed-source models such as Gemini 3 Pro and K2.5. Nomura further noted that V4 is expected to introduce two innovative technologies, mHC and Engram, breaking through compute-chip and memory bottlenecks at both the algorithmic and engineering levels. Preliminary internal tests show that V4's performance on programming tasks has surpassed same-generation Anthropic Claude and OpenAI GPT series models.

The key significance of this release lies in further compressing training and inference costs, providing a feasible pathway for global large language model and AI application companies to alleviate capital expenditure pressure.

Innovative architecture optimized for hardware bottlenecks

The Nomura Securities report pointed out that the performance constraints of compute chips and HBM memory have long been hard limits that the domestic large-model industry cannot bypass. The mHC (Manifold-Constrained Hyper-Connections) and Engram architectures introduced in the upcoming DeepSeek V4 are designed to systematically mitigate these shortcomings across both training and inference.

mHC:

  • Its full name is 'Manifold-Constrained Hyper-Connections.' It targets the information-flow bottlenecks and training instability that arise when Transformer models become extremely deep.

  • In simple terms, it makes the 'conversation' between network layers richer and more flexible, while rigorous mathematical 'guardrails' keep the signal from being amplified or destroyed along the way. Experiments show that models using mHC perform better on tasks such as mathematical reasoning.
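The details of mHC are not public, so the following is only a loose sketch of the 'guardrail' idea described above: several parallel residual streams mixed between layers by a matrix that is projected toward doubly-stochastic form (a classic Sinkhorn normalization), so that repeated mixing across many layers neither amplifies nor collapses the signal. All names and shapes here are illustrative assumptions, not DeepSeek's actual design.

```python
import numpy as np

def sinkhorn(M, iters=50):
    """Project a positive matrix toward doubly-stochastic form
    (rows and columns each sum to 1) by alternating normalization.
    This is one classic 'guardrail' for a mixing matrix: it bounds
    how much repeated mixing can grow or shrink the signal."""
    M = np.abs(M) + 1e-9
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)   # normalize rows
        M = M / M.sum(axis=0, keepdims=True)   # normalize columns
    return M

rng = np.random.default_rng(0)
n_streams, width = 4, 8
# hypothetical "hyper-connection": 4 parallel residual streams,
# mixed between layers by a learnable (here: random) matrix
W = sinkhorn(rng.normal(size=(n_streams, n_streams)))

x = rng.normal(size=(n_streams, width))
y = x.copy()
for _ in range(100):        # simulate mixing across 100 layers
    y = W @ y

# because every column of W sums to 1, the mean across streams is
# preserved exactly, no matter how many layers the signal crosses
print(np.allclose(x.mean(axis=0), y.mean(axis=0)))  # True
```

The point of the constraint is visible in the last line: without the normalization, a random mixing matrix applied 100 times would typically explode or vanish the activations.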

Engram:

  • A 'conditional memory' module whose design decouples 'memory' from 'computation'.

  • Static knowledge in the model, such as entities and fixed expressions, is stored in a dedicated sparse memory table that can reside in low-cost DRAM and is fetched by fast lookup at inference time. This frees expensive GPU memory (HBM) to focus on dynamic computation.
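Engram's real interface is likewise unpublished; the sketch below only illustrates the decoupling idea from the bullets above. A sparse table of entity vectors lives in cheap host memory (modeled here as a plain dict), and inference fetches only the entries the current context needs, so fast-memory traffic scales with the query rather than with the table size. The class and method names are hypothetical.

```python
import numpy as np

class SparseMemoryTable:
    """Illustrative 'memory decoupled from compute' store: static
    entity vectors kept in host DRAM (a dict), looked up on demand
    while dense model weights would occupy scarce HBM."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}                  # entity id -> stored vector

    def write(self, key, vec):
        self.table[key] = vec

    def lookup(self, keys):
        """Fast sparse lookup; misses fall back to a zero vector,
        so the dense path can still handle unknown entities."""
        zero = np.zeros(self.dim)
        return np.stack([self.table.get(k, zero) for k in keys])

rng = np.random.default_rng(1)
mem = SparseMemoryTable(dim=4)
mem.write("paris", rng.normal(size=4))
mem.write("einstein", rng.normal(size=4))

# at inference, only entities present in the context are fetched
batch = mem.lookup(["paris", "unknown_entity"])
print(batch.shape)  # (2, 4)
```

The design choice being illustrated: lookups are conditional on the input, so the memory can grow far beyond accelerator capacity without increasing per-token HBM usage.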

By improving training stability and convergence efficiency, mHC mitigates domestic chips' generational gap in interconnect bandwidth and compute density; Engram focuses on rebuilding the memory-scheduling mechanism, using more efficient access strategies to break through VRAM capacity and bandwidth limits while HBM supply remains constrained. Nomura believes these two innovations together form an adaptation scheme tailored to the domestic hardware ecosystem, with clear engineering value.

The report further noted that the most direct commercial impact of V4's release is the substantial reduction in training and inference costs. Optimization on the cost side will effectively stimulate downstream application demand, thereby driving a new cycle of AI infrastructure construction. In this process, Chinese AI hardware manufacturers are expected to benefit from increased demand and front-loaded investments.

The market structure has shifted from "one dominant player" to "multiple contenders vying for power."

Nomura's report reviewed how the market landscape has changed in the year since the release of DeepSeek-V3/R1. At the end of 2024, DeepSeek's two models accounted for more than half of open-source-model token usage on OpenRouter.

However, by the second half of 2025, as more players entered the market, DeepSeek's share had declined significantly, and the market shifted from being "dominated by one" to "multiple contenders." The competitive environment facing V4 is thus far more complex than a year ago. Still, DeepSeek's combination of "compute-efficiency management" and "performance enhancement" has accelerated the development of large models and applications in China, reshaped the global competitive landscape, and drawn greater attention to open-source models.

Software companies are presented with opportunities for value enhancement.

Nomura believes that major global cloud service providers remain fully committed to pursuing artificial general intelligence (AGI), and the capital-expenditure race is far from over. Therefore, V4 is not expected to hit the global AI infrastructure market as hard as the V3/R1 release did a year ago.

However, global developers of large models and their applications are facing an increasingly heavy burden of capital expenditures. If V4 can significantly reduce training and inference costs while maintaining high performance, it will help these enterprises convert technology into revenue more quickly, alleviating profitability pressures.

On the application side, the more powerful and efficient V4 will give rise to stronger AI agents. The report observed that applications like AliCloud’s Tongyi Qianwen App are already capable of executing multi-step tasks in a more automated manner. AI agents are transitioning from being "dialogue tools" to becoming "AI assistants" capable of handling complex tasks.

These multi-tasking agents will need to interact more frequently with underlying large models, consuming more tokens and thereby driving up computational demand. Thus, improvements in model efficiency will not "kill software," but instead create value for leading software companies. Nomura emphasized the importance of focusing on those software companies that can be the first to leverage the capabilities of next-generation large models to build disruptive AI-native applications or agents. Their growth potential may be further elevated due to leaps in model capabilities.

Editor: Jayden


