Just when everyone was expecting DeepSeek to officially announce its R2 large model, the company unexpectedly dropped another technical bombshell just before the May Day holiday.
On April 30, DeepSeek quietly open-sourced its latest model on the Hugging Face platform – DeepSeek-Prover-V2-671B, a large language model focused on mathematical theorem proving, optimized specifically for formal mathematical proof tasks.

DeepSeek-Prover-V2-671B is built on the DeepSeek-V3 architecture, with 671 billion parameters in a MoE (Mixture of Experts) design, 61 Transformer layers, and a 7168-dimensional hidden layer.

The model uses the more efficient safetensors file format and supports multiple computation precisions, including BF16, FP8, and F32, allowing it to be trained and deployed faster and with fewer resources. According to analysis from OSCHINA, the checkpoint is split into 163 shards of roughly 4.3 GB each, underscoring its enormous scale.
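For readers who want to check the reported shard layout themselves, here is a minimal sketch. It assumes the Hugging Face repo id "deepseek-ai/DeepSeek-Prover-V2-671B" and the standard sharded-checkpoint index file; it counts the shards and reads the total size without downloading any weights.

```python
# Minimal sketch: inspect a sharded safetensors checkpoint via its index file.
# The repo id and the "model.safetensors.index.json" layout are assumptions
# based on standard Hugging Face conventions for sharded checkpoints.
import json
from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    repo_id="deepseek-ai/DeepSeek-Prover-V2-671B",
    filename="model.safetensors.index.json",
)
with open(index_path) as f:
    index = json.load(f)

shard_files = sorted(set(index["weight_map"].values()))
total_gb = index["metadata"]["total_size"] / 1e9

print(f"shards: {len(shard_files)}")   # reported as 163
print(f"total:  {total_gb:.0f} GB")    # roughly 163 x ~4.3 GB
```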
A major upgrade for math AI: DeepSeek has open-sourced a 671-billion-parameter model.
From a technical perspective, DeepSeek-Prover-V2-671B is built on the DeepSeek-V3 architecture, uses a MoE (Mixture of Experts) design, and has 61 Transformer layers with a 7168-dimensional hidden layer. Even more striking is its maximum positional embedding of 163,840, which means it can handle extremely long and complex mathematical proof problems.
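These figures can be checked directly against the published config.json. A minimal sketch, assuming the same repo id as above and the usual Hugging Face config field names:

```python
# Minimal sketch: read config.json and print the architecture fields cited above.
# Repo id and field names are assumptions based on standard Hugging Face conventions.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download(
    repo_id="deepseek-ai/DeepSeek-Prover-V2-671B",
    filename="config.json",
)
with open(cfg_path) as f:
    cfg = json.load(f)

print(cfg.get("num_hidden_layers"))        # reported as 61
print(cfg.get("hidden_size"))              # reported as 7168
print(cfg.get("max_position_embeddings"))  # reported as 163840
```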
DeepSeek-Prover is a series of open-source large language models developed by the DeepSeek team, focused on mathematical theorem proving. It has made significant progress in formal theorem proving by training on large-scale synthetic data and incorporating optimization techniques such as reinforcement learning (RL) and Monte Carlo tree search (MCTS).
DeepSeek-Prover has set a new benchmark in the field of formal mathematical proofs through innovative training frameworks and efficient reasoning strategies. Its technical path, which combines synthetic data, reinforcement learning, and tree search, not only enhances model performance but also opens new directions for the application of AI in rigorous mathematical reasoning.
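To make "formal theorem proving" concrete: the prover's task is to produce a machine-checkable proof for a formally stated goal, for example in Lean 4. A toy illustration of what such a statement and proof look like (this example is ours, not drawn from the model's benchmarks, and real competition-level problems are far harder):

```lean
-- Toy illustration: a formally stated goal and a tactic proof that the
-- Lean 4 kernel can verify mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```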
According to analysis by OSCHINA, the new model has the following characteristics:
Gigantic model scale: about 671B (671 billion) parameters, which is evident from the number of model shards (163) and the size of each shard (about 4.3 GB).
Utilizes the DeepSeek-V3 architecture: employing the MoE (Mixture of Experts) design, it has 61 Transformer layers and a 7168-dimensional hidden layer.
Optimized specifically for mathematical theorem proving: as indicated by the name 'Prover', this is a specialized model focusing on mathematical reasoning and theorem proving.
Supports extremely long context: the maximum positional embedding reaches 163,840, allowing it to handle complex mathematical proofs.
Employs FP8 quantization: reduces model size through quantization, improving inference efficiency (see the back-of-envelope sketch after this list).
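As a rough sanity check on how these numbers fit together, here is a back-of-envelope sketch. It assumes roughly one byte per parameter at FP8, with some tensors kept in higher precision, which is why the reported shard total comes out somewhat above the FP8 lower bound.

```python
# Back-of-envelope arithmetic relating parameter count to the shard layout.
# Assumption: ~1 byte per parameter at FP8, plus some higher-precision tensors.
params = 671e9                            # ~671 billion parameters
fp8_lower_bound_gb = params * 1 / 1e9     # ~671 GB if every weight were FP8
reported_total_gb = 163 * 4.3             # ~700 GB from 163 shards x ~4.3 GB

print(f"FP8 lower bound:      {fp8_lower_bound_gb:.0f} GB")
print(f"reported shard total: {reported_total_gb:.0f} GB")
```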
Is DeepSeek R2 still far away?
DeepSeek's founder, Liang Wenfeng, once stated: "China should gradually become an innovator, rather than just hitching a ride." He sees exploring the nature of general AI as a core mission.
According to a report by "China Entrepreneur," the AI team led by DeepSeek's founder Liang Wenfeng has maintained a pace of product iteration in step with the international giants: releasing version 2.5 in September 2024, launching the V3 base model in December, and upgrading to V3-0324 in March of the following year, establishing a cadence of major updates every quarter.
Notably, the V3-0324 version released in March has been regarded within the industry as the foundational model for the future R2. This version features an upgrade via an MoE architecture with 685 billion parameters, significantly enhancing its coding capabilities.
Such a robust and efficient pace of product iteration inevitably leads one to wonder: how far off is the long-rumored DeepSeek R2 large model? Is this new mathematical model merely a prelude to DeepSeek's upcoming release of a larger-scale general model?
On the social platform X, some users have stated:
"R2 is just around the corner..."

"Yesterday it was Qwen 3, today it’s DeepSeek Prover V2..."

Moreover, some evaluations indicate that in practical performance, "the success rate in high school math test questions has increased from 50% to 63.5%."

"V1.5 was still just a small model with only 7 billion parameters; this time it jumps straight to a large model."

China's AI startups never disappoint! They are really changing the entire game.

"China has been showing some real kung fu in AI lately."
This is exciting news!

Editor/rice