
DeepSeek releases the Prover-V2 model with 671 billion parameters.

Breakings ·  Apr 30 18:36

Today, DeepSeek released a new model, DeepSeek-Prover-V2-671B, on the open-source AI community Hugging Face. The model reportedly uses the more efficient safetensors file format and supports multiple compute precisions, making training and deployment faster and less resource-intensive. With 671 billion parameters, it appears to be an upgraded version of last year's Prover-V1.5 mathematical model. Architecturally, it is built on DeepSeek-V3 and adopts an MoE (Mixture of Experts) design, with 61 Transformer layers and a hidden dimension of 7168. It also supports ultra-long context, with a maximum positional embedding of 163,840, allowing it to handle complex mathematical proofs, and it employs FP8 quantization to reduce model size and improve inference efficiency. (Sina Technology)
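
The architecture details above can in principle be checked against the model's published configuration. The following is a minimal sketch, not from the article, assuming the Hugging Face repo id is "deepseek-ai/DeepSeek-Prover-V2-671B" and that its config exposes the usual DeepSeek-V3-style field names (which may differ in practice); it requires the transformers library and network access to the Hub.

```python
# Minimal sketch: inspect the reported architecture details from the Hub config.
# Assumptions: repo id "deepseek-ai/DeepSeek-Prover-V2-671B" and DeepSeek-V3-style
# config field names (num_hidden_layers, hidden_size, max_position_embeddings).
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",  # assumed repo id
    trust_remote_code=True,                 # DeepSeek-V3-style configs ship custom code
)

# Reported values: 61 layers, hidden size 7168, max positional embedding 163840.
print("Transformer layers:      ", getattr(config, "num_hidden_layers", "n/a"))
print("Hidden size:             ", getattr(config, "hidden_size", "n/a"))
print("Max position embeddings: ", getattr(config, "max_position_embeddings", "n/a"))
```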


