The QwQ-32B large language model has only 32 billion parameters, yet it rivals DeepSeek-R1, which has 671 billion parameters (of which 37 billion are activated), and even surpasses it in some tests. The breakthrough of QwQ-32B will further push AI large models from the paradigm of "brute force works miracles" toward "refinement yields intelligence," countering the excessive pessimism some felt after GPT-4.5 hit a wall.
$Alibaba (BABA.US)$ has made a new AI move: its latest reasoning model, QwQ-32B, shows that a small parameter count can deliver big-model-level performance. As of the time of writing, $BABA-W (09988.HK)$ had surged over 6%.

On March 6, Alibaba's Tongyi Qianwen (Qwen) team released its reasoning model, the QwQ-32B large language model. According to the official introduction, this model with only 32 billion parameters not only performs comparably to DeepSeek-R1, which has 671 billion parameters (of which 37 billion are activated), but also surpasses it in certain tests.
The Alibaba Qwen team said this result highlights the effectiveness of applying reinforcement learning to strong foundation models that have undergone large-scale pre-training, and that it hopes to show that combining strong foundation models with large-scale reinforcement learning may be a viable path toward artificial general intelligence.
In addition to basic reasoning capabilities, QwQ-32B also integrates agent-related capabilities, enabling it to think critically while using tools and to adjust its reasoning process based on environmental feedback.
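For intuition, the following is a minimal, hypothetical sketch of such an agent loop: the model proposes a tool call, the environment executes it, and the result is fed back into the context so the model can adjust its reasoning. The call_model stub, the tool-call format, and the calculator tool are illustrative assumptions, not Qwen's actual agent interface.

```python
import json

def calculator(expression: str) -> str:
    """A toy tool; a real agent would expose search, code execution, and similar tools."""
    return str(eval(expression, {"__builtins__": {}}))  # illustrative only; do not eval untrusted input

TOOLS = {"calculator": calculator}

def call_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for an inference call to QwQ-32B: it first requests one
    calculator call, then produces a final answer once the tool result is in context."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "calculator", "arguments": {"expression": "32 / 671"}})
    return "Final answer: QwQ-32B has roughly 4.8% of DeepSeek-R1's total parameter count."

def agent_loop(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            call = json.loads(reply)  # the model asked to use a tool
        except json.JSONDecodeError:
            return reply              # plain text means a final answer
        result = TOOLS[call["tool"]](**call["arguments"])
        # Environmental feedback: the tool result goes back into the conversation.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "tool", "content": result})
    return "No final answer within the step budget."

print(agent_loop("How large is QwQ-32B relative to DeepSeek-R1?"))
```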
Fewer parameters, comparable performance, at one-tenth the cost
According to official disclosed test results, QwQ-32B performed excellently in multiple key evaluations:
On AIME24, a test set assessing mathematical ability, QwQ-32B performs comparably to DeepSeek-R1, far surpassing o1-mini and same-sized R1-distilled models.
On LiveCodeBench, which evaluates coding ability, its performance is likewise comparable to DeepSeek-R1.
On LiveBench, the "hardest LLM evaluation leaderboard" led by Meta's chief AI scientist Yann LeCun, QwQ-32B's score surpasses DeepSeek-R1's.
On IFEval, the instruction-following test proposed by Google and others, its results are better than DeepSeek-R1's.
On BFCL, the benchmark proposed by UC Berkeley that assesses accurate function and tool calling, it also surpasses DeepSeek-R1.

Overseas users have compared the LiveBench scores of different reasoning models against their output-token costs. QwQ-32B's score sits between R1's and o3-mini's, while its cost is roughly one-tenth of R1's and one-twentieth of o3-mini's, indicating that QwQ-32B strikes a good balance between performance and cost:
QwQ-32B scores approximately 72.5 on LiveBench, at a cost of about $0.25.
R1 scores approximately 70, at a cost of about $2.50.
o3-mini scores approximately 75, at a cost of about $5.00.

Reinforcement Learning: QwQ-32B's 'Secret Weapon'
QwQ-32B's strong performance is largely attributable to the large-scale reinforcement learning methods it employs. The Alibaba team developed a phased reinforcement learning training strategy built on a cold start.
Initial phase: RL training focused on mathematical and programming tasks. Rather than relying on a traditional reward model, the team used direct verification: for math problems, feedback came from checking whether the generated answer was correct; for code, feedback came from running the generated code on an execution server and checking whether it passed the test cases (a minimal sketch of this verifier-based reward follows below).
Expansion phase: RL training extended to general capabilities. This phase used a general reward model together with rule-based verifiers, helping the model improve its other general abilities while preserving its mathematical and programming skills.
The team reports that as the number of RL training iterations increases, the model shows continuous improvement in both mathematics and programming, confirming the effectiveness of this approach.
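Based on this description, the first phase relies on outcome verification rather than a learned reward model: a math rollout is rewarded when its final answer matches the reference, and a code rollout is rewarded when it passes its test cases on an execution server. The following is a minimal sketch of that idea; the function names, the binary reward values, and the use of a local subprocess in place of the execution server are illustrative assumptions, not the Qwen team's training code.

```python
import os
import subprocess
import sys
import tempfile

def math_reward(generated_answer: str, reference_answer: str) -> float:
    """Outcome-based reward for math: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if generated_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_code: str, timeout_s: int = 10) -> float:
    """Outcome-based reward for code: run the candidate together with its test cases
    in a subprocess (a local stand-in for the execution server) and reward a clean pass."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

# Example: the candidate passes both asserts, so both rewards are 1.0.
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(math_reward("42", "42"), code_reward(candidate, tests))
```

Rewards of this kind are attractive for math and code precisely because correctness can be checked mechanically, which is harder for open-ended general tasks; that is presumably why the expansion phase reintroduces a general reward model.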
QwQ-32B has been open-sourced, pushing large models from the paradigm of "brute force works miracles" toward "refinement yields intelligence"
Currently, QwQ-32B is open-sourced on the Hugging Face and ModelScope platforms under the Apache 2.0 license. Users can also experience this reasoning model directly through Qwen Chat.
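Because the weights are publicly available, the model can also be run locally with standard open-source tooling. The snippet below is a minimal sketch using the Hugging Face transformers library; the repository ID Qwen/QwQ-32B, the prompt, and the generation settings are assumptions, so the model card should be checked for the recommended values.

```python
# Minimal sketch: load the open-source QwQ-32B weights and run a single prompt.
# Assumes the repo ID "Qwen/QwQ-32B" and enough GPU memory for a 32B model (or quantized weights).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumption: confirm on Hugging Face / ModelScope
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so leave generous room for new tokens.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```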
Commenting on the release, the tech blogger Digital Life Kask wrote:
The open-sourcing of QwQ-32B is highly significant.
It demonstrates in practice that the RLHF route can still deliver impressive results, breaking the excessive pessimism some people felt after GPT-4.5 hit a wall.
Achieving this level of performance at a medium scale gives the open-source community strong confidence that expensive hardware and ultra-large scale are not prerequisites for competing on the same stage as the international giants.
The release of QwQ-32B is highly consistent with Alibaba's recently announced AI strategy. According to reports, Alibaba Group plans to invest more than 380 billion yuan over the next three years in cloud and AI hardware infrastructure, exceeding its total investment in this area over the past decade.
Earlier, Alibaba's self-developed "Deep Thought" reasoning model was launched on the Quark AI search platform, making Quark one of the few large consumer-facing (C-end) AI applications in China that has not integrated DeepSeek.
At the foundation-model level, Alibaba's Tongyi model family has entered the ranks of the world's top open-source models. Insiders revealed that "larger-scale models will also gradually be integrated into Quark."
Editor/Somer