Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang

TL;DR
This paper introduces four new distilled reasoning and reward model series derived from Qwen. The models are designed for industrial use, balance reasoning accuracy with inference efficiency, and are validated through extensive benchmarks and practical deployment.
Contribution
The paper extends the DistilQwen family with four specialized model series (one slow-thinking series, two adaptive-thinking series, and one distilled reward model series), optimized for industrial reasoning tasks and efficiency.
Findings
The models demonstrate high inference efficiency across multiple benchmarks.
The models achieve strong reasoning performance.
The models are deployed in practice on the Alibaba Cloud PAI platform.
Abstract
Recently, the demand for small and efficient reasoning models to support real-world applications has driven the development of knowledge distillation techniques that balance reasoning performance and inference speed. In this paper, we further extend the DistilQwen model family, initialized from the Qwen models, by introducing four model series specifically designed to meet industrial requirements. The distilled model collection comprises: (1) slow-thinking models, optimized for reasoning tasks that require high accuracy; (2) two series of adaptive-thinking models, which dynamically adjust reasoning strategies based on input tasks to maximize efficiency across diverse scenarios; and (3) distilled reward models, which enable further reinforcement learning of reasoning models using distilled knowledge. Comprehensive evaluations across multiple benchmarks demonstrate both high inference…
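To make the idea of adaptive thinking concrete, the following is a minimal sketch of choosing between a short and a long reasoning budget per input. It is an illustration only and not the paper's method: the abstract does not say how the models adapt their reasoning. The difficulty heuristic, the `ReasoningPlan` type, the threshold, and the token budgets are all hypothetical. A distilled adaptive-thinking model would presumably make this decision internally rather than through an external rule like this one.

```python
from dataclasses import dataclass


@dataclass
class ReasoningPlan:
    """Hypothetical container for a per-request reasoning budget."""
    mode: str            # "short" or "long" chain of thought
    max_new_tokens: int  # generation budget assumed for that mode


def estimate_difficulty(prompt: str) -> float:
    """Toy heuristic (an assumption): score in [0, 1] from surface cues."""
    cues = ("prove", "derive", "step by step", "optimize", "integral")
    text = prompt.lower()
    hits = sum(cue in text for cue in cues)
    length_term = min(len(prompt.split()) / 200.0, 1.0)
    return min(1.0, 0.25 * hits + 0.5 * length_term)


def choose_plan(prompt: str, threshold: float = 0.4) -> ReasoningPlan:
    """Pick a long reasoning budget only when the prompt looks hard."""
    if estimate_difficulty(prompt) >= threshold:
        return ReasoningPlan(mode="long", max_new_tokens=4096)
    return ReasoningPlan(mode="short", max_new_tokens=512)


if __name__ == "__main__":
    for p in ("What is 2 + 2?",
              "Prove step by step that the integral converges."):
        print(p, "->", choose_plan(p))
```

The design point the sketch illustrates is the trade-off named in the abstract: spending long reasoning only on inputs that appear to need it keeps average inference cost low while preserving accuracy on hard inputs.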
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning
