Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series

Wenrui Cai; Chengyu Wang; Junbing Yan; Jun Huang; Xiangzhong Fang

arXiv:2511.01354·cs.CL·November 4, 2025

TL;DR

This paper introduces four new distilled reasoning and reward model series derived from Qwen. The models are designed for industrial use, balance reasoning accuracy against inference efficiency, and are validated through benchmark evaluations and practical deployment.

Contribution

The paper extends the DistilQwen family with four specialized model series: slow-thinking models, two series of adaptive-thinking models, and distilled reward models, all aimed at industrial reasoning tasks that need both accuracy and efficiency.

Findings

01. High inference efficiency demonstrated across benchmarks.

02. Models achieve strong reasoning performance.

03. Practical deployment on Alibaba Cloud PAI platform.

Abstract

Recently, the demand for small and efficient reasoning models to support real-world applications has driven the development of knowledge distillation techniques that balance reasoning performance and inference speed. In this paper, we further extend the DistilQwen model family, initialized from the Qwen models, by introducing four model series specifically designed to meet industrial requirements. The distilled model collection comprises: (1) slow-thinking models, optimized for reasoning tasks that require high accuracy; (2) two series of adaptive-thinking models, which dynamically adjust reasoning strategies based on input tasks to maximize efficiency across diverse scenarios; and (3) distilled reward models, which enable further reinforcement learning of reasoning models using distilled knowledge. Comprehensive evaluations across multiple benchmarks demonstrate both high inference…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning