RLKD: Distilling LLMs' Reasoning via Reinforcement Learning

Shicheng Xu; Liang Pang; Yunchang Zhu; Jia Gu; Zihao Wei; Jingcheng Deng; Feiyang Pan; Huawei Shen; Xueqi Cheng

arXiv:2505.16142 · cs.CL · February 3, 2026

TL;DR

This paper introduces RLKD, a reinforcement learning framework that distills authentic multi-branch reasoning structures from teacher LLMs into student models, surpassing traditional supervised fine-tuning methods.

Contribution

RLKD uses a novel Generative Structure Reward Model to reward structural alignment between the student's and the teacher's reasoning, which lets reinforcement learning distill complex reasoning paths effectively.
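
A minimal Python sketch can make the idea of a structural reward concrete. It rests on stated assumptions and is not the paper's Generative Structure Reward Model, which is itself a generative model. The `Step` type, the `structure_reward` function, and the scoring rule are all hypothetical. The rule counts in-order matches of the chosen sub-problems and normalizes by the teacher's step count. Exact string comparison stands in for the learned judgment of whether two steps align.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One meta-reasoning + solving step (hypothetical representation)."""
    subproblem: str  # sub-problem selected by meta-reasoning
    solution: str    # result produced by the solving phase


def structure_reward(student: list[Step], teacher: list[Step]) -> float:
    """Toy structural-alignment reward in [0, 1] (illustrative assumption).

    Counts how many leading student steps pick the same sub-problem as the
    teacher, stopping at the first divergence, then divides by the number of
    teacher steps. Case-insensitive string equality replaces the paper's
    generative reward model.
    """
    if not teacher:
        return 0.0
    matched = 0
    for s, t in zip(student, teacher):
        if s.subproblem.strip().lower() != t.subproblem.strip().lower():
            break  # structure diverges here; later steps earn nothing
        matched += 1
    return matched / len(teacher)


teacher = [Step("factor", "(x-1)(x+2)"), Step("find roots", "1, -2"), Step("verify", "ok")]
student = [Step("Factor", "(x-1)(x+2)"), Step("find roots", "1, -2"), Step("graph", "...")]
print(structure_reward(student, teacher))  # 2 of 3 steps aligned
```

Here the student follows the teacher for two of three steps and then diverges, so the toy reward is 2/3. The sketch scores agreement at the level of reasoning steps rather than token-by-token likelihood. That is the contrast with SFT the paper draws. Stopping at the first divergence is only one simple design choice.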

Findings

1. RLKD outperforms standard SFT-RL pipelines.

2. RLKD distills reasoning structure effectively using only 0.1% of the data.

3. Students trained with RLKD show greater reasoning potential than SFT-distilled students.

Abstract

Distilling reasoning paths from teacher to student models via supervised fine-tuning (SFT) provides a shortcut for improving the reasoning ability of smaller Large Language Models (LLMs). However, the reasoning paths generated by teacher models often reflect only surface-level traces of their underlying authentic reasoning. Insights from cognitive neuroscience suggest that authentic reasoning involves a complex interweaving between meta-reasoning (which selects appropriate sub-problems from multiple candidates) and solving (which addresses the sub-problem). This implies authentic reasoning has an implicit multi-branch structure. Supervised fine-tuning collapses this rich structure into a flat sequence of token prediction in the teacher's reasoning path, preventing effective distillation of this structure to students. To address this limitation, we propose RLKD, a reinforcement learning…
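
A toy data structure can illustrate the abstract's contrast between branching reasoning and a flat SFT target. The sketch is a conceptual assumption, not the paper's representation. `BranchStep`, `flatten_for_sft`, and `dropped_branches` are hypothetical names. Real teacher traces are free text, and their candidate sub-problems are implicit rather than listed.

```python
from dataclasses import dataclass


@dataclass
class BranchStep:
    """One reasoning step with explicit branches (hypothetical representation)."""
    candidates: list[str]  # sub-problems considered by meta-reasoning
    chosen: str            # sub-problem meta-reasoning actually selected
    solution: str          # output of solving the chosen sub-problem


def flatten_for_sft(steps: list[BranchStep]) -> str:
    """Linear text an SFT target would contain: only the chosen path survives."""
    return "\n".join(f"{s.chosen}: {s.solution}" for s in steps)


def dropped_branches(steps: list[BranchStep]) -> int:
    """Count unselected candidates, i.e. branches the flattened target omits."""
    return sum(1 for s in steps for c in s.candidates if c != s.chosen)


steps = [
    BranchStep(["factor x^2 + x - 2", "try x = 1", "plot the curve"],
               "factor x^2 + x - 2", "(x - 1)(x + 2)"),
    BranchStep(["read off roots", "check signs"],
               "read off roots", "x = 1 or x = -2"),
]
print(flatten_for_sft(steps))
print(dropped_branches(steps))  # 3
```

A token-level SFT loss on the flattened text never sees the unchosen entries in `candidates`. This is the information loss the abstract attributes to SFT.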

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications