Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
Zhen Tan, Chengshuai Zhao, Song Wang, Jundong Li, Tianlong Chen, Huan Liu

TL;DR
This paper introduces a distillation framework for large language models that strengthens the reasoning and generalization of smaller student models through explanatory probes and reinforcement learning, yielding substantial performance gains.
Contribution
The paper presents Explanatory Inversion (EI) and Explanatory GRPO (EXGRPO), two methods that improve LLM distillation by pushing smaller models toward deeper understanding and coherent reasoning rather than superficial pattern memorization.
Findings
20.39% average performance increase over zero-shot baseline
6.02% improvement over state-of-the-art distillation methods
Models trained with this method require less data and generalize better
Abstract
Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled models frequently suffer from superficial pattern memorization and subpar generalization. To overcome these limitations, we introduce a novel distillation framework that moves beyond simple mimicry to instill a deeper conceptual understanding. Our framework features two key innovations. First, to address pattern memorization, Explanatory Inversion (EI) generates targeted "explanatory probes" that compel the student to articulate the underlying logic behind an answer, rather than just memorizing it. Second, to improve generalization, Explanatory GRPO (EXGRPO) uses a reinforcement learning algorithm with a novel Dialogue Structure Utility…
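Explanatory GRPO builds on Group Relative Policy Optimization (GRPO). The abstract is cut off before it defines EXGRPO or its Dialogue Structure Utility, so the following is only a minimal sketch of two core pieces of standard GRPO, not the authors' method. Each sampled response's scalar reward is normalized against the other responses drawn for the same prompt, and that advantage then feeds a PPO-style clipped objective. The rewards are assumed to be plain scalars. The paper's actual reward design is not reproduced here. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO advantage (sketch): normalize each response's reward by the
    mean and standard deviation of its own group, i.e. all responses sampled
    for the same prompt. Rewards are ASSUMED scalars; EXGRPO's Dialogue
    Structure Utility is not modeled here."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation; some implementations use the sample form.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps guards against division by zero when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one sample: `ratio` is
    pi_new(a|s) / pi_old(a|s). The policy update maximizes this value."""
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt: two judged correct (1.0), two not (0.0).
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advantages])
    print(clipped_surrogate(1.5, advantages[0]))
```

Because advantages are computed relative to the group, GRPO needs no learned value model. A group in which every response gets the same reward contributes zero advantage and therefore no learning signal.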
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
