Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
Bowen Yu, Maolin Wang, Sheng Zhang, Binhao Wang, Yi Wen, Jingtong Gao, Bowen Liu, Zimo Zhao, Wanyu Wang, Xiangyu Zhao

TL;DR
This paper introduces a three-stage curriculum learning framework for distilling large language models' reasoning into smaller models, improving accuracy and reducing output length through structure-aware masking and GRPO optimization.
Contribution
It proposes a novel curriculum learning approach with structure-aware masking and GRPO to enhance chain-of-thought distillation for smaller models.
Findings
Qwen2.5-3B-Base achieves an 11.29% accuracy improvement on GSM8K.
Output length is reduced by 27.4%.
Outperforms prior distillation methods and instruction-tuned variants.
Abstract
Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often too verbose for smaller models to faithfully reproduce. Existing approaches often compress reasoning into a single step, losing the interpretability that makes CoT valuable. We present a three-stage curriculum learning framework that addresses this capacity mismatch through progressive skill acquisition. First, we establish structural understanding via masked shuffled reconstruction. Second, we apply Group Relative Policy Optimization (GRPO) on masked completion tasks, enabling the model to discover its own balance between accuracy and brevity. Third, we identify persistent failure cases and guide the student to internalize teacher knowledge through targeted rewriting, again optimized with GRPO. Experiments on GSM8K demonstrate…
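The group-relative signal that GRPO relies on can be illustrated with a minimal sketch: rewards for several completions sampled from the same prompt are normalized against that group's own mean and standard deviation, so no separate value model is needed. The reward below (a correctness bonus minus a small length penalty) and the names toy_reward and length_weight are hypothetical, chosen only to illustrate a trade-off between accuracy and brevity; the abstract does not state the reward the authors use. Using the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(is_correct, num_tokens, length_weight=0.001):
    # Hypothetical reward, not taken from the paper:
    # +1 for a correct final answer, minus a small per-token length penalty.
    return (1.0 if is_correct else 0.0) - length_weight * num_tokens


# One group of four sampled completions: (answer correct?, length in tokens)
samples = [(True, 120), (True, 200), (False, 90), (False, 300)]
rewards = [toy_reward(correct, n) for correct, n in samples]
advantages = group_relative_advantages(rewards)

for (correct, n), adv in zip(samples, advantages):
    print(f"correct={correct!s:5} tokens={n:3d} advantage={adv:+.3f}")
```

Under this toy reward, the short correct completion receives the largest advantage and the long incorrect one the smallest, which is the kind of pressure that could push a student toward answers that are both accurate and brief.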
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Multimodal Machine Learning Applications
