Curriculum Learning-Guided Progressive Distillation in Large Language Models

Jincheng Cao; Fanzhi Zeng; Leqi Liu; Aryan Mokhtari

arXiv:2605.11260 · cs.LG · May 13, 2026

TL;DR

This paper introduces CLPD, a framework that improves knowledge distillation in large language models by jointly scheduling data difficulty and teacher capacity, so that harder examples are paired with stronger teachers and the student model performs better.

Contribution

The paper proposes a unified curriculum learning approach that explicitly aligns data difficulty with teacher capacity during distillation, with the aim of improving the reasoning abilities of small models.

Findings

01

CLPD outperforms standard distillation methods on reasoning benchmarks.

02

A joint data and teacher curriculum improves student model capabilities.

03

The framework is modular and integrates easily with existing distillation algorithms.

Abstract

Knowledge distillation is a key technique for transferring the capabilities of large language models (LLMs) into smaller, more efficient student models. Existing distillation approaches often overlook two critical factors: the learning order of training data and the capacity mismatch between teacher and student models. This oversight limits distillation performance, as manifested by the counter-intuitive phenomenon where stronger teachers fail to produce better students. In this work, we propose Curriculum Learning-Guided Progressive Distillation (CLPD), a unified framework that explicitly accounts for both factors by aligning data difficulty with teacher strength. CLPD constructs an explicit curriculum by organizing training examples from easy to hard, while simultaneously applying an implicit curriculum over supervision signals by progressively scheduling teachers of increasing…
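
The pairing of an easy-to-hard data curriculum with a weak-to-strong teacher schedule can be illustrated with a minimal sketch. The abstract is truncated here and does not specify how CLPD scores difficulty or switches teachers, so everything below is an assumption made for illustration: the function name `build_joint_curriculum`, the `difficulty` callable, and the equal-sized stage split are hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Example = TypeVar("Example")
Teacher = TypeVar("Teacher")


def build_joint_curriculum(
    examples: Sequence[Example],
    difficulty: Callable[[Example], float],
    teachers_weak_to_strong: Sequence[Teacher],
) -> List[Tuple[Teacher, List[Example]]]:
    """Pair easy-to-hard data stages with weak-to-strong teachers.

    Hypothetical helper: sorts examples by an assumed difficulty score,
    splits them into one contiguous stage per teacher, and assigns the
    i-th (weakest-first) teacher to the i-th (easiest-first) stage.
    """
    if not teachers_weak_to_strong:
        raise ValueError("at least one teacher is required")

    # Explicit curriculum: order training examples from easy to hard.
    ordered = sorted(examples, key=difficulty)

    n_stages = len(teachers_weak_to_strong)
    stages: List[Tuple[Teacher, List[Example]]] = []
    for i, teacher in enumerate(teachers_weak_to_strong):
        # Assumed equal-sized split; integer arithmetic covers every example.
        start = i * len(ordered) // n_stages
        end = (i + 1) * len(ordered) // n_stages
        # Implicit curriculum: later (harder) stages get stronger teachers.
        stages.append((teacher, list(ordered[start:end])))
    return stages
```

In a training loop, each stage's examples would be distilled from that stage's teacher before moving on to the next stage. The distillation loss itself is left out on purpose, since the page describes the framework as modular with respect to the underlying distillation algorithm.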
