Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework
Lingyuan Liu, Mengxiang Zhang

TL;DR
This paper introduces a curriculum learning framework called POCL that improves knowledge distillation of large language models by progressively increasing training sample difficulty, leading to more stable and efficient model compression.
Contribution
The paper proposes a novel plug-in curriculum learning framework for KD that enhances stability and performance by gradually increasing sample difficulty during training.
Findings
POCL improves distillation performance across various methods and models.
Structured training data enhances stability and efficiency in KD.
The framework is easy to integrate with minimal computational overhead.
Abstract
Knowledge Distillation (KD) compresses large language models (LLMs) by transferring the teacher model's capabilities to a smaller student model, reducing inference cost and memory usage while maintaining performance. However, existing KD methods for LLMs often fail to prevent significant shifts in the student model's distribution during training, leading to issues such as catastrophic forgetting, mode collapse, and training-inference mismatch. To address these challenges, we propose a novel, plug-in curriculum learning framework inspired by the strength training principle of "progressive overload" (POCL), which can be seamlessly integrated into existing white-box KD approaches with minimal computational overhead. The framework comprises two core components: (1) a difficulty measurer that ranks and partitions training samples from easy to hard, and (2) a training scheduler that…
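The two components named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the difficulty score is a stand-in (string length), and the cumulative easy-to-hard schedule is only one plausible reading of "progressively increasing sample difficulty", since the abstract's description of the scheduler is cut off. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def partition_by_difficulty(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int,
) -> List[List[T]]:
    """Rank samples from easy to hard and split them into contiguous partitions.

    `difficulty` is a hypothetical scoring function; a lower score means easier.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ranked = sorted(samples, key=difficulty)
    base, extra = divmod(len(ranked), num_stages)
    partitions: List[List[T]] = []
    start = 0
    for i in range(num_stages):
        # The first `extra` partitions take one additional sample each.
        size = base + (1 if i < extra else 0)
        partitions.append(ranked[start:start + size])
        start += size
    return partitions


def cumulative_stages(partitions: Sequence[Sequence[T]]) -> List[List[T]]:
    """Assumed schedule: each stage trains on all partitions seen so far
    plus the next, harder one."""
    stages: List[List[T]] = []
    pool: List[T] = []
    for part in partitions:
        pool = pool + list(part)
        stages.append(list(pool))
    return stages


# Toy usage: string length stands in for a real difficulty score.
toy_samples = ["aaaa", "a", "aaa", "aa", "aaaaa"]
toy_partitions = partition_by_difficulty(toy_samples, len, num_stages=2)
toy_stages = cumulative_stages(toy_partitions)
```

In a real KD pipeline, each entry of `toy_stages` would be the training pool for one phase of distillation, with the existing KD loss left unchanged. That matches the "plug-in" framing above, though the actual difficulty measure and schedule would come from the paper itself.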
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
