Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework

Lingyuan Liu; Mengxiang Zhang

arXiv:2506.05695 · cs.CL · June 11, 2025


TL;DR

This paper introduces a curriculum learning framework called POCL that improves knowledge distillation of large language models by progressively increasing training sample difficulty, leading to more stable and efficient model compression.

Contribution

The paper proposes a novel plug-in curriculum learning framework for KD that enhances stability and performance by gradually increasing sample difficulty during training.

Findings

01

POCL improves distillation performance across various methods and models.

02

Structured training data enhances stability and efficiency in KD.

03

The framework is easy to integrate with minimal computational overhead.

Abstract

Knowledge Distillation (KD) compresses large language models (LLMs) by transferring the teacher model's capabilities to a smaller student model, reducing inference cost and memory usage while maintaining performance. However, existing KD methods for LLMs often fail to prevent significant shifts in the student model's distribution during training, leading to issues such as catastrophic forgetting, mode collapse, and training-inference mismatch. To address these challenges, we propose a novel, plug-in curriculum learning framework inspired by the strength training principle of "progressive overload" (POCL), which can be seamlessly integrated into existing white-box KD approaches with minimal computational overhead. The framework comprises two core components: (1) a difficulty measurer that ranks and partitions training samples from easy to hard, and (2) a training scheduler that…
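The two components named in the abstract can be illustrated with a small sketch. This is not the authors' code. The abstract is cut off before it describes the scheduler, so the sketch assumes a common "baby steps" scheme: training starts on the easiest bucket, and each stage adds the next harder bucket while keeping the earlier ones. The helper names `partition_by_difficulty` and `progressive_schedule` are hypothetical, as is the difficulty score. In a KD setting, one plausible choice is a per-sample divergence between teacher and student outputs, but the visible text does not say which measure POCL uses. A string-length proxy stands in for it here.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_by_difficulty(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    num_buckets: int,
) -> List[List[T]]:
    """Hypothetical difficulty measurer: rank samples from easy (low score)
    to hard (high score), then split the ranking into num_buckets
    contiguous, near-equal-sized buckets."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    ranked = sorted(samples, key=difficulty)
    size, remainder = divmod(len(ranked), num_buckets)
    buckets: List[List[T]] = []
    start = 0
    for i in range(num_buckets):
        # The first `remainder` buckets absorb one extra sample each.
        end = start + size + (1 if i < remainder else 0)
        buckets.append(list(ranked[start:end]))
        start = end
    return buckets


def progressive_schedule(
    buckets: List[List[T]],
    epochs_per_stage: int,
) -> Iterator[Tuple[int, List[T]]]:
    """Assumed 'baby steps' scheduler: stage k trains on buckets 0..k,
    so harder samples are added while easier ones stay in the pool."""
    pool: List[T] = []
    for stage, bucket in enumerate(buckets):
        pool.extend(bucket)
        for _ in range(epochs_per_stage):
            # Yield a copy so callers can shuffle it without touching the pool.
            yield stage, list(pool)


if __name__ == "__main__":
    # Toy difficulty proxy (hypothetical): longer strings count as harder.
    data = ["abcd", "a", "abcdef", "ab", "abcde", "abc"]
    buckets = partition_by_difficulty(data, len, num_buckets=3)
    print(buckets)
    for stage, pool in progressive_schedule(buckets, epochs_per_stage=1):
        print(stage, len(pool))
```

Running the script prints `[['a', 'ab'], ['abc', 'abcd'], ['abcde', 'abcdef']]`, followed by the pool sizes 2, 4 and 6 for stages 0, 1 and 2. Keeping earlier buckets in the pool is one way to limit the distribution shift the abstract warns about. A real KD loop would draw minibatches from each yielded pool.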


Code & Models

Repositories

liuliuyuan6/POCL (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning