$\mathcal{X}$-KD: General Experiential Knowledge Distillation for Large Language Models
Yuang Cai, Yuyu Yuan

TL;DR
$\mathcal{X}$-KD introduces a novel framework for knowledge distillation that incorporates the teacher's original learning environment, leading to improved performance and efficiency in large language models across multiple tasks.
Contribution
The paper proposes $\mathcal{X}$-KD, a flexible distillation method that models the teacher's reward environment, enhancing learning effectiveness for large language models.
Findings
Outperforms baseline KD methods on summarization, translation, and reasoning tasks.
Achieves better performance-diversity trade-off.
Demonstrates improved data efficiency over existing approaches.
Abstract
Knowledge Distillation (KD) for Large Language Models (LLMs) has become increasingly important as models grow in size and complexity. While existing distillation approaches focus on imitating teacher behavior, they often overlook the original learning environment that shaped the teacher's knowledge. Inspired by experiential learning theory and inverse reinforcement learning, we propose Experiential Knowledge Distillation ($\mathcal{X}$-KD), a novel and general framework that enables student models to learn in the teacher's original learning environment. $\mathcal{X}$-KD adopts the Approximated Variational Reward Imitation Learning (AVRIL) framework to jointly model the teacher's original reward function and perform policy distillation, encouraging consistency between the student policy and the original reward function. Our derivation demonstrates that $\mathcal{X}$-KD follows the…
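The joint objective described in the abstract can be illustrated with a rough sketch. The code below is not the paper's implementation: it assumes a token-level forward-KL distillation term, treats the student's logits as Q-values, and adds an AVRIL-style term that scores the temporal-difference residual under a Gaussian reward posterior with a standard-normal prior. The function names, the weights `lam` and `beta`, and the choice of forward KL are all hypothetical choices made for illustration.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits):
    # Forward KL(teacher || student), averaged over sequence positions.
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return float((np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean())

def reward_consistency_loss(student_logits, tokens, r_mean, r_logstd, gamma=1.0):
    # Assumption: the student logit of the chosen token plays the role of Q(s_t, a_t).
    q = np.take_along_axis(student_logits, tokens[:, None], axis=-1)[:, 0]
    q_next = np.append(q[1:], 0.0)  # value after the final token is taken as 0
    td = q - gamma * q_next         # TD residual, used as the implied reward
    var = np.exp(2.0 * r_logstd)
    # Negative Gaussian log-likelihood of the residual under the reward posterior.
    nll = 0.5 * np.log(2.0 * np.pi * var) + (td - r_mean) ** 2 / (2.0 * var)
    return float(nll.mean())

def kl_to_standard_normal(r_mean, r_logstd):
    # KL(N(mean, std^2) || N(0, 1)), averaged over positions.
    var = np.exp(2.0 * r_logstd)
    return float((0.5 * (var + r_mean ** 2 - 1.0) - r_logstd).mean())

def joint_loss(student_logits, teacher_logits, tokens, r_mean, r_logstd,
               lam=1.0, beta=1.0, gamma=1.0):
    # Hypothetical weighting of the three terms; lam and beta are illustrative.
    return (kd_loss(student_logits, teacher_logits)
            + lam * reward_consistency_loss(student_logits, tokens, r_mean, r_logstd, gamma)
            + beta * kl_to_standard_normal(r_mean, r_logstd))

# Toy usage: 5 positions, vocabulary of 8 tokens, random inputs.
rng = np.random.default_rng(0)
T, V = 5, 8
student = rng.normal(size=(T, V))
teacher = rng.normal(size=(T, V))
tokens = rng.integers(0, V, size=T)
loss = joint_loss(student, teacher, tokens, np.zeros(T), np.zeros(T))
```

In a real training setup the reward posterior parameters (`r_mean`, `r_logstd` here) would presumably be produced by a learned reward model and optimized together with the student; in this sketch they are plain arrays so that each term can be inspected on its own.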
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications
