Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

TL;DR
This paper introduces Multi-Stage Balanced Distillation, a novel framework that improves sequence-level knowledge distillation for long-tailed data distributions, enhancing model generalization and performance.
Contribution
The paper proposes a multi-stage balancing approach that dynamically selects and synthesizes training examples to address long-tail challenges in sequence-level knowledge distillation.
Findings
Achieves state-of-the-art results on long-tailed datasets.
Improves generalization on sparsely represented domains.
Enhances the efficiency of the knowledge distillation process.
Abstract
Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain…
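The balancing idea in the abstract can be illustrated with a minimal sketch of one stage. This is not the authors' implementation: the function name `plan_balanced_stage`, the equal per-domain quota, and the use of random sampling as a stand-in for "representative" selection are all assumptions made for illustration. The sketch only plans a stage: it subsamples domains that exceed their quota and reports how many examples each under-represented domain would still need from synthesis.

```python
import random
from collections import defaultdict


def plan_balanced_stage(examples, stage_budget, seed=0):
    """Plan one balanced stage (hypothetical helper, not from the paper).

    examples: list of (domain, text) pairs.
    Returns (selected, to_synthesize), both dicts keyed by domain.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    rng = random.Random(seed)

    # Group the available examples by domain.
    by_domain = defaultdict(list)
    for domain, text in examples:
        by_domain[domain].append(text)

    # Assumption: the stage budget is split equally across domains.
    quota = stage_budget // len(by_domain)

    selected, to_synthesize = {}, {}
    for domain, pool in by_domain.items():
        if len(pool) >= quota:
            # Head domain: keep only `quota` examples. Random sampling is a
            # placeholder for whatever representative selection is used.
            selected[domain] = rng.sample(pool, quota)
            to_synthesize[domain] = 0
        else:
            # Tail domain: keep everything and record the shortfall that
            # would have to be filled with synthesized examples.
            selected[domain] = list(pool)
            to_synthesize[domain] = quota - len(pool)
    return selected, to_synthesize


# Toy long-tailed pool: 90 "math" examples, 10 "law" examples.
examples = [("math", f"m{i}") for i in range(90)]
examples += [("law", f"l{i}") for i in range(10)]
selected, gaps = plan_balanced_stage(examples, stage_budget=40)
print({d: len(v) for d, v in selected.items()}, gaps)
```

In this toy run each domain gets a quota of 20, so the head domain is cut down to 20 examples and the tail domain keeps its 10 and is flagged as needing 10 more. How those missing examples are generated, and how selection changes across stages, is left open here.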
Taxonomy
Topics: Process Optimization and Integration · Machine Learning and Algorithms · Reservoir Engineering and Simulation Methods
Methods: Knowledge Distillation
