Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

TL;DR
This paper introduces Speculative Knowledge Distillation (SKD), a novel method that improves the teacher-student training process by dynamically generating high-quality training data, leading to better performance in text generation tasks.
Contribution
SKD enables adaptive, on-the-fly data generation by combining student proposals with teacher feedback, effectively bridging the knowledge gap in traditional KD methods.
Findings
SKD outperforms existing KD methods across multiple text generation tasks.
SKD maintains high performance across different data sizes and model initializations.
SKD effectively aligns training data with inference-time distributions.
Abstract
Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD are adversely affected by the knowledge gap between teacher and student in practical scenarios. Supervised KD suffers from a distribution mismatch between training on a static dataset and inference over the student's own generated outputs. Conversely, on-policy KD, which trains on student-generated samples, can suffer from low-quality training examples that are unfamiliar to the teacher model, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's…
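The cooperation between student proposals and teacher feedback described above can be illustrated with a small toy sketch. The abstract is truncated before it states the exact acceptance rule, so everything specific here is an assumption made for illustration: the top-k acceptance test, the function names (`interleaved_sample`, `top_k`, `sample`), and the representation of a model as a function from a token prefix to a next-token distribution. It is not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[str, float]                  # token -> probability
Model = Callable[[Sequence[str]], Dist]  # prefix -> next-token distribution


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a categorical distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def top_k(dist: Dist, k: int) -> List[str]:
    """Return the k most probable tokens under dist."""
    return sorted(dist, key=dist.get, reverse=True)[:k]


def interleaved_sample(
    student: Model,
    teacher: Model,
    prompt: Sequence[str],
    max_len: int,
    k: int,
    rng: random.Random,
) -> Tuple[List[str], List[bool]]:
    """Build a sequence token by token from student proposals.

    ASSUMPTION: a proposal is kept when it falls in the teacher's top-k
    tokens for the current prefix; otherwise the teacher resamples it.
    """
    seq = list(prompt)
    replaced: List[bool] = []
    for _ in range(max_len):
        proposal = sample(student(seq), rng)
        teacher_dist = teacher(seq)
        if proposal in top_k(teacher_dist, k):
            seq.append(proposal)          # teacher accepts the student token
            replaced.append(False)
        else:
            seq.append(sample(teacher_dist, rng))  # teacher substitutes its own
            replaced.append(True)
    return seq, replaced


# Hypothetical toy models over a three-token vocabulary.
def toy_student(prefix: Sequence[str]) -> Dist:
    return {"a": 0.1, "b": 0.1, "c": 0.8}


def toy_teacher(prefix: Sequence[str]) -> Dist:
    return {"a": 0.7, "b": 0.2, "c": 0.1}


if __name__ == "__main__":
    seq, replaced = interleaved_sample(
        toy_student, toy_teacher, ["<s>"], max_len=8, k=2, rng=random.Random(0)
    )
    print(seq)
    print(replaced)
```

In this sketch, a large `k` keeps nearly every student token, so the data resembles on-policy samples, while a small `k` makes the teacher rewrite most tokens, so the data resembles teacher-generated supervision. The resulting sequence would then serve as the training example for a distillation loss, which is omitted here.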
Taxonomy
Topics: Online and Blended Learning · Education and Critical Thinking Development
Methods: Knowledge Distillation
