MiniDisc: Minimal Distillation Schedule for Language Model Compression
Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song

TL;DR
MiniDisc introduces a minimal, trial-efficient method for scheduling the optimal teacher assistant in language model distillation, improving efficiency and scalability over existing methods.
Contribution
MiniDisc proposes a novel $\lambda$-tradeoff framework for selecting the best teacher assistant in a single trial, reducing the need for extensive trial-and-error.
Findings
Outperforms state-of-the-art baselines in efficiency on GLUE tasks.
Scalable to billion-parameter language models.
Demonstrates effectiveness with minimal trials.
Abstract
Recent studies have uncovered that language model distillation is less effective when facing a large capacity gap between the teacher and the student, and introduced teacher assistant-based distillation to bridge the gap. As a connection, the scale and the performance of the teacher assistant are of vital importance in bringing the knowledge from the teacher to the student. However, existing teacher assistant-based methods require maximally many trials before scheduling an optimal teacher assistant. To this end, we propose a minimal distillation schedule (MiniDisc) for scheduling the optimal teacher assistant in minimally one trial. In particular, motivated by the finding that the performance of the student is positively correlated with the scale-performance tradeoff of the teacher assistant, MiniDisc is designed with a $\lambda$-tradeoff to measure the optimality of the teacher assistant…
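The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not state the formula, so the linear score below (performance plus $\lambda$ times sparsity) is an assumption made only for illustration. The names Candidate, tradeoff, and pick_assistant and all numbers are hypothetical, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical teacher assistant candidate."""
    sparsity: float     # fraction of teacher parameters removed, in [0, 1)
    performance: float  # dev-set score of the candidate, in [0, 1]


def tradeoff(c: Candidate, lam: float) -> float:
    # ASSUMPTION: a linear scale-performance score. The exact form of the
    # lambda-tradeoff is not given in this summary.
    return c.performance + lam * c.sparsity


def pick_assistant(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the highest (assumed) tradeoff score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: tradeoff(c, lam))


# Made-up candidates: smaller assistants (higher sparsity) score lower.
candidates = [
    Candidate(sparsity=0.3, performance=0.90),
    Candidate(sparsity=0.5, performance=0.88),
    Candidate(sparsity=0.7, performance=0.80),
]
best = pick_assistant(candidates, lam=0.2)
print(best)
```

With these made-up numbers and lam=0.2, the middle candidate (sparsity 0.5) is selected, because its small performance loss is outweighed by its smaller scale. Setting lam=0 reduces the score to performance alone and selects the largest candidate.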
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Ferroelectric and Negative Capacitance Devices
Methods: Knowledge Distillation
