RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

TL;DR
RAIL-KD introduces a randomized layer selection method for knowledge distillation that reduces computational costs and acts as a regularizer, improving student model performance on NLP tasks.
Contribution
The paper proposes RAIL-KD, a novel randomized layer selection approach for intermediate layer knowledge distillation that enhances efficiency and generalization.
Findings
RAIL-KD outperforms existing methods on GLUE tasks.
It reduces training time compared to traditional intermediate layer KD.
The random layer selection acts as a regularizer, improving the student model's generalization.
Abstract
Intermediate layer knowledge distillation (KD) can improve the standard KD technique (which only targets the output of teacher and student models), especially over large pre-trained language models. However, intermediate layer distillation suffers from excessive computational burdens and the engineering effort required to set up a proper layer mapping. To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model. This randomized selection ensures that all teacher layers are taken into account in the training process, while reducing the computational cost of intermediate layer distillation. We also show that it acts as a regularizer, improving the generalizability of the student model. We perform…
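The random selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to sort the sampled teacher layers so that depth order is preserved, the one-to-one pairing with student layers, and the mean-squared-error loss are all assumptions made here for illustration. Hidden states are plain lists of floats so the example runs with the standard library only.

```python
import random


def sample_layer_mapping(num_teacher_layers, num_student_layers, rng):
    """Randomly pick one teacher layer for each student layer.

    Assumption: layers are sampled without replacement and then sorted,
    so that shallower student layers are paired with shallower teacher layers.
    Returns a list of (student_index, teacher_index) pairs.
    """
    if num_student_layers > num_teacher_layers:
        raise ValueError("student has more intermediate layers than teacher")
    chosen = rng.sample(range(num_teacher_layers), num_student_layers)
    chosen.sort()
    return list(enumerate(chosen))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def intermediate_loss(student_hidden, teacher_hidden, mapping):
    """Average MSE over the mapped (student, teacher) layer pairs.

    Assumption: MSE stands in for whatever layer-wise loss is actually used.
    """
    return sum(mse(student_hidden[s], teacher_hidden[t]) for s, t in mapping) / len(mapping)


# Toy setup: a 12-layer teacher and a 6-layer student with 4-dimensional states.
rng = random.Random(0)
teacher_hidden = [[float(layer)] * 4 for layer in range(12)]
student_hidden = [[0.0] * 4 for _ in range(6)]

# A fresh mapping is drawn each epoch, so over training every teacher layer
# can contribute, while each step only compares 6 layer pairs.
for epoch in range(3):
    mapping = sample_layer_mapping(12, 6, rng)
    loss = intermediate_loss(student_hidden, teacher_hidden, mapping)
    print(epoch, mapping, loss)
```

Because only as many teacher layers as there are student layers are compared at each step, the per-step cost matches that of a fixed one-to-one mapping, while the resampling exposes the student to all teacher layers over time.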
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Test · Knowledge Distillation
