Improved knowledge distillation by utilizing backward pass knowledge in neural networks
Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

TL;DR
This paper introduces a novel knowledge distillation method that leverages backward pass information to generate auxiliary training samples, improving model compression and performance in vision and NLP tasks.
Contribution
The work proposes utilizing backward pass knowledge to generate auxiliary samples, enhancing the effectiveness of knowledge distillation beyond traditional forward pass methods.
Findings
Significant performance improvement in KD with auxiliary data augmentation
Effective application in both computer vision and NLP tasks
Closer match between teacher and student models achieved
Abstract
Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) that usually has significantly fewer parameters. KD tries to better match the output of the student model to that of the teacher model, based on knowledge extracted from the forward pass of the teacher network. Although conventional KD is effective for matching the two networks over the given data points, there is no guarantee that the models will match in other regions of the input space for which we do not have enough training samples. In this work, we address that problem by generating new auxiliary training samples, using knowledge extracted from the backward pass of the teacher, in the regions where the student diverges greatly from the teacher. We compute the difference between the teacher and the student and generate new…
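The abstract is truncated here, so the exact generation procedure is not stated. As a rough illustration of the idea it describes, the sketch below perturbs existing inputs by gradient ascent on the teacher-student output difference, so the new points drift toward regions where the two models disagree more. Everything in it is an assumption made for illustration: the linear teacher and student, the squared-error discrepancy, and the names `discrepancy`, `generate_auxiliary_samples`, `step_size`, and `n_steps` are hypothetical and not taken from the paper.

```python
import numpy as np

def discrepancy(X, W_teacher, W_student):
    """Per-sample squared distance between teacher and student outputs."""
    diff = X @ W_teacher - X @ W_student
    return 0.5 * np.sum(diff ** 2, axis=1)

def generate_auxiliary_samples(X, W_teacher, W_student, step_size=0.1, n_steps=5):
    """Move inputs along the input-gradient of the discrepancy (gradient ascent).

    Toy assumption: both models are linear maps, so the gradient has a
    closed form. With real networks it would come from backpropagation.
    """
    D = W_teacher - W_student              # weight gap, shape (d, k)
    X_aux = X.copy()
    for _ in range(n_steps):
        diff = X_aux @ D                   # output gap, shape (n, k)
        grad = diff @ D.T                  # d(discrepancy)/d(input), shape (n, d)
        X_aux = X_aux + step_size * grad   # ascent step: increase disagreement
    return X_aux

# Toy data: a student whose weights are a noisy copy of the teacher's.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
W_teacher = rng.normal(size=(4, 3))
W_student = W_teacher + 0.1 * rng.normal(size=(4, 3))

X_aux = generate_auxiliary_samples(X, W_teacher, W_student)
before = discrepancy(X, W_teacher, W_student)
after = discrepancy(X_aux, W_teacher, W_student)
```

In a distillation loop along these lines, the perturbed points in `X_aux` would presumably be labeled with the teacher's outputs and added to the student's training set, so the student is also fitted where it currently diverges most. For this linear toy case, each ascent step cannot reduce the discrepancy, which is why `after` is never smaller than `before`.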
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
