Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Aref Jafari; Mehdi Rezagholizadeh; Ali Ghodsi

arXiv:2301.12006·cs.LG·January 31, 2023

Open Access

TL;DR

This paper introduces a novel knowledge distillation method that leverages backward pass information to generate auxiliary training samples, improving model compression and performance in vision and NLP tasks.

Contribution

The work proposes utilizing backward pass knowledge to generate auxiliary samples, enhancing the effectiveness of knowledge distillation beyond traditional forward pass methods.

Findings

01. Significant performance improvement in KD with auxiliary data augmentation

02. Effective application in both computer vision and NLP tasks

03. Closer match between teacher and student models achieved

Abstract

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) that usually has significantly fewer parameters. KD tries to better match the output of the student model to that of the teacher model based on the knowledge extracted from the forward pass of the teacher network. Although conventional KD is effective for matching the two networks over the given data points, there is no guarantee that these models would match in other areas for which we do not have enough training samples. In this work, we address that problem by generating new auxiliary training samples based on extracting knowledge from the backward pass of the teacher in the areas where the student diverges greatly from the teacher. We compute the difference between the teacher and the student and generate new…
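
To make the idea of generating auxiliary samples "where the student diverges greatly from the teacher" more concrete, here is a minimal sketch in plain Python. The abstract is cut off before it states the exact generation rule, so everything below is an assumption for illustration: the toy one-dimensional `teacher` and `student` functions, the squared-difference objective, the gradient-ascent update on the input, and the `step_size` and `num_steps` values are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical stand-ins for trained networks (not from the paper):
# a nonlinear "teacher" and a much simpler linear "student".
def teacher(x):
    return math.sin(3.0 * x)

def student(x):
    return 0.5 * x

def discrepancy(x):
    """Squared teacher-student output gap at input x."""
    return (teacher(x) - student(x)) ** 2

def discrepancy_grad(x):
    """Derivative of the gap with respect to the INPUT x (chain rule).

    With real networks this would come from a backward pass through
    both models rather than a hand-written derivative.
    """
    gap = teacher(x) - student(x)
    gap_slope = 3.0 * math.cos(3.0 * x) - 0.5
    return 2.0 * gap * gap_slope

def generate_auxiliary(samples, step_size=0.01, num_steps=20):
    """Move each sample uphill on the gap (assumed update rule)."""
    auxiliary = []
    for x in samples:
        for _ in range(num_steps):
            x = x + step_size * discrepancy_grad(x)
        auxiliary.append(x)
    return auxiliary

# Toy "training inputs"; the auxiliary points start from these.
train_x = [-1.0, -0.2, 0.4, 1.1]
aux_x = generate_auxiliary(train_x)

for x0, x1 in zip(train_x, aux_x):
    print(f"{x0:+.2f} -> {x1:+.3f}  "
          f"gap {discrepancy(x0):.3f} -> {discrepancy(x1):.3f}")
```

In a full distillation loop, points such as `aux_x` would presumably be labeled with the teacher's outputs and added to the student's training data, so that the student is also fitted in regions the original samples do not cover. The small step size is chosen here only so that each update in this toy setting does not overshoot.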

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning