ResKD: Residual-Guided Knowledge Distillation

Xuewei Li, Songyuan Li, Bourahla Omar, Fei Wu, and Xi Li

arXiv:2006.04719 · cs.CV · December 1, 2021

TL;DR

ResKD is a residual-guided knowledge distillation method that iteratively augments a lightweight student with even lighter res-students. Each res-student is trained on the residual (the knowledge gap) between the teacher and the current student, which substantially reduces computational cost while keeping accuracy competitive.

Contribution

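The paper proposes a residual-guided knowledge distillation framework that iteratively improves the student by adding res-students that correct its errors. It also introduces a sample-adaptive inference strategy that skips res-students that are unnecessary for a given input, saving computation.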
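The iterative residual-guided process can be illustrated with a toy analogue. This is a minimal sketch under loudly stated assumptions, not the authors' training procedure. The teacher is a scalar function, every "student" is a least-squares decision stump (a hypothetical stand-in for a lightweight network), each res-student is fit to the teacher output minus the current combined output, and the combined student sums all stage outputs. The names `fit_stump`, `combine`, and `residual_guided_distill` are illustrative only. Structurally this is residual fitting as in gradient boosting; the paper's CNN res-students and distillation losses are not reproduced here.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_stump(xs: Sequence[float], targets: Sequence[float]) -> Model:
    """Least-squares decision stump: a hypothetical stand-in for a lightweight network."""
    best_sse = float("inf")
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        sse = sum((t - l_mean) ** 2 for t in left) + sum((t - r_mean) ** 2 for t in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, l_mean, r_mean)
    if best is None:  # all inputs identical: fall back to a constant model
        mean = sum(targets) / len(targets)
        return lambda x: mean
    thr, l_mean, r_mean = best
    return lambda x: l_mean if x <= thr else r_mean


def combine(stages: Sequence[Model]) -> Model:
    """Combined student: base student output plus every res-student's correction (assumed additive)."""
    frozen = list(stages)
    return lambda x: sum(m(x) for m in frozen)


def residual_guided_distill(teacher: Model, xs: Sequence[float], num_res_students: int) -> List[Model]:
    # Stage 0: an ordinary student trained to mimic the teacher's outputs.
    stages = [fit_stump(xs, [teacher(x) for x in xs])]
    for _ in range(num_res_students):
        current = combine(stages)
        # Residual (knowledge gap) between the teacher and the current combined student.
        residual = [teacher(x) - current(x) for x in xs]
        # The next res-student is trained to predict that residual.
        stages.append(fit_stump(xs, residual))
    return stages


def mse(model: Model, xs: Sequence[float], teacher: Model) -> float:
    return sum((model(x) - teacher(x)) ** 2 for x in xs) / len(xs)


def teacher(x: float) -> float:
    # Toy "heavy teacher" output.
    return x * x


xs = [i / 10 for i in range(-20, 21)]
stages = residual_guided_distill(teacher, xs, num_res_students=5)
# Error of the combined student after adding 0, 1, ..., 5 res-students.
errors = [mse(combine(stages[: k + 1]), xs, teacher) for k in range(len(stages))]
for k, err in enumerate(errors):
    print(f"student + {k} res-students: MSE = {err:.4f}")
```

Because each stump is the least-squares fit to the current residual, adding a res-student never increases the training error. In this toy the printed errors therefore shrink, mirroring the idea that stages can be added until accuracy and cost are balanced.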

Findings

01

Reaches competitive accuracy on ImageNet while using 56.86% of the teacher's computational cost.

02

Maintains competitive accuracy across multiple datasets.

03

Provides theoretical and empirical validation of the method.

Abstract

Knowledge distillation, aimed at transferring the knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the heavy teacher and the lightweight student, there still exists a significant performance gap between them. In this paper, we see knowledge distillation in a fresh light, using the knowledge gap, or the residual, between a teacher and a student as guidance to train a much more lightweight student, called a res-student. We combine the student and the res-student into a new student, where the res-student rectifies the errors of the former student. Such a residual-guided process can be repeated until the user strikes the balance between accuracy and cost. At inference time, we propose a sample-adaptive strategy to decide which res-students are not necessary…
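The abstract is cut off before it says how the sample-adaptive strategy decides which res-students are unnecessary. The sketch below uses a confidence threshold on the running prediction as a hypothetical stopping rule. It is a common early-exit stand-in, not necessarily the paper's criterion. The names `adaptive_predict`, `threshold`, `base`, `res_1`, and `res_2`, as well as the logit values, are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[str], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_predict(stages: Sequence[Stage], x: str, threshold: float) -> Tuple[int, int]:
    """Return (predicted class, number of res-students evaluated) for input x.

    stages[0] is the base student; stages[1:] are res-students whose outputs
    are added to the running logits. A res-student is evaluated only while the
    running top softmax probability is below `threshold` (hypothetical rule).
    """
    logits = list(stages[0](x))
    used = 0
    for res_student in stages[1:]:
        if max(softmax(logits)) >= threshold:
            break  # confident enough: skip the remaining res-students
        logits = [a + b for a, b in zip(logits, res_student(x))]
        used += 1
    probs = softmax(logits)
    return probs.index(max(probs)), used


# Hand-written, hypothetical stage outputs for a 3-class problem.
def base(x: str) -> List[float]:
    return [4.0, 0.0, 0.0] if x == "easy" else [2.0, 1.5, 0.0]


def res_1(x: str) -> List[float]:
    return [-1.0, 1.5, 0.0]


def res_2(x: str) -> List[float]:
    return [0.0, 1.0, 0.0]


print(adaptive_predict([base, res_1, res_2], "easy", threshold=0.9))  # (0, 0)
print(adaptive_predict([base, res_1, res_2], "hard", threshold=0.9))  # (1, 2)
```

Passing res-students as callables means that skipped stages are never executed. For the "easy" input the base student alone is confident, so no res-student runs. The "hard" input uses both res-students, and their corrections change the predicted class.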

Taxonomy

Methods: Knowledge Distillation