ResKD: Residual-Guided Knowledge Distillation
Xuewei Li, Songyuan Li, Bourahla Omar, Fei Wu, and Xi Li

TL;DR
ResKD is a residual-guided knowledge distillation method that iteratively refines a lightweight student by training even lighter res-students on the residual (knowledge gap) between the teacher and the current student. This substantially reduces computational cost while keeping accuracy competitive.
Contribution
The paper proposes a residual-guided knowledge distillation framework that iteratively improves the student. At inference time, a sample-adaptive strategy skips the res-students that a given sample does not need, which saves computation.
Findings
Reaches competitive accuracy on ImageNet using only 56.86% of the teacher's computational cost.
Maintains competitive accuracy across multiple datasets.
Provides theoretical and empirical validation of the method.
Abstract
Knowledge distillation, aimed at transferring the knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the heavy teacher and the lightweight student, there still exists a significant performance gap between them. In this paper, we see knowledge distillation in a fresh light, using the knowledge gap, or the residual, between a teacher and a student as guidance to train a much more lightweight student, called a res-student. We combine the student and the res-student into a new student, where the res-student rectifies the errors of the former student. Such a residual-guided process can be repeated until the user strikes the balance between accuracy and cost. At inference time, we propose a sample-adaptive strategy to decide which res-students are not necessary…
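The following toy sketch in plain Python makes the residual-guided loop and the inference-time early exit concrete. It is not the authors' implementation. Every name in it (`teacher`, `BinModel`, `train_residual_guided`, `adaptive_predict`) is hypothetical, and piecewise-constant "bin models" stand in for neural networks. The sketch also assumes that the residual is computed and summed in output (logit) space. The abstract excerpt does not state the paper's sample-adaptive criterion, so a simple softmax-confidence threshold stands in for it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def teacher(x: float) -> Vector:
    # Hypothetical stand-in for the heavy teacher: 2-class logits for x in [0, 1).
    return [math.sin(6.0 * x), math.cos(4.0 * x)]


class BinModel:
    # Hypothetical capacity-limited "network": one constant per equal-width bin
    # of [0, 1). Fewer bins means a lighter model.
    def __init__(self, num_bins: int, dim: int) -> None:
        self.num_bins, self.dim = num_bins, dim
        self.table = [[0.0] * dim for _ in range(num_bins)]

    def _bin(self, x: float) -> int:
        return min(int(x * self.num_bins), self.num_bins - 1)

    def fit(self, xs: Sequence[float], targets: Sequence[Vector]) -> "BinModel":
        # Least-squares constant per bin = mean of the targets falling in it.
        sums = [[0.0] * self.dim for _ in range(self.num_bins)]
        counts = [0] * self.num_bins
        for x, y in zip(xs, targets):
            b = self._bin(x)
            counts[b] += 1
            sums[b] = [s + v for s, v in zip(sums[b], y)]
        self.table = [[s / c for s in row] if c else [0.0] * self.dim
                      for row, c in zip(sums, counts)]
        return self

    def __call__(self, x: float) -> Vector:
        return self.table[self._bin(x)]


def combined(stages: Sequence[BinModel], x: float, dim: int) -> Vector:
    # Student + res-students, summed in output (logit) space.
    out = [0.0] * dim
    for model in stages:
        out = [o + v for o, v in zip(out, model(x))]
    return out


def train_residual_guided(xs: Sequence[float], teacher_fn: Callable[[float], Vector],
                          bins_per_stage: Sequence[int]) -> List[BinModel]:
    dim = len(teacher_fn(xs[0]))
    stages: List[BinModel] = []
    for num_bins in bins_per_stage:
        # Target = residual between the teacher and the current combined student;
        # with no stages yet it is the teacher output itself (plain distillation).
        targets = [[t - f for t, f in zip(teacher_fn(x), combined(stages, x, dim))]
                   for x in xs]
        stages.append(BinModel(num_bins, dim).fit(xs, targets))
    return stages


def mse(stages: Sequence[BinModel], xs: Sequence[float],
        teacher_fn: Callable[[float], Vector]) -> float:
    # Mean squared gap between teacher logits and the combined student's logits.
    dim = len(teacher_fn(xs[0]))
    return sum(sum((t - f) ** 2 for t, f in zip(teacher_fn(x), combined(stages, x, dim)))
               for x in xs) / len(xs)


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]


def adaptive_predict(stages: Sequence[BinModel], x: float,
                     threshold: float) -> Tuple[int, int]:
    # Hypothetical sample-adaptive rule (NOT the paper's criterion): stop adding
    # res-students once the max softmax probability reaches `threshold`.
    out, used = [0.0] * stages[0].dim, 0
    for model in stages:
        out = [o + v for o, v in zip(out, model(x))]
        used += 1
        if max(softmax(out)) >= threshold:
            break
    return out.index(max(out)), used  # (predicted class, stages evaluated)


if __name__ == "__main__":
    xs = [i / 200 for i in range(200)]
    stages = train_residual_guided(xs, teacher, [12, 7, 5])
    for k in range(len(stages) + 1):
        print(f"stages={k} mse={mse(stages[:k], xs, teacher):.4f}")
    print(adaptive_predict(stages, 0.3, threshold=0.9))
```

Each stage fits the per-bin mean of the current residual. As a result, in this toy the gap to the teacher on the training inputs can only stay the same or shrink as res-students are added. The bin counts decrease (12, 7, 5) to mirror the abstract's point that each res-student is lighter than the student it corrects. Raising `threshold` makes more samples evaluate more res-students, which trades extra cost for closer agreement with the full combined student.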
Taxonomy
Methods: Knowledge Distillation
