Residual Knowledge Distillation
Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

TL;DR
Residual Knowledge Distillation introduces an assistant model to learn residual errors, enhancing the transfer of knowledge from a teacher to a student model, leading to improved performance in model compression tasks.
Contribution
The paper proposes Residual Knowledge Distillation, a method that adds an assistant model to learn the residual error between student and teacher feature maps, narrowing the capacity gap between the two models.
Findings
Achieves superior results on CIFAR-100 and ImageNet datasets.
Outperforms existing state-of-the-art knowledge distillation methods.
Effectively reduces performance degradation in model compression.
Abstract
Knowledge distillation (KD) is one of the most potent approaches to model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A). Specifically, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them. In this way, S and A complement each other to get better knowledge from T. Furthermore, we devise an effective method to derive S and A from a given model without increasing the total computational cost. Extensive experiments show that our approach achieves appealing results on popular…
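The residual idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mean-squared-error losses, the choice to combine S and A by summation, the function names, and the synthetic feature maps are all assumptions made for illustration.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def student_loss(f_s, f_t):
    # Assumption: S mimics the teacher's feature maps under an MSE loss.
    return mse(f_s, f_t)

def assistant_loss(f_a, f_s, f_t):
    # Assumption: A regresses the residual error (teacher minus student).
    residual = f_t - f_s
    return mse(f_a, residual)

def combined_features(f_s, f_a):
    # Assumption: the two outputs are combined by element-wise summation.
    return f_s + f_a

# Synthetic feature maps with shape (batch, channels, height, width).
rng = np.random.default_rng(0)
f_t = rng.normal(size=(2, 8, 4, 4))            # teacher features
f_s = f_t + 0.5 * rng.normal(size=f_t.shape)   # imperfect student features
f_a = 0.9 * (f_t - f_s)                        # hypothetical assistant recovering 90% of the residual

err_student = student_loss(f_s, f_t)
err_assistant = assistant_loss(f_a, f_s, f_t)
err_combined = mse(combined_features(f_s, f_a), f_t)

print(err_student, err_assistant, err_combined)
```

In this toy setup, the student plus assistant is closer to the teacher than the student alone, because whatever part of the residual A captures is subtracted from the student's error.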
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image Processing Techniques · COVID-19 diagnosis using AI
Methods: Knowledge Distillation
