TL;DR
This paper introduces a hierarchical self-supervised augmented knowledge distillation method that enhances student network performance by transferring diverse knowledge from intermediate features, outperforming previous state-of-the-art methods.
Contribution
It proposes appending auxiliary classifiers to intermediate layers so that diverse knowledge can be transferred, and it trains the network on the joint distribution of the original classification task and a self-supervised auxiliary task to improve representation learning.
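The joint label space behind the self-supervised augmented task can be illustrated with a small Python sketch. It assumes four-way rotation prediction as the auxiliary task, which this summary does not specify, and the helpers `augmented_label` and `split_label` are hypothetical names used only for illustration. With N original classes and M transformations, a single classifier head predicts one of N × M joint labels.

```python
# Sketch only: maps an (original class, transformation) pair to one label in a
# joint "self-supervised augmented" label space.
# Assumption: the auxiliary task is 4-way rotation prediction.
ROTATIONS = (0, 90, 180, 270)


def augmented_label(class_idx: int, rotation_idx: int,
                    num_rotations: int = len(ROTATIONS)) -> int:
    """Return the joint label for class `class_idx` under rotation `rotation_idx`.

    With N classes and M rotations the joint space has N * M labels, so one
    head predicts both the object class and how the input was transformed.
    """
    if not 0 <= rotation_idx < num_rotations:
        raise ValueError("rotation_idx out of range")
    return class_idx * num_rotations + rotation_idx


def split_label(joint_label: int,
                num_rotations: int = len(ROTATIONS)) -> tuple[int, int]:
    """Inverse mapping: recover (class_idx, rotation_idx) from a joint label."""
    return divmod(joint_label, num_rotations)


# Example with a 100-class dataset such as CIFAR-100.
num_classes = 100
print(num_classes * len(ROTATIONS))  # 400 outputs for the joint head
print(augmented_label(7, 2))         # 30
print(split_label(30))               # (7, 2)
```

Class-major ordering is an arbitrary choice here; any bijection between pairs and joint labels would serve the same purpose.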
Findings
Average improvement of 2.56% over SSKD on CIFAR-100
Improvement of 0.77% over SSKD on ImageNet
Surpasses SSKD, the previous state-of-the-art method
Abstract
Knowledge distillation hinges on how knowledge is defined and how it is transferred effectively from teacher to student. Although recent self-supervised contrastive knowledge achieves the best performance, forcing the network to learn such knowledge may damage the representation learning of the original class recognition task. We therefore adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and a self-supervised auxiliary task. We demonstrate that this is richer knowledge, which improves representation power without losing normal classification capability. Moreover, previous methods are incomplete in that they transfer probabilistic knowledge only between the final layers. We propose to append several auxiliary classifiers to hierarchical intermediate feature maps to generate diverse self-supervised knowledge and…
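The hierarchical transfer described in the abstract can be sketched as a sum of per-level distillation terms: at each auxiliary classifier, the student's softened output distribution is matched to the teacher's with a KL divergence. This is a minimal pure-Python sketch, not the authors' implementation; the temperature value, the T² scaling, the equal weighting across levels, and all function names are assumptions. Under the abstract's framing, each logit vector would range over the joint (original class, auxiliary transformation) labels.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hierarchical_kd_loss(teacher_levels: Sequence[Sequence[float]],
                         student_levels: Sequence[Sequence[float]],
                         temperature: float = 3.0) -> float:
    """Sum of temperature-scaled KL terms, one per auxiliary classifier level.

    Each argument holds one logit vector per level (auxiliary heads on
    intermediate feature maps plus the final head). Equal weighting across
    levels is an assumption of this sketch.
    """
    if len(teacher_levels) != len(student_levels):
        raise ValueError("teacher and student must have the same number of levels")
    loss = 0.0
    for t_logits, s_logits in zip(teacher_levels, student_levels):
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(s_logits, temperature)
        # The T^2 factor is the usual KD convention for keeping gradient scale
        # roughly independent of the temperature.
        loss += temperature ** 2 * kl_divergence(p_teacher, p_student)
    return loss


teacher = [[2.0, 0.5, -1.0], [1.0, 1.0, 0.0]]
student = [[0.0, 0.5, 1.0], [1.0, 1.0, 0.0]]
print(hierarchical_kd_loss(teacher, teacher))      # 0.0
print(hierarchical_kd_loss(teacher, student) > 0)  # True
```

Identical teacher and student logits give a loss of exactly 0.0, and the loss grows whenever the student's distribution diverges from the teacher's at any level, so intermediate heads receive supervision rather than only the final layer.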
