Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh, Haohan Wang

TL;DR
This paper introduces a simple unsupervised knowledge distillation method that encourages a student network to model the teacher's embedding manifold using space similarity, improving preservation of the teacher's latent space.
Contribution
The paper proposes a novel space similarity loss that captures the teacher's embedding manifold more effectively than prior methods, which rely solely on normalized features.
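The abstract excerpt does not spell out the loss. The following is therefore only a minimal sketch of one way a "space similarity" term could sit alongside a per-sample cosine term. It assumes that student and teacher embeddings share the same dimensionality and are stacked as batch × dimension matrices. The function `space_similarity_loss`, its weight `lam`, and the choice to compare matching feature dimensions across the batch are illustrative assumptions, not the authors' released code.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples in a batch, columns = feature dimensions


def cosine(u: List[float], v: List[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / max(nu * nv, eps)


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns so each feature dimension becomes a row."""
    return [list(col) for col in zip(*m)]


def space_similarity_loss(student: Matrix, teacher: Matrix, lam: float = 1.0) -> float:
    """Hypothetical loss: a per-sample term plus a per-dimension ("space") term."""
    n = len(student)
    d = len(student[0])
    # Sample-wise term: align each student embedding with its teacher embedding.
    # This is the kind of objective built only on normalised features.
    sample_term = sum(cosine(s, t) for s, t in zip(student, teacher)) / n
    # Space term (assumption): treat each feature dimension as a vector over the
    # batch and align it with the same dimension of the teacher.
    space_term = sum(
        cosine(s, t) for s, t in zip(transpose(student), transpose(teacher))
    ) / d
    # Minimising the negative similarities maximises both alignments.
    return -(sample_term + lam * space_term)
```

With `lam=0.0` only the per-sample term remains. In that case, rescaling individual student embeddings by different positive factors leaves the loss unchanged. In this sketch, the space term is what reacts to such non-uniform rescaling.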
Findings
Outperforms existing UKD methods on multiple benchmarks.
Effectively preserves the teacher's latent manifold.
Enhances student network performance without labeled data.
Abstract
According to recent studies, self-supervised learning (SSL) does not readily extend to smaller architectures. One way to mitigate this shortcoming, while still training a smaller network without labels, is to adopt unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter- and intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the teacher's mapping. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher's embedding manifold. If the mapped manifold is similar, all inter- and intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher's latent manifold due to their sole reliance on normalised…
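The limitation of normalised features can be illustrated with a toy example. This assumes that the teacher's manifold lives in its unnormalised embedding space. The abstract is cut off before the argument is completed, so this is not the paper's own demonstration. A per-sample loss on L2-normalised features ignores how far each embedding sits from the origin. A student can therefore reach zero loss while distorting the distances between samples. `normalised_feature_loss` and the two-point teacher/student pair are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, i.e. the dot product of the L2-normalised vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def normalised_feature_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Hypothetical per-sample loss on normalised features: 1 - mean cosine."""
    return 1.0 - sum(cosine(s, t) for s, t in zip(student, teacher)) / len(teacher)


teacher = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical student: matches each teacher direction, but with an
# arbitrary positive scale per sample.
student = [[3.0, 0.0], [0.0, 0.5]]

loss = normalised_feature_loss(student, teacher)  # 0.0: already at its minimum
teacher_gap = math.dist(teacher[0], teacher[1])  # sqrt(2), about 1.414
student_gap = math.dist(student[0], student[1])  # sqrt(9.25), about 3.041
```

The loss gives no signal to correct the student, even though the distance between the two samples has roughly doubled. A manifold-level objective is meant to close this kind of gap.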
Taxonomy
Topics: Neural Networks and Applications
Methods: Knowledge Distillation
