SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation
K L Navaneet, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

TL;DR
SimReg is a regression-based approach to self-supervised knowledge distillation. It adds a multi-layer perceptron head to the student during distillation and feeds the same weakly augmented image to both teacher and student, and it outperforms more complex methods on ImageNet.
Contribution
The paper proposes a novel regression-based distillation method with simple architectural changes that improve performance over existing state-of-the-art techniques.
Findings
Regression with added MLP heads outperforms complex distillation methods.
Shared weak augmentation input benefits the distillation process.
The method achieves superior results on the ImageNet dataset.
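The shared-augmentation finding can be illustrated with a minimal sketch. The function names (`weak_augment`, `make_distillation_inputs`) and the particular augmentation (random horizontal flip plus a small random crop) are assumptions chosen for illustration; the summary does not specify which weak augmentations the authors use. The point shown is only that teacher and student receive the identical view.

```python
import numpy as np

def weak_augment(image, rng, pad=4):
    """Hypothetical weak augmentation: random horizontal flip, then a random
    crop that trims `pad` pixels in total from each spatial dimension."""
    if rng.random() < 0.5:
        image = image[:, ::-1]          # flip along the width axis (H, W, C layout)
    height, width = image.shape[:2]
    top = rng.integers(0, pad + 1)      # crop offset in [0, pad]
    left = rng.integers(0, pad + 1)
    return image[top:top + height - pad, left:left + width - pad]

def make_distillation_inputs(image, rng, shared=True):
    """Return (teacher_input, student_input).

    With shared=True both networks see the same weakly augmented view,
    which is the setting the findings report as helpful. With shared=False
    each network gets an independently sampled view, for comparison.
    """
    view = weak_augment(image, rng)
    if shared:
        return view, view
    return view, weak_augment(image, rng)

rng = np.random.default_rng(0)
image = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)  # stand-in image
teacher_input, student_input = make_distillation_inputs(image, rng)
```

In the shared setting, any difference between the teacher target and the student prediction comes from the networks themselves rather than from a mismatch between input views.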
Abstract
Feature regression is a simple way to distill large neural network models to smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, the addition of a multi-layer perceptron head to the CNN backbone is beneficial even if used only during distillation and discarded in the downstream task. Deeper non-linear projections can thus be used to accurately mimic the teacher without changing inference architecture and time. Moreover, we utilize independent projection heads to simultaneously distill multiple teacher networks. We also find that using the same weakly augmented image as input for both teacher and student networks aids distillation. Experiments on ImageNet dataset demonstrate the efficacy of the proposed changes in various…
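As a rough illustration of the abstract, the sketch below regresses student features onto teacher features through an MLP head, with one independent head per teacher. Several details are assumptions not stated in the summary: the loss is taken to be a squared error between l2-normalized vectors, per-teacher losses are simply summed, the head has a single hidden layer, and the feature arrays are random stand-ins for backbone outputs. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Parameters of a one-hidden-layer MLP head (depth is an assumption)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp_head(params, feats):
    """Non-linear projection from student feature space to teacher feature space."""
    hidden = np.maximum(feats @ params["w1"] + params["b1"], 0.0)  # ReLU
    return hidden @ params["w2"] + params["b2"]

def regression_loss(pred, target):
    """Mean squared distance between l2-normalized rows (normalization assumed)."""
    diff = l2_normalize(pred) - l2_normalize(target)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def multi_teacher_loss(student_feats, heads, teacher_feats):
    """Sum of per-teacher regression losses, one independent head per teacher."""
    return sum(
        regression_loss(mlp_head(head, student_feats), target)
        for head, target in zip(heads, teacher_feats)
    )

rng = np.random.default_rng(0)
student_feats = rng.normal(size=(8, 16))            # stand-in student backbone outputs
teacher_feats = [rng.normal(size=(8, 32)),          # stand-in outputs of two teachers
                 rng.normal(size=(8, 64))]
heads = [init_mlp(rng, 16, 24, t.shape[1]) for t in teacher_feats]
loss = multi_teacher_loss(student_feats, heads, teacher_feats)
```

The heads appear only inside the loss. For a downstream task one would use `student_feats` directly and discard `heads`, which matches the abstract's claim that the inference architecture and time are unchanged.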
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
