TL;DR
CompRess is a self-supervised model compression method that transfers knowledge from a large self-supervised teacher model to a smaller student. For AlexNet, the compressed student outperforms previous methods and the fully supervised model on ImageNet linear evaluation (59.0% vs. 56.5%).
Contribution
This work presents a self-supervised model compression approach: rather than designing a new pseudo task, it trains a smaller student model to mimic the relative similarity between data points in the teacher's embedding space.
Findings
Outperforms all previous methods on ImageNet for AlexNet.
First self-supervised AlexNet to outperform supervised counterpart on ImageNet.
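Achieves 59.0% ImageNet linear evaluation accuracy with AlexNet, surpassing the fully supervised model's 56.5%, and 50.7% on nearest neighbor evaluation compared to 41.4%.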
Abstract
Self-supervised learning aims to learn good representations with unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student). We train the student model so that it mimics the relative similarity between the data points in the teacher's embedding space. For AlexNet, our method outperforms all previous methods including the fully supervised model on ImageNet linear evaluation (59.0% compared to 56.5%) and on nearest neighbor evaluation (50.7% compared to 41.4%). To the best of our knowledge, this is the…
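The idea of a student mimicking the relative similarity between data points in the teacher's embedding space can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes cosine similarity to a set of anchor points, a softmax over those similarities, and a KL divergence between the teacher's and the student's distributions. The function names, the anchor set, and the temperature `tau` are hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(z):
    """Numerically stable log-softmax over the last axis of a 2-D array."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def similarity_distillation_loss(teacher_q, student_q,
                                 teacher_anchors, student_anchors,
                                 tau=0.04):
    """Mean KL(teacher || student) between similarity distributions.

    Each model compares its own query embeddings with its own anchor
    embeddings, so teacher and student may use different dimensions.
    tau is an assumed temperature, not a value taken from the source.
    """
    t_logits = l2_normalize(teacher_q) @ l2_normalize(teacher_anchors).T / tau
    s_logits = l2_normalize(student_q) @ l2_normalize(student_anchors).T / tau
    t_logp = log_softmax(t_logits)
    s_logp = log_softmax(s_logits)
    # KL divergence per query point, then averaged over the batch.
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=1)
    return float(kl.mean())


# Toy data: 4 query points and 16 anchor points, embedded by a
# 128-dimensional "teacher" and a 32-dimensional "student".
rng = np.random.default_rng(0)
teacher_anchors = rng.normal(size=(16, 128))
teacher_q = rng.normal(size=(4, 128))
student_anchors = rng.normal(size=(16, 32))
student_q = rng.normal(size=(4, 32))

loss = similarity_distillation_loss(teacher_q, student_q,
                                    teacher_anchors, student_anchors)
print(f"distillation loss on random embeddings: {loss:.4f}")
```

Because the loss compares distributions over similarities rather than the embeddings themselves, the student does not need the same embedding dimension as the teacher. In a real training loop the student's embeddings would come from a trainable network and the loss would be minimized with a gradient-based optimizer.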