On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals
Haizhou Shi, Youcai Zhang, Siliang Tang, Wenjie Zhu, Yaqian Li, Yandong Guo, Yueting Zhuang

TL;DR
This paper demonstrates that small self-supervised contrastive models can be effectively trained without distillation signals, challenging the belief that large models are necessary for good performance in this setting.
Contribution
It provides a comprehensive evaluation of small models' representation spaces and introduces techniques to improve their performance without relying on distillation.
Findings
Small models can complete pretext tasks without overfitting.
Small models suffer from over-clustering in representation space.
Validated techniques significantly improve small model performance.
Abstract
It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expense of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) the small models can complete the pretext task without overfitting despite their limited capacity, and (ii) they universally suffer from the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon.…
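For readers unfamiliar with the contrastive pretext task mentioned in the abstract, the following is a minimal sketch of an InfoNCE-style loss, a commonly used contrastive objective. The abstract does not state which loss the authors use, so treating InfoNCE as the objective is an assumption; the function name `info_nce_loss`, the temperature of 0.2, and the random toy embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def info_nce_loss(queries, keys, temperature=0.2):
    """Mean InfoNCE loss, assuming keys[i] is the positive for queries[i].

    All other rows of `keys` act as negatives for queries[i].
    (Illustrative only; not taken from the paper.)
    """
    # L2-normalize so the dot product is cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature, shape (N, N).
    logits = q @ k.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Toy embeddings: 8 samples, 16 dimensions (hypothetical data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))

# "Aligned" keys are slightly perturbed copies of the anchors,
# standing in for a second augmented view of the same images.
aligned = info_nce_loss(anchors, anchors + 0.01 * rng.normal(size=(8, 16)))

# "Shuffled" keys are unrelated random vectors.
shuffled = info_nce_loss(anchors, rng.normal(size=(8, 16)))
```

In this toy setup the loss is lower when each key is a near-copy of its query than when the keys are unrelated, which is the behavior the pretext task rewards. The sketch says nothing about over-clustering itself, which the abstract names but does not define.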
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
