On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

Haizhou Shi; Youcai Zhang; Siliang Tang; Wenjie Zhu; Yaqian Li; Yandong Guo; Yueting Zhuang

arXiv:2107.14762·cs.LG·December 14, 2021

TL;DR

This paper studies how to train small self-supervised contrastive models without distillation signals, challenging the assumption that a large teacher model is needed for small models to perform well in this setting.

Contribution

It provides a comprehensive evaluation of small models' representation spaces and introduces techniques to improve their performance without relying on distillation.

Findings

01. Small models can complete pretext tasks without overfitting.

02. Small models suffer from over-clustering in representation space.

03. Validated techniques significantly improve small model performance.

Abstract

It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expense of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) the small models can complete the pretext task without overfitting despite their limited capacity, and (ii) they universally suffer from the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon.…
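As a rough illustration of the kind of pretext task the abstract refers to, the sketch below computes an InfoNCE-style contrastive loss for a single query in plain Python. The abstract does not name the loss, so InfoNCE is an assumption here, and the function names (`l2_normalize`, `info_nce`) and the temperature value are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query (illustrative sketch, not the paper's code).

    The positive key is treated as class 0 in a softmax over all keys;
    the loss is the negative log-probability assigned to that class.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    # Temperature-scaled cosine similarity between the query and every key.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


# A query close to its positive should incur a lower loss than a query
# whose "positive" is actually far away in the embedding space.
aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
confused = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

A model "completes the pretext task" when this loss is low across instances, that is, when each query is closer to its own augmented view than to the negatives. Note that a low loss says nothing about how samples of the same semantic class are arranged relative to one another, which is where the over-clustering observation applies.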

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning