Slimmable Networks for Contrastive Self-supervised Learning
Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

TL;DR
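This paper introduces SlimCLR, a one-stage approach to contrastive self-supervised learning built on slimmable networks. A full network and its weight-sharing sub-networks are pre-trained together, so small pre-trained models are obtained without a separate teacher model, and dedicated training techniques counter the performance degradation that small models otherwise suffer.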
Contribution
The paper proposes SlimCLR, a novel slimmable network framework for contrastive self-supervised learning, with techniques to improve training stability and performance of small models without extra teachers.
Findings
SlimCLR outperforms previous methods with fewer parameters and FLOPs.
The introduced techniques stabilize training and improve small model performance.
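Theoretical analysis shows that switchable linear layers, which keep separate weights for each network width, are more effective during linear evaluation than a single weight-sharing linear layer.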
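To make the last finding concrete, here is a minimal sketch of what a switchable linear layer could look like at evaluation time. It assumes that "switchable" means each network width gets its own independent classifier weights, by analogy with switchable layers in the slimmable-network literature. The class name `SwitchableLinearProbe`, its arguments, and the zero initialization are hypothetical and are not taken from the paper's code.

```python
class SwitchableLinearProbe:
    """Toy linear classifier with one independent weight matrix per width.

    Assumption: each width multiplier maps to its own weights, so training
    the classifier for one sub-network never changes another's classifier.
    """

    def __init__(self, feature_dims, num_classes):
        # feature_dims: dict mapping width multiplier -> feature dimension
        # produced by the backbone at that width (hypothetical interface).
        self.weights = {
            width: [[0.0] * dim for _ in range(num_classes)]
            for width, dim in feature_dims.items()
        }

    def forward(self, features, width):
        # Select the classifier that belongs to the requested width.
        weight = self.weights[width]
        if len(features) != len(weight[0]):
            raise ValueError("feature size does not match the selected width")
        # Plain matrix-vector product: one logit per class.
        return [sum(w * x for w, x in zip(row, features)) for row in weight]
```

Because the matrices are stored separately, updating `weights[0.5]` leaves `weights[1.0]` untouched, which is the property that distinguishes this design from a single shared layer sliced to different widths.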
Abstract
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models. Mainstream solutions to this problem rely mainly on knowledge distillation, which involves a two-stage procedure: first training a large teacher model and then distilling it to improve the generalization ability of smaller ones. In this work, we introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers, namely, slimmable networks for contrastive self-supervised learning (SlimCLR). A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks, including small ones with low computation costs. However, interference between weight-sharing networks leads to severe performance degradation in self-supervised cases, as evidenced by gradient magnitude…
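The weight sharing described in the abstract can be illustrated with a single layer. The following is a minimal sketch, assuming that a sub-network at a given width reuses the leading rows and columns of the full layer's weight matrix, which is the usual convention for slimmable networks. The class `SlimmableLinear`, the `width` argument, and the toy weight values are hypothetical and are not taken from the SlimCLR implementation.

```python
class SlimmableLinear:
    """Toy linear layer whose narrower versions slice one shared weight matrix."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features
        # Deterministic toy weights so the example is reproducible.
        self.weight = [
            [0.1 * (i + 1) * (j + 1) for j in range(in_features)]
            for i in range(out_features)
        ]

    def forward(self, x, width=1.0):
        # Assumption: a sub-network of relative size `width` uses only the
        # first n_out rows and first n_in columns of the shared matrix.
        n_in = max(1, int(self.in_features * width))
        n_out = max(1, int(self.out_features * width))
        if len(x) != n_in:
            raise ValueError("input size does not match the requested width")
        return [
            sum(self.weight[i][j] * x[j] for j in range(n_in))
            for i in range(n_out)
        ]
```

Calling `forward` with `width=0.5` on a 4-by-4 layer uses only the top-left 2-by-2 block of the same matrix. Those entries also receive gradients from the full network, which is where the interference between weight-sharing networks mentioned in the abstract comes from.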
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Speech and Audio Processing
Methods: Linear Layer · Knowledge Distillation · Contrastive Learning
