Establishing a stronger baseline for lightweight contrastive models
Wenye Lin, Yifeng Ding, Zhixiong Cao, Hai-tao Zheng

TL;DR
This paper improves lightweight contrastive learning models by optimizing training settings and introducing a smoothed loss, achieving performance close to larger models without needing a pretrained teacher.
Contribution
It establishes a stronger baseline for lightweight contrastive models by tailoring training recipes and proposing a smoothed InfoNCE loss, eliminating the need for pretrained teacher models.
Findings
Significant accuracy improvements on ImageNet for MobileNet-V3-Large and EfficientNet-B0.
Achieved performance close to ResNet50 with 5x fewer parameters.
Proposed a smoothed InfoNCE loss to reduce noise in contrastive learning.
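The summary does not spell out how the InfoNCE loss is smoothed. The sketch below is a minimal illustration, assuming the smoothing takes the form of standard label smoothing applied to the InfoNCE target distribution. The paper's exact formulation may differ. The function name `smoothed_info_nce` and the parameters `temperature` and `smoothing` are hypothetical, and their default values are illustrative rather than taken from the paper. For a single query, the loss is the cross-entropy between a softmax over temperature-scaled cosine similarities and a target that puts most, but not all, of its mass on the positive key.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    # Project an embedding onto the unit sphere so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def smoothed_info_nce(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    positive_index: int,
    temperature: float = 0.2,  # hypothetical default, for illustration only
    smoothing: float = 0.1,    # hypothetical default, for illustration only
) -> float:
    """InfoNCE for one query with a label-smoothed target (assumed variant).

    smoothing=0.0 recovers the standard InfoNCE loss, -log p(positive).
    """
    q = _l2_normalize(query)
    # Temperature-scaled cosine similarity between the query and every key.
    logits = [sum(a * b for a, b in zip(q, _l2_normalize(k))) / temperature for k in keys]
    # Log-softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    # Standard label smoothing: eps / K on every key, plus (1 - eps) on the positive.
    k = len(keys)
    targets = [smoothing / k] * k
    targets[positive_index] += 1.0 - smoothing
    # Cross-entropy between the smoothed target and the predicted distribution.
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]  # key 0 is the positive view, key 1 a negative
    hard = smoothed_info_nce([1.0, 0.0], keys, positive_index=0, temperature=1.0, smoothing=0.0)
    soft = smoothed_info_nce([1.0, 0.0], keys, positive_index=0, temperature=1.0, smoothing=0.1)
    print(hard, soft)
```

Under this assumption, smoothing caps how confidently the model is pushed toward a single positive and away from every negative. That is one plausible way to soften the effect of noisy positive or negative views.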
Abstract
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks, such as MobileNet and EfficientNet. A common practice to address this problem is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher. However, pretraining a teacher model is time- and resource-consuming when one is not available. In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model. Specifically, we show that the optimal recipe for efficient models is different from that of larger models, and that using the same training settings as ResNet50, as previous research does, is inappropriate. Additionally, we observe a common issue in contrastive learning where either the positive or negative views…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Average Pooling · Dense Connections · Batch Normalization · Sigmoid Activation · 1x1 Convolution · Convolution
