Effective Self-supervised Pre-training on Low-compute Networks without Distillation
Fuwen Tan, Fatemeh Saleh, Brais Martinez

TL;DR
This paper introduces a new view sampling strategy for self-supervised learning on low-compute networks, significantly improving performance without relying on knowledge distillation.
Contribution
It systematically analyzes the factors limiting SSL on low-capacity networks and proposes a novel view sampling methodology that enhances various SSL methods and tasks.
Findings
Improved SSL performance on low-capacity networks without distillation
View sampling is crucial for effective SSL on low-compute models
Achieved state-of-the-art results across multiple architectures and tasks
Abstract
Despite the impressive progress of self-supervised learning (SSL), its applicability to low-compute networks has received limited attention. Reported performance has trailed behind standard supervised pre-training by a large margin, barring self-supervised learning from making an impact on models that are deployed on device. Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks and opt to bypass the problem through the use of knowledge distillation (KD). In this work, we revisit SSL for efficient neural networks, taking a closer look at the detrimental factors causing the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting. We find that, contrary to accepted knowledge, there is no intrinsic architectural bottleneck; instead, we diagnose that the performance bottleneck is related to the model…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: OPT · Batch Normalization · LARS · Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Average Pooling · Convolution · Inverted Residual Block · 1x1 Convolution
