Effective Self-supervised Pre-training on Low-compute Networks without Distillation

Fuwen Tan; Fatemeh Saleh; Brais Martinez

arXiv:2210.02808 · cs.CV · October 4, 2023

TL;DR

This paper introduces a new view sampling strategy for self-supervised learning on low-compute networks, significantly improving performance without relying on knowledge distillation.

Contribution

It systematically analyzes the factors limiting SSL on low-capacity networks and proposes a novel view sampling methodology that enhances various SSL methods and tasks.

Findings

1. Improved SSL performance on low-capacity networks without distillation

2. View sampling is crucial for effective SSL on low-compute models

3. Achieved state-of-the-art results across multiple architectures and tasks
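
"View sampling" here refers to how crops (views) are drawn from each training image before they are fed to the SSL objective. This page does not describe the paper's actual sampling strategy, so the sketch below is only a generic illustration of what a view sampler looks like: it draws a few large "global" crops and several small "local" crops, in the style of common multi-crop augmentation. The function names, the number of views, and the scale ranges are all hypothetical choices made for illustration, not values taken from the paper.

```python
import math
import random


def sample_crop_box(width, height, scale, ratio=(3 / 4, 4 / 3), rng=random):
    """Sample one crop box (left, top, w, h) inside a width x height image.

    `scale` is the (min, max) fraction of the image area the crop may cover;
    `ratio` is the (min, max) aspect ratio, sampled log-uniformly.
    """
    area = width * height
    for _ in range(10):  # retry a few times if the box does not fit
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Fall back to the full image if no valid box was found.
    return 0, 0, width, height


def sample_views(width, height, n_global=2, n_local=6,
                 global_scale=(0.4, 1.0), local_scale=(0.05, 0.4), rng=random):
    """Return a list of (kind, box) pairs; all defaults are illustrative assumptions."""
    views = [("global", sample_crop_box(width, height, global_scale, rng=rng))
             for _ in range(n_global)]
    views += [("local", sample_crop_box(width, height, local_scale, rng=rng))
              for _ in range(n_local)]
    return views


if __name__ == "__main__":
    for kind, box in sample_views(224, 224, rng=random.Random(0)):
        print(kind, box)
```

Changing the scale ranges or the number of local views changes how much the views overlap, which is the kind of design choice a view sampling strategy has to settle.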

Abstract

Despite the impressive progress of self-supervised learning (SSL), its applicability to low-compute networks has received limited attention. Reported performance has trailed behind standard supervised pre-training by a large margin, barring self-supervised learning from making an impact on models that are deployed on device. Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks and opt to bypass the problem through the use of knowledge distillation (KD). In this work, we revisit SSL for efficient neural networks, taking a closer look at which detrimental factors cause the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting. We find that, contrary to accepted knowledge, there is no intrinsic architectural bottleneck; we diagnose that the performance bottleneck is related to the model…

Code & Models

Repositories

saic-fi/sslight (PyTorch, official)

Videos

Effective Self-supervised Pre-training on Low-compute Networks without Distillation · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: OPT · Batch Normalization · LARS · Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Average Pooling · Convolution · Inverted Residual Block · 1x1 Convolution