One Network Doesn't Rule Them All: Moving Beyond Handcrafted Architectures in Self-Supervised Learning

Sharath Girish; Debadeepta Dey; Neel Joshi; Vibhav Vineet; Shital Shah; Caio Cesar Teodoro Mendes; Abhinav Shrivastava; Yale Song

arXiv:2203.08130·cs.CV·March 16, 2022

TL;DR

This paper demonstrates that neural network architecture significantly impacts self-supervised learning performance and proposes learning architectures alongside weights to improve results across diverse scenarios.

Contribution

It provides extensive empirical evidence on architecture influence in SSL and introduces a method to learn architectures during SSL training, outperforming traditional handcrafted models.

Findings

01. Architecture choice greatly affects SSL outcomes.

02. Self-supervised architecture search yields better performance.

03. Proposed architectures outperform ResNet18 and MobileNetV2.

Abstract

The current literature on self-supervised learning (SSL) focuses on developing learning objectives to train neural networks more effectively on unlabeled data. The typical development process involves taking well-established architectures, e.g., ResNet demonstrated on ImageNet, and using them to evaluate newly developed objectives on downstream scenarios. While convenient, this does not take into account the role of architectures, which has been shown to be crucial in the supervised learning literature. In this work, we establish extensive empirical evidence showing that network architecture plays a significant role in SSL. We conduct a large-scale study with over 100 variants of ResNet and MobileNet architectures and evaluate them across 11 downstream scenarios in the SSL setting. We show that there is no one network that performs consistently well across the scenarios. Based on this,…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Advanced Graph Neural Networks

Methods: Batch Normalization · Residual Connection · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Kaiming Initialization · Max Pooling · Residual Block · Global Average Pooling