One Network Doesn't Rule Them All: Moving Beyond Handcrafted Architectures in Self-Supervised Learning
Sharath Girish, Debadeepta Dey, Neel Joshi, Vibhav Vineet, Shital Shah, Caio Cesar Teodoro Mendes, Abhinav Shrivastava, Yale Song

TL;DR
This paper demonstrates that neural network architecture significantly impacts self-supervised learning performance and proposes learning architectures alongside weights to improve results across diverse scenarios.
Contribution
It provides extensive empirical evidence on architecture influence in SSL and introduces a method to learn architectures during SSL training, outperforming traditional handcrafted models.
Findings
Architecture choice greatly affects SSL outcomes.
Self-supervised architecture search yields better performance.
Proposed architectures outperform ResNet18 and MobileNetV2.
Abstract
The current literature on self-supervised learning (SSL) focuses on developing learning objectives to train neural networks more effectively on unlabeled data. The typical development process involves taking well-established architectures, e.g., ResNet demonstrated on ImageNet, and using them to evaluate newly developed objectives on downstream scenarios. While convenient, this does not take into account the role of architectures, which has been shown to be crucial in the supervised learning literature. In this work, we establish extensive empirical evidence showing that network architecture plays a significant role in SSL. We conduct a large-scale study with over 100 variants of ResNet and MobileNet architectures and evaluate them across 11 downstream scenarios in the SSL setting. We show that there is no one network that performs consistently well across the scenarios. Based on this,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Advanced Graph Neural Networks
Methods: Batch Normalization · Residual Connection · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Kaiming Initialization · Max Pooling · Residual Block · Global Average Pooling
