The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

TL;DR
This paper investigates whether highly sparse subnetworks, of the kind posited by the lottery ticket hypothesis, can be found in pre-trained computer vision models and still perform well on various downstream tasks, regardless of the pre-training method.
Contribution
It extends the lottery ticket hypothesis to pre-trained vision models, demonstrating the existence of effective sparse subnetworks that maintain performance across multiple tasks.
Findings
Matching subnetworks exist at high sparsity levels (59-96%) in pre-trained models.
Subnetwork performance remains comparable to full models on downstream tasks.
Diverse mask structures and sensitivities are observed across different pre-training methods.
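To make the sparsity figures above concrete, the following is a minimal sketch of how a binary mask at a chosen sparsity level could be built by magnitude pruning. This summary does not state which pruning procedure the authors used, so magnitude pruning is an assumption here, and the names `magnitude_mask` and `apply_mask` as well as the random stand-in weights are hypothetical, introduced only for illustration.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the `sparsity` fraction of
    smallest-magnitude entries in a flat list of weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by ascending absolute value; the first n_prune get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out pruned weights; surviving weights keep their values."""
    return [w * m for w, m in zip(weights, mask)]


random.seed(0)
# Stand-in for a flattened pre-trained weight tensor (assumption: real
# weights would come from a pre-trained checkpoint, not random draws).
pretrained = [random.gauss(0.0, 1.0) for _ in range(1000)]

mask = magnitude_mask(pretrained, sparsity=0.96)
subnetwork = apply_mask(pretrained, mask)
kept = sum(mask)  # number of weights that survive pruning
```

At 96% sparsity only 40 of the 1000 stand-in weights survive, which gives a sense of how small the subnetworks at the upper end of the reported range are relative to the full model.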
Abstract
The computer vision world has been re-gaining enthusiasm in various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as simCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation. Latest studies suggest that pre-training benefits from gigantic model capacity. We are hereby curious and ask: after pre-training, does a pre-trained model indeed have to stay large for its downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch yet still reach the full models' performance. We extend the scope of LTH and question whether…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Residual Block · Residual Connection · Convolution · 1x1 Convolution · Max Pooling · Bottleneck Residual Block
