The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

Tianlong Chen; Jonathan Frankle; Shiyu Chang; Sijia Liu; Yang Zhang; Michael Carbin; Zhangyang Wang

arXiv:2012.06908 · cs.LG · March 31, 2021 · 31 cites

TL;DR

This paper investigates whether highly sparse subnetworks of the kind described by the lottery ticket hypothesis can be found in pre-trained computer vision models and still perform well on various downstream tasks, for both supervised and self-supervised pre-training.

Contribution

It extends the lottery ticket hypothesis to pre-trained vision models, demonstrating the existence of effective sparse subnetworks that maintain performance across multiple tasks.

Findings

1. Matching subnetworks exist in pre-trained models at high sparsity levels (59-96%).
2. Subnetwork performance remains comparable to that of the full models on downstream tasks.
3. Diverse mask structures and sensitivities are observed across different pre-training methods.
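The 59-96% sparsity range is easier to read with a pruning schedule in mind. Iterative magnitude pruning (IMP) typically removes a fixed fraction p of the *remaining* weights each round, so after k rounds a fraction (1-p)^k survives and the sparsity is 1-(1-p)^k. The sketch below assumes p = 0.2, a common choice in LTH work; this page does not state the schedule, so treat it as an assumption. Under it, 4 rounds give 59.04% sparsity and 15 rounds give about 96.48%.

```python
def sparsity_after_rounds(rounds: int, prune_rate: float = 0.2) -> float:
    """Fraction of weights removed after `rounds` of IMP, where each round
    prunes `prune_rate` of the weights that are still alive (assumed schedule)."""
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    return 1.0 - (1.0 - prune_rate) ** rounds


def rounds_to_reach(target: float, prune_rate: float = 0.2) -> int:
    """Smallest number of rounds whose sparsity is at least `target`."""
    if not 0.0 <= target < 1.0 or not 0.0 < prune_rate < 1.0:
        raise ValueError("need 0 <= target < 1 and 0 < prune_rate < 1")
    rounds = 0
    while sparsity_after_rounds(rounds, prune_rate) < target:
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Print the sparsity reached after each round of the assumed 20% schedule.
    for k in range(1, 16):
        print(f"round {k:2d}: sparsity = {sparsity_after_rounds(k):.2%}")
```

Because each round prunes a fraction of what is left, sparsity grows quickly at first and then slowly. This is why the high end of the range needs several times as many rounds as the low end.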
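One simple way to quantify how different two tickets' masks are is the intersection-over-union (Jaccard overlap) of the weights each mask keeps. This metric is chosen here for illustration and may not be the measure the paper uses. The mask format (a dict from layer name to a flattened 0/1 list) and the layer names in the example are hypothetical.

```python
from typing import Dict, List

# Hypothetical mask format: layer name -> flattened 0/1 keep-mask.
Mask = Dict[str, List[int]]


def mask_jaccard(a: Mask, b: Mask) -> float:
    """Intersection-over-union of the weights kept (value 1) by two masks."""
    if a.keys() != b.keys():
        raise ValueError("masks must cover the same layers")
    inter = union = 0
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch in layer {name!r}")
        for x, y in zip(a[name], b[name]):
            inter += x & y  # kept by both masks
            union += x | y  # kept by at least one mask
    # Two all-zero masks are treated as identical.
    return inter / union if union else 1.0


if __name__ == "__main__":
    # Toy masks standing in for tickets from two pre-training methods.
    supervised = {"conv1": [1, 1, 0, 0], "fc": [1, 0]}
    simclr = {"conv1": [1, 0, 1, 0], "fc": [1, 0]}
    print(f"overlap = {mask_jaccard(supervised, simclr):.3f}")
```

An overlap near 1 means two pre-training methods kept nearly the same weights. A low overlap at equal sparsity indicates structurally different subnetworks.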

Abstract

The computer vision world has been regaining enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as SimCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. Recent studies suggest that pre-training benefits from gigantic model capacity. We are therefore curious to ask: after pre-training, does a pre-trained model indeed have to stay large for its downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch yet still reach the full models' performance. We extend the scope of LTH and question whether…
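The procedure for finding such matching subnetworks can be illustrated with iterative magnitude pruning that rewinds the surviving weights to the pre-trained weights after each round. The sketch below is a toy under stated assumptions, not the authors' implementation (their PyTorch code is in the VITA-Group/CV_LTH_Pre-training repository). A 10-feature linear model stands in for a ConvNet, and `theta_pre` is a random, hypothetical stand-in for pre-trained weights. The 20% per-round pruning rate, the gradient-descent trainer, and all function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def train(w_start: Sequence[float], mask: Sequence[int], X: Sequence[Vector],
          y: Sequence[float], lr: float = 0.1, steps: int = 200) -> Vector:
    """Full-batch gradient descent on mean squared error; pruned weights stay at zero."""
    w = [wi * mi for wi, mi in zip(w_start, mask)]
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * err * xj / n
        # Gradient step, then re-apply the mask so pruned weights remain zero.
        w = [(wj - lr * gj) * mj for wj, gj, mj in zip(w, grad, mask)]
    return w


def prune_by_magnitude(w: Sequence[float], mask: Sequence[int], rate: float) -> List[int]:
    """Remove `rate` of the still-unpruned weights, smallest |w| first (global pruning)."""
    alive = [j for j, m in enumerate(mask) if m]
    n_prune = int(round(rate * len(alive)))
    new_mask = list(mask)
    for j in sorted(alive, key=lambda k: abs(w[k]))[:n_prune]:
        new_mask[j] = 0
    return new_mask


def imp_with_rewinding(theta_pre: Vector, X: Sequence[Vector], y: Sequence[float],
                       rounds: int, rate: float = 0.2) -> Tuple[List[int], Vector]:
    """Train, prune, rewind surviving weights to theta_pre; repeat `rounds` times."""
    mask = [1] * len(theta_pre)
    for _ in range(rounds):
        w = train(theta_pre, mask, X, y)  # every round restarts from theta_pre
        mask = prune_by_magnitude(w, mask, rate)
    return mask, train(theta_pre, mask, X, y)  # retrain the final sparse subnetwork


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 10, 100
    w_true = [3.0, -2.0] + [0.0] * (d - 2)  # only the first two features matter
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(w_true, xi)) for xi in X]
    theta_pre = [rng.gauss(0, 0.1) for _ in range(d)]  # hypothetical "pre-trained" weights
    mask, w = imp_with_rewinding(theta_pre, X, y, rounds=5)
    print("mask:", mask, "sparsity:", 1 - sum(mask) / d)
```

The key design choice is the rewind target. Resetting survivors to `theta_pre` rather than to a fresh random initialization is what makes the resulting subnetwork a ticket *of the pre-trained model*. In this toy, the two informative weights survive five rounds while most of the others are pruned.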

Code & Models

Repositories

VITA-Group/CV_LTH_Pre-training
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Residual Block · Residual Connection · Convolution · 1x1 Convolution · Max Pooling · Bottleneck Residual Block