Spending Your Winning Lottery Better After Drawing It

Ajay Kumar Jaiswal; Haoyu Ma; Tianlong Chen; Ying Ding; Zhangyang Wang

arXiv:2101.03255 · cs.LG · October 12, 2021 · 1 citation

TL;DR

This paper improves the retraining of sparse subnetworks drawn from dense neural networks by applying purposeful "tweaks" to the subnetwork's architecture or training recipe, leading to state-of-the-art results for the Lottery Ticket Hypothesis and better generalization across datasets and architectures.

Contribution

It demonstrates that tailored tweaks to the sparse network's architecture and training recipe can significantly improve performance over the default practice of inheriting the dense network's training protocol unchanged.

Findings

01. Achieved 1.05% to 4.93% performance gains on ResNet18/CIFAR-100.

02. Proposed tweaks outperform vanilla-LTH at high sparsity levels.

03. Methods generalize across datasets and architectures.

Abstract

The Lottery Ticket Hypothesis (LTH) suggests that a dense neural network contains a sparse sub-network that can match the performance of the original dense network when trained in isolation from scratch. Most works retrain the sparse sub-network with the same training protocols as its dense network, such as initialization, architecture blocks, and training recipes. However, it remains unclear whether these training protocols are optimal for sparse networks. In this paper, we demonstrate that it is unnecessary for sparse retraining to strictly inherit those properties from the dense network. Instead, by plugging in purposeful "tweaks" to the sparse subnetwork architecture or its training recipe, its retraining can be significantly improved over the default, especially at high sparsity levels. Combining all our proposed "tweaks" can yield the new state-of-the-art performance of LTH,…
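
To make the LTH setup in the abstract concrete, here is a minimal sketch of how a sparse sub-network ("ticket") is commonly drawn: train the dense network, keep the largest-magnitude weights, and reset the survivors to their initial values before retraining. This is an illustration of generic one-shot magnitude pruning, not the paper's code or its proposed tweaks. The function names `magnitude_mask` and `rewind` and the toy weight values are hypothetical.

```python
# Illustrative sketch only: one-shot global magnitude pruning on a flat
# list of weights. Names and values are hypothetical, not from the paper.

def magnitude_mask(trained_weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    trained_weights: flat list of floats (weights after dense training).
    sparsity: fraction of weights to prune, in [0, 1).
    """
    n_prune = int(len(trained_weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get pruned.
    order = sorted(range(len(trained_weights)),
                   key=lambda i: abs(trained_weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(trained_weights))]


def rewind(initial_weights, mask):
    """Reset surviving weights to their initial values; zero the pruned ones."""
    return [w * m for w, m in zip(initial_weights, mask)]


# Toy example: weights at initialization and after dense training.
initial = [0.5, -0.1, 0.3, 0.8, -0.6, 0.2, -0.4, 0.7, 0.9, -1.0]
trained = [0.9, -0.05, 0.02, 1.2, -0.7, 0.01, -0.3, 0.6, 1.5, -1.1]

mask = magnitude_mask(trained, sparsity=0.5)   # prune half of the weights
ticket = rewind(initial, mask)                 # sparse network to retrain
```

The mask is computed from the trained weights, while the values that get retrained come from the initialization. In the vanilla setup, the sparse network in `ticket` would then be retrained with the dense network's own protocol, which is the default the paper argues need not be inherited.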

Code & Models

Repositories

VITA-Group/KD-ticket
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning