Spending Your Winning Lottery Better After Drawing It
Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

TL;DR
This paper improves the retraining of sparse subnetworks drawn from dense neural networks by modifying the subnetwork's architecture and training recipe instead of inheriting them unchanged from the dense network, leading to state-of-the-art LTH results and better generalization across datasets and architectures.
Contribution
It demonstrates that tailored tweaks to the sparse network's architecture and training recipe can significantly improve performance over retraining with the protocols inherited from the dense network.
Findings
Achieves performance gains of 1.05% to 4.93% on ResNet18/CIFAR-100.
Proposed tweaks outperform vanilla-LTH at high sparsity levels.
Methods generalize across datasets and architectures.
Abstract
The Lottery Ticket Hypothesis (LTH) suggests that a dense neural network contains a sparse sub-network that can match the performance of the original dense network when trained in isolation from scratch. Most works retrain the sparse sub-network with the same training protocols as its dense network, such as initialization, architecture blocks, and training recipes. However, it remains unclear whether these training protocols are optimal for sparse networks. In this paper, we demonstrate that it is unnecessary for sparse retraining to strictly inherit those properties from the dense network. Instead, by plugging in purposeful "tweaks" to the sparse subnetwork architecture or its training recipe, its retraining can be significantly improved over the default, especially at high sparsity levels. Combining all our proposed "tweaks" can yield new state-of-the-art performance for LTH,…
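For readers unfamiliar with the baseline setup the abstract refers to, the following is a minimal sketch of how a sparse "ticket" is typically drawn before retraining. It is not the paper's method and does not implement any of the proposed tweaks. It assumes global magnitude pruning as the mask criterion (the abstract does not specify one), and the names magnitude_mask and draw_ticket, the flat weight list, and the noisy stand-in for trained weights are all hypothetical.

```python
import random

def magnitude_mask(trained, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights.

    Assumption: global magnitude pruning over a flat list of weights.
    """
    n_prune = int(round(sparsity * len(trained)))
    # Indices sorted from smallest to largest absolute trained weight.
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(trained))]

def draw_ticket(init, trained, sparsity):
    """Apply the mask found on the trained weights to the initial weights."""
    mask = magnitude_mask(trained, sparsity)
    ticket = [w * m for w, m in zip(init, mask)]
    return ticket, mask

random.seed(0)
init = [random.gauss(0.0, 1.0) for _ in range(20)]
# Hypothetical stand-in for dense training: the initial weights plus small noise.
trained = [w + random.gauss(0.0, 0.1) for w in init]
ticket, mask = draw_ticket(init, trained, sparsity=0.8)
```

In the default protocol described above, the masked network would then be retrained with the dense network's initialization, architecture blocks, and training recipe; the paper's claim is that this last step need not be inherited as-is.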
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
