Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
Jaron Maene, Mingxiao Li, Marie-Francine Moens

TL;DR
This paper investigates the lottery ticket hypothesis. It shows that large networks can be rewound to initialization when they are trained with a method that is stable with respect to linear mode connectivity, and suggests that lottery tickets essentially retrain to the same regions of the loss landscape.
Contribution
It shows that large networks can be rewound all the way to initialization when training is stable. Earlier work could only replicate lottery tickets at scale by rewinding to an early, stable point in training.
Findings
Large networks can be rewound to initialization using stable training.
Lottery tickets retrain to the same regions of the loss landscape, though not necessarily to the same basin.
Existing lottery tickets could not have been found without the preceding dense training performed by iterative magnitude pruning.
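Iterative magnitude pruning (IMP) with rewinding to initialization can be illustrated on a toy problem. The following is a minimal sketch, assuming plain full-batch gradient descent on a small linear regression as a stand-in for network training. The names train and imp_with_rewinding, the synthetic data, and the prune_fraction parameter are hypothetical and are not the authors' code. Each round trains from the rewound initialization, prunes a fraction of the surviving weights with the smallest trained magnitudes, and resets the survivors to their initial values.

```python
import random


def train(weights, mask, xs, ys, lr=0.5, steps=500):
    """Hypothetical stand-in for training: full-batch gradient descent on mean
    squared error for a linear model. Pruned weights (mask 0) stay at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Gradient step, then re-apply the mask so pruned weights remain zero.
        w = [(wi - lr * g) * mi for wi, g, mi in zip(w, grad, mask)]
    return w


def imp_with_rewinding(w_init, xs, ys, rounds, prune_fraction=0.5):
    """IMP: each round trains from w_init (rewinding), then prunes the
    smallest-magnitude fraction of the weights that are still alive."""
    mask = [1] * len(w_init)
    for _ in range(rounds):
        # train() starts from w_init * mask, i.e. survivors rewound to init.
        trained = train(w_init, mask, xs, ys)
        alive = [j for j, m in enumerate(mask) if m]
        n_prune = int(prune_fraction * len(alive))
        # Rank surviving weights by |trained value| and drop the smallest.
        for j in sorted(alive, key=lambda j: abs(trained[j]))[:n_prune]:
            mask[j] = 0
    # The "ticket": the final mask applied to the initial weights.
    ticket = [wi * mi for wi, mi in zip(w_init, mask)]
    return mask, ticket


rng = random.Random(0)
w_true = [3.0, 0.0, -2.0, 0.0, 0.5, 0.0]  # hypothetical sparse target weights
xs = [[rng.uniform(-1.0, 1.0) for _ in w_true] for _ in range(50)]
ys = [sum(w * x for w, x in zip(w_true, row)) for row in xs]
w_init = [rng.uniform(-0.1, 0.1) for _ in w_true]

mask, ticket = imp_with_rewinding(w_init, xs, ys, rounds=2)
print(mask)
```

Rewinding to an early training iterate instead of initialization, as in the earlier large-scale work, would correspond to passing a checkpoint saved shortly after the start of training in place of w_init.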
Abstract
The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in. However, subsequent work failed to replicate this on large-scale models and had to rewind to an early stable state instead of to initialization. We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent experiments on common vision tasks give strong credence to the hypothesis in Evci et al. (2020b) that lottery tickets simply retrain to the same regions (although not necessarily to the same basin). These results imply that existing lottery tickets could not have been found without the preceding dense training by iterative magnitude pruning, raising doubts about the use of the…
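Linear mode connectivity, and the distinction between the same region and the same basin, can be probed by evaluating the loss along the straight line between two weight vectors. The following is a minimal sketch. The barrier definition used here, the largest amount by which the loss on the path exceeds the straight line between the two endpoint losses, is one common variant and is an assumption, as are the names interpolate, loss_barrier, and the toy double_well loss. In the stability sense the abstract refers to, w_a and w_b would typically be two networks trained from the same starting weights under different data orders. A barrier near zero indicates that they are linearly connected.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def interpolate(w_a: Vector, w_b: Vector, alpha: float) -> List[float]:
    """Point (1 - alpha) * w_a + alpha * w_b on the line between two weight vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_barrier(loss: Callable[[Vector], float], w_a: Vector, w_b: Vector,
                 steps: int = 101) -> float:
    """Largest excess of the loss along the linear path over the linear
    interpolation of the endpoint losses (assumed barrier definition)."""
    loss_a, loss_b = loss(w_a), loss(w_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        on_path = loss(interpolate(w_a, w_b, alpha))
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, on_path - baseline)
    return barrier


def double_well(w: Vector) -> float:
    """Hypothetical non-convex toy loss with two basins, at w[0] = -1 and w[0] = +1."""
    return (w[0] ** 2 - 1.0) ** 2 + w[1] ** 2


print(loss_barrier(double_well, [-1.0, 0.0], [1.0, 0.0]))   # different basins
print(loss_barrier(double_well, [0.9, 0.1], [1.1, -0.1]))   # same basin
```

The first pair sits in different basins, so the path crosses a ridge and the barrier is large. The second pair lies in one basin where the loss is convex along the path, so the barrier is zero. Real networks would use a validation loss or error in place of double_well.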
