Stabilizing the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

TL;DR
This paper modifies iterative magnitude pruning (IMP) to search for trainable subnetworks early in training rather than at initialization, which makes it possible to prune deep networks such as ResNet-50 on ImageNet and offers new insight into the lottery ticket hypothesis.
Contribution
It introduces a modified IMP procedure that searches for subnetworks obtainable early in training, which lets pruning succeed on deeper networks where the original procedure fails.
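The idea can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it replaces the neural network with a toy least-squares model trained by gradient descent, and the function names (`magnitude_mask`, `train`, `imp_with_rewinding`) and all hyperparameters (`rewind_step`, `prune_fraction`, `rounds`, `steps`, `lr`) are hypothetical choices made for illustration. It trains for a few steps to obtain early-training weights, then repeatedly trains, prunes the smallest-magnitude surviving weights, and resets the survivors to those early weights rather than to the initialization.

```python
import numpy as np

def magnitude_mask(weights, mask, prune_fraction):
    """Return a new mask that also removes the smallest-magnitude surviving weights."""
    surviving = np.flatnonzero(mask)
    n_prune = int(len(surviving) * prune_fraction)
    new_mask = mask.copy()
    if n_prune > 0:
        order = np.argsort(np.abs(weights[surviving]))
        new_mask[surviving[order[:n_prune]]] = 0.0
    return new_mask

def train(weights, mask, X, y, steps, lr=0.1):
    """Masked gradient descent on a least-squares loss (toy stand-in for network training)."""
    w = weights * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask  # pruned weights stay at zero
    return w

def imp_with_rewinding(w0, X, y, rounds=3, prune_fraction=0.2, rewind_step=5, steps=200):
    """Iterative magnitude pruning that resets survivors to early-training weights."""
    mask = np.ones_like(w0)
    # Weights after `rewind_step` steps of dense training (assumed rewind point).
    w_k = train(w0, mask, X, y, rewind_step)
    for _ in range(rounds):
        w_final = train(w_k, mask, X, y, steps - rewind_step)
        mask = magnitude_mask(w_final, mask, prune_fraction)
    # The candidate subnetwork: surviving weights at their early-training values.
    return mask, w_k * mask

# Toy data: 16 features, only 4 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
true_w = np.zeros(16)
true_w[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ true_w
w0 = rng.normal(size=16) * 0.1

mask, ticket = imp_with_rewinding(w0, X, y)
print(int(mask.sum()), "of", mask.size, "weights remain")
```

Setting `rewind_step=0` in this sketch recovers the original procedure, which resets surviving weights to their values at initialization.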
Findings
Subnetworks found early in training reach accuracy comparable to that of the full network.
When trained, these subnetworks end up with weights closer to those of the full network.
Greater stability and consistency of the subnetworks correlate with better training outcomes.
Abstract
Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the "lottery ticket hypothesis" conjectures that typical neural networks contain small subnetworks that can train to similar accuracy in a commensurate number of steps. The evidence for this claim is that a procedure based on iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively on small vision tasks. However, IMP fails on deeper networks, and proposed methods to prune before training or train pruned networks encounter similar scaling limitations. In this paper, we argue that these efforts have struggled on deeper networks because they have focused on pruning…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning
