Stabilizing the Lottery Ticket Hypothesis

Jonathan Frankle; Gintare Karolina Dziugaite; Daniel M. Roy; Michael Carbin

arXiv:1903.01611·cs.LG·September 29, 2020·146 cites

TL;DR

This paper modifies iterative magnitude pruning (IMP) to search for subnetworks that could have been obtained by pruning early in training rather than at initialization. The change lets IMP find trainable sparse subnetworks of deeper networks, such as ResNet-50 on ImageNet, and offers new insight into the lottery ticket hypothesis.

Contribution

The paper introduces a modified IMP procedure that searches for subnetworks obtainable by pruning early in training instead of at initialization. With this change, IMP succeeds on deeper networks where the original procedure failed.
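
The idea can be illustrated with a minimal sketch. This is not the authors' code: it swaps the deep network for a toy least-squares model trained by gradient descent, and the names `train`, `prune_smallest`, `imp_with_rewinding`, `rewind_step`, and `fraction` are hypothetical, chosen only for illustration. The one point it aims to show is that after each pruning round the surviving weights are reset to their values from an early training step rather than to their values at initialization.

```python
import numpy as np


def train(weights, mask, X, y, steps, lr=0.1):
    """Toy stand-in for training: masked gradient descent on least squares."""
    w = weights * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask  # pruned weights stay at zero
    return w


def prune_smallest(weights, mask, fraction):
    """Zero the mask for the smallest-magnitude fraction of surviving weights."""
    alive = np.flatnonzero(mask)
    n_prune = int(round(fraction * alive.size))
    order = np.argsort(np.abs(weights.ravel()[alive]))
    new_mask = mask.copy().ravel()
    new_mask[alive[order[:n_prune]]] = 0.0
    return new_mask.reshape(mask.shape)


def imp_with_rewinding(w0, X, y, rewind_step, total_steps, rounds, fraction):
    """Iterative magnitude pruning that resets survivors to an early-training state."""
    mask = np.ones_like(w0)
    # Train the dense model for a few steps and save the weights (assumed rewind point).
    w_rewind = train(w0, mask, X, y, rewind_step)
    for _ in range(rounds):
        # Finish training from the rewind point with the current mask applied.
        w_final = train(w_rewind, mask, X, y, total_steps - rewind_step)
        # Prune by final magnitude, then go back to the saved early weights.
        mask = prune_smallest(w_final, mask, fraction)
    return mask, w_rewind * mask


# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
y = X @ rng.normal(size=10)
w0 = rng.normal(size=10)

mask, w_sparse = imp_with_rewinding(
    w0, X, y, rewind_step=5, total_steps=50, rounds=2, fraction=0.2
)
print("surviving weights:", int(mask.sum()), "of", mask.size)
```

Setting `rewind_step=0` in this sketch resets the surviving weights to their initial values, which corresponds to the original IMP procedure that the paper modifies.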

Findings

01

Early pruning leads to subnetworks with comparable accuracy to full networks.

02

Subnetworks found by pruning early in training end up with weights closer to those the same connections reach when trained as part of the full network.

03

Improved stability and consistency in subnetworks correlate with better training outcomes.

Abstract

Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the "lottery ticket hypothesis" conjectures that typical neural networks contain small subnetworks that can train to similar accuracy in a commensurate number of steps. The evidence for this claim is that a procedure based on iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively on small vision tasks. However, IMP fails on deeper networks, and proposed methods to prune before training or train pruned networks encounter similar scaling limitations. In this paper, we argue that these efforts have struggled on deeper networks because they have focused on pruning…

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Pruning