Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

TL;DR
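This paper investigates whether neural networks reach the same linearly connected minimum under different samples of SGD noise, and how this stability relates to the lottery ticket hypothesis. It finds that networks become stable early in training and that subnetworks found by iterative magnitude pruning reach full accuracy only when they are stable.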
Contribution
It introduces a method to analyze linear connectivity of minima and links stability to the success of subnetworks identified by iterative magnitude pruning.
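The linear connectivity analysis can be illustrated with a minimal sketch. It assumes two copies of a network trained from the same starting point under different SGD noise, with weights stored as dictionaries of NumPy arrays. The names `interpolate`, `error_barrier`, and `error_fn` are hypothetical, and measuring the barrier as the peak error along the path minus the mean error of the two endpoints is one common formulation, not necessarily the exact procedure used by the authors.

```python
import numpy as np

def interpolate(w1, w2, alpha):
    """Return the weights alpha * w1 + (1 - alpha) * w2, layer by layer."""
    return {k: alpha * w1[k] + (1.0 - alpha) * w2[k] for k in w1}

def error_barrier(w1, w2, error_fn, num_points=31):
    """Peak error along the linear path minus the mean error of the endpoints.

    error_fn is a hypothetical callable mapping a weight dict to an error
    value (for example, test error of the network with those weights).
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    path_errors = [error_fn(interpolate(w1, w2, a)) for a in alphas]
    endpoint_mean = 0.5 * (error_fn(w1) + error_fn(w2))
    return max(path_errors) - endpoint_mean

# Toy example: a double-well "error" with minima at w = -1 and w = +1.
toy_error = lambda w: float(np.sum((w["w"] ** 2 - 1.0) ** 2))
w_a = {"w": np.array([1.0])}
w_b = {"w": np.array([-1.0])}
barrier = error_barrier(w_a, w_b, toy_error)
```

A barrier near zero suggests the two solutions lie in the same linearly connected region; a large barrier, as in the toy double-well case, suggests they do not.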
Findings
Neural networks become stable to SGD noise early in training.
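Subnetworks found by iterative magnitude pruning reach full accuracy only when they are stable to SGD noise.
These subnetworks are stable at initialization in small-scale settings (MNIST) and early in training in large-scale settings (ResNet-50 and Inception-v3 on ImageNet).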
Abstract
We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
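As a rough illustration of iterative magnitude pruning, the sketch below repeatedly trains, removes the lowest-magnitude surviving weights, and resets the remaining weights to a fixed starting state. It makes several assumptions: the weights are a single flat NumPy array, `train_fn` is a hypothetical training routine supplied by the caller, `w_start` stands for the weights at initialization or at an early training iteration, and the 20% pruning fraction is an arbitrary choice for illustration.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction=0.2):
    """Return a new mask with the lowest-magnitude surviving weights removed.

    Roughly `fraction` of the currently unpruned weights are dropped;
    ties at the threshold may cause slightly more to be removed.
    """
    surviving = np.abs(weights[mask])
    k = int(fraction * surviving.size)
    if k == 0:
        return mask.copy()
    threshold = np.sort(surviving)[k - 1]
    return mask & (np.abs(weights) > threshold)

def iterative_magnitude_pruning(w_start, train_fn, rounds=3, fraction=0.2):
    """Alternate training and pruning, resetting to w_start each round.

    train_fn(weights, mask) is a hypothetical callable that returns the
    trained weights while keeping pruned entries at zero.
    """
    mask = np.ones_like(w_start, dtype=bool)
    for _ in range(rounds):
        trained = train_fn(w_start * mask, mask)
        mask = magnitude_prune(trained, mask, fraction)
    return mask

# Toy usage with an identity "training" step, only to show the mechanics.
w0 = np.array([0.1, -0.5, 0.3, 2.0, -1.5, 0.05, 0.9, -0.7, 0.2, 1.1])
final_mask = iterative_magnitude_pruning(w0, lambda w, m: w, rounds=2)
```

The stability of the resulting subnetwork could then be checked by training `w_start * final_mask` twice under different data orders and comparing the two solutions along a linear path.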
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Average Pooling · Auxiliary Classifier · 1x1 Convolution · RMSProp · Inception-v3 Module · Max Pooling · Softmax · Convolution · Dropout
