Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski

TL;DR
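This paper dissects the Lottery Ticket algorithm into its components (which weights are masked, what happens to pruned weights, and what happens to kept weights), shows that zeroing pruned weights and preserving initial signs drive its success, and introduces Supermasks: masks that give an untrained network accuracy far better than chance.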
Contribution
It ablates the three components of the Lottery Ticket algorithm to explain why the resulting sparse networks train well, and introduces Supermasks, masks that produce far-better-than-chance accuracy when applied to untrained, randomly initialized networks.
Findings
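Setting pruned weights to zero, rather than freezing them at their initial values, is important for sparse network performance.
Keeping only the signs of the original initial weights is enough for the reinitialized network to train.
Supermasks reach far-better-than-chance accuracy on MNIST and CIFAR-10 when applied to untrained, randomly initialized networks.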
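The second and third findings can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the `scale` parameter, and the toy weights and mask are all assumptions made for illustration, and the mask here is hand-picked rather than learned or derived from a trained network.

```python
import numpy as np

def signed_constant_reinit(w_init, mask, scale):
    """Reinitialize kept weights to +/- scale, using only the sign of w_init.

    Pruned weights (mask == False) are set to zero.
    `scale` is a hypothetical per-layer constant chosen by the caller.
    """
    return np.where(mask, np.sign(w_init) * scale, 0.0)

def masked_forward(x, w_init, mask):
    """Forward pass of one linear layer whose untrained weights are masked."""
    return x @ (w_init * mask)

# Toy 2x2 layer: untrained weights and a hand-picked boolean mask.
w_init = np.array([[0.5, -0.1],
                   [0.2, -0.4]])
mask = np.array([[True, False],
                 [False, True]])

# Signs alone: magnitudes are discarded, signs of kept weights survive.
w_signed = signed_constant_reinit(w_init, mask, scale=0.3)

# Masking an untrained layer: no weight is updated, only entries are zeroed.
x = np.array([[1.0, 2.0]])
y = masked_forward(x, w_init, mask)
```

In this sketch the only information carried over from `w_init` into `w_signed` is one sign per kept weight, while `masked_forward` changes the layer's output purely by choosing which untrained weights participate.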
Abstract
The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keeping the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds the performance of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results. Ablating these factors leads to new insights for why LT networks perform as well as they do. We show why setting weights to zero is important, how signs are all you need to make the reinitialized network train, and why masking behaves like training. Finally, we discover the existence of Supermasks, masks that can be applied…
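The baseline procedure the abstract refers to (keep the large weights, then restart from the same initial weights) can be sketched as follows. This is a simplified, single-layer illustration under stated assumptions: `large_final_mask`, `rewind_to_init`, and `keep_frac` are hypothetical names, and the "final" weights are made-up numbers rather than the result of actual training.

```python
import numpy as np

def large_final_mask(w_final, keep_frac):
    """Boolean mask keeping the weights with the largest trained magnitude.

    Ties at the threshold may keep slightly more than keep_frac of the weights.
    """
    k = max(1, int(round(keep_frac * w_final.size)))
    threshold = np.sort(np.abs(w_final), axis=None)[-k]
    return np.abs(w_final) >= threshold

def rewind_to_init(w_init, mask):
    """Reset kept weights to their initial values; set pruned weights to zero."""
    return np.where(mask, w_init, 0.0)

# Toy layer: initial weights and (made-up) weights after training.
w_init = np.array([[0.5, -0.1],
                   [0.2, -0.4]])
w_final = np.array([[0.9, -0.05],
                    [0.1, -1.2]])

mask = large_final_mask(w_final, keep_frac=0.5)
ticket = rewind_to_init(w_init, mask)  # sparse network to retrain from scratch
```

The mask is computed from the trained weights, but the values placed back into the network come from the original initialization, which is the coupling the paper goes on to vary.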
Videos
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained) · YouTube
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
