Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Utku Evci; Yani A. Ioannou; Cem Keskin; Yann Dauphin

arXiv:2010.03533 · cs.LG · March 17, 2022

TL;DR

This paper investigates why training unstructured sparse neural networks from random initialization performs poorly, highlighting the importance of sparsity-aware initialization and analyzing the success of Lottery Tickets and Dynamic Sparse Training.
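
To make "sparsity-aware initialization" concrete, here is a minimal sketch. It assumes the term means He-style initialization in which each unit's variance uses its own fan-in, counted over unmasked connections only, instead of the dense layer's fan-in. This is an assumed reading, not code from the paper or from google-research/rigl, and the function names (sparse_he_init, dense_he_init, mean_row_energy) are hypothetical.

```python
import math
import random


def sparse_he_init(mask, rng=None):
    """He-style init where each output unit uses its own sparse fan-in.

    mask[i][j] == 1 means input j feeds output unit i. Active weights of
    unit i are drawn from N(0, 2 / fan_in_i), with fan_in_i = sum(mask[i]).
    (Assumed reading of "sparsity-aware"; not the paper's code.)
    """
    if rng is None:
        rng = random.Random(0)
    weights = []
    for row in mask:
        fan_in = sum(row)
        # A unit with no incoming connections gets all-zero weights.
        std = math.sqrt(2.0 / fan_in) if fan_in > 0 else 0.0
        weights.append([rng.gauss(0.0, std) if m else 0.0 for m in row])
    return weights


def dense_he_init(mask, rng=None):
    """Baseline: He init with the dense fan-in, then zero the masked weights."""
    if rng is None:
        rng = random.Random(0)
    std = math.sqrt(2.0 / len(mask[0]))
    return [[rng.gauss(0.0, std) if m else 0.0 for m in row] for row in mask]


def mean_row_energy(weights):
    """Average over units of the sum of squared incoming weights."""
    return sum(sum(w * w for w in row) for row in weights) / len(weights)


if __name__ == "__main__":
    # 200 units x 1000 inputs, each connection kept with probability 0.1.
    mask_rng = random.Random(1)
    mask = [[1 if mask_rng.random() < 0.1 else 0 for _ in range(1000)]
            for _ in range(200)]
    print("sparse-aware:", mean_row_energy(sparse_he_init(mask, random.Random(2))))
    print("dense fan-in:", mean_row_energy(dense_he_init(mask, random.Random(2))))
```

With about 90% of connections masked, the per-unit version keeps the expected incoming weight energy near 2 regardless of density, while the dense-fan-in baseline drops it to roughly 2 times the density (about 0.2 here). That shrinkage is one way a sparse network can start with weakened signal and gradient flow.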

Contribution

The paper analyzes gradient flow in sparse neural networks during training, showing how initialization and the choice of sparse training method affect their generalization.

Findings

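01. Sparse neural networks have poor gradient flow at initialization.
02. Dynamic Sparse Training (DST) improves gradient flow during training over traditional sparse training methods.
03. Lottery Tickets (LTs) succeed by re-learning the pruning solution they were derived from, not by improving gradient flow.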
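
Finding 03 ties a ticket's success to the pruning solution it was derived from. The sketch below shows, under simplifying assumptions, how that derivation makes both the mask and the starting weights functions of one dense training run: one-shot global magnitude pruning of the trained weights, followed by rewinding the surviving weights to their initial values. Published LT procedures typically prune iteratively over several rounds and may rewind to an early training checkpoint rather than step 0. The helper names magnitude_mask and lottery_ticket are hypothetical.

```python
def magnitude_mask(trained, sparsity):
    """Global magnitude pruning over a flat list of trained weights.

    Keeps the (1 - sparsity) fraction with the largest |w|; returns a 0/1 mask.
    """
    n = len(trained)
    n_keep = n - int(round(sparsity * n))
    # Rank by descending magnitude; ties broken by index so the result is deterministic.
    order = sorted(range(n), key=lambda i: (-abs(trained[i]), i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(n)]


def lottery_ticket(init, trained, sparsity):
    """One-shot ticket: mask from the trained weights, values rewound to init."""
    mask = magnitude_mask(trained, sparsity)
    ticket = [w0 * m for w0, m in zip(init, mask)]
    return mask, ticket


if __name__ == "__main__":
    init = [0.5, -0.2, 0.1, 0.3]       # weights before dense training
    trained = [0.05, -0.9, 0.4, 0.02]  # the same weights after dense training
    mask, ticket = lottery_ticket(init, trained, sparsity=0.5)
    print(mask)    # [0, 1, 1, 0]
    print(ticket)  # [0.0, -0.2, 0.1, 0.0]
```

Note that the kept connections are chosen by their trained magnitudes, yet the ticket starts from their initial values; both come from the same dense run, which is the link finding 03 points to.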

Abstract

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Through our analysis of gradient flow during training, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and demonstrate the importance of using sparsity-aware initialization. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that…
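
As an example of a DST update, the sketch below follows the drop/grow rule described for RigL (Evci et al., 2020), whose code lives in the google-research/rigl repository listed under Code & Models: drop the active connections with the smallest weight magnitude, then grow the same number of inactive connections with the largest dense-gradient magnitude, starting them at zero. It is a simplified single-layer illustration on flat lists, not code from that repository, and it omits the update interval, the decaying drop-fraction schedule, and per-layer sparsity allocation. The function name rigl_update is hypothetical.

```python
def rigl_update(weights, grads, mask, drop_fraction):
    """One RigL-style drop/grow step on a flat list of weights.

    grads must be dense: a gradient for every connection, including masked ones.
    The number of active connections is unchanged.
    """
    active = [i for i, m in enumerate(mask) if m]
    k = int(drop_fraction * len(active))

    # Drop: the k active connections with the smallest |weight|.
    to_drop = sorted(active, key=lambda i: (abs(weights[i]), i))[:k]
    new_mask = list(mask)
    for i in to_drop:
        new_mask[i] = 0

    # Grow: the k currently inactive connections with the largest |gradient|.
    inactive = [i for i, m in enumerate(new_mask) if not m]
    to_grow = set(sorted(inactive, key=lambda i: (-abs(grads[i]), i))[:k])
    for i in to_grow:
        new_mask[i] = 1

    # Newly grown connections start at zero; masked connections are zeroed.
    new_weights = [
        0.0 if (i in to_grow or not new_mask[i]) else w
        for i, w in enumerate(weights)
    ]
    return new_weights, new_mask


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.0, 0.0, 0.4, 0.0]
    mask = [1, 1, 0, 0, 1, 0]
    grads = [0.1, 0.2, -0.7, 0.05, 0.3, 0.6]
    new_weights, new_mask = rigl_update(weights, grads, mask, drop_fraction=0.5)
    print(new_mask)     # [1, 0, 1, 0, 1, 0]
    print(new_weights)  # [0.9, 0.0, 0.0, 0.0, 0.4, 0.0]
```

Because the number of active connections is held fixed, the sparsity level never changes during training; only which connections are active does.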

Code & Models

Repositories

google-research/rigl (TensorFlow, official)

Videos

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win · Underline

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: Dynamic Sparse Training · Pruning