The Difficulty of Training Sparse Neural Networks
Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

TL;DR
This paper explores the challenges of training sparse neural networks, revealing optimization difficulties, the importance of the dense subspace, and the potential need for extra dimensions to improve training outcomes.
Contribution
It examines the optimization landscape of sparse networks, showing that paths allowed to traverse the dense subspace connect solutions where paths confined to the sparse subspace were not found, which highlights how hard sparse training is.
Findings
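A linear path with a monotonically decreasing objective exists from the initialization to the "good" solution, even though optimizers fail to reach that solution.
Attempts to find a decreasing-objective path from "bad" solutions to "good" ones within the sparse subspace fail, but such a path is consistently found when it may traverse the dense subspace.
Traversing extra dimensions may be needed to escape stationary points found in sparse training.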
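The first finding rests on evaluating the objective along a straight line between two parameter vectors. A minimal sketch of that measurement follows, assuming a toy masked least-squares problem in NumPy as a stand-in for the paper's sparse ResNet-50 setup. The names interpolate_loss, is_monotonically_decreasing, mask, theta_init, and theta_good are hypothetical and chosen for illustration only. Because this toy loss is convex, a monotone path to its minimizer is guaranteed, so the sketch shows the procedure rather than reproducing the paper's result.

```python
import numpy as np

def interpolate_loss(loss_fn, theta_start, theta_end, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the segment between two parameter vectors."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn((1.0 - a) * theta_start + a * theta_end) for a in alphas]
    )
    return alphas, losses

def is_monotonically_decreasing(losses, tol=1e-12):
    """True if no step along the path increases the loss by more than tol."""
    return bool(np.all(np.diff(losses) <= tol))

# Toy data (assumption: a small regression problem, not the paper's ImageNet setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = rng.normal(size=50)

# Hypothetical fixed sparsity pattern: 1 keeps a weight, 0 prunes it.
mask = np.array([1, 0, 1, 0, 0, 1, 0, 1], dtype=float)
active = mask.astype(bool)

def loss_fn(theta):
    # Only the unpruned weights contribute to the prediction.
    residual = X @ (theta * mask) - y
    return 0.5 * np.mean(residual ** 2)

# Sparse initialization and a "good" sparse solution (least squares on active weights).
theta_init = rng.normal(size=8) * mask
theta_good = np.zeros(8)
theta_good[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]

alphas, losses = interpolate_loss(loss_fn, theta_init, theta_good)
monotone = is_monotonically_decreasing(losses)
print("loss along path:", np.round(losses, 4))
print("monotonically decreasing:", monotone)
```

Both endpoints respect the same mask, so every interpolated point stays in the sparse subspace. Dropping the mask from one endpoint would instead give a path through the dense subspace, which is the kind of path the second finding refers to.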
Abstract
We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work (Gale et al., 2019; Liu et al., 2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the "good" solution. Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail. However, if we allow the path to traverse the dense subspace, then we consistently find a path between two solutions. These findings suggest traversing extra dimensions may be needed to escape stationary points found in the sparse…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
