Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

TL;DR
This paper investigates the challenges of training highly sparse neural networks, showing that standard dense training recipes leave sparse models under-trained, and proposes new approaches to improve accuracy in high-sparsity regimes.
Contribution
It demonstrates the limitations of standard training recipes for sparse networks and introduces new methods that achieve state-of-the-art results in high-sparsity scenarios for vision and language models.
Findings
Standard dense training recipes are suboptimal for sparse networks.
New approaches improve accuracy in high-sparsity regimes.
The new methods achieve state-of-the-art results in high-sparsity scenarios for vision and language models.
Abstract
Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and results in under-training. We provide new approaches for mitigating this issue for both sparse pre-training of vision models (e.g.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Pruning
