To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael Zhu, Suyog Gupta

TL;DR
This paper compares two routes to neural network compression, pruning connections and reducing the number of hidden units, and finds that a pruned large model is often more accurate than a smaller dense model of comparable size, which matters most in resource-constrained settings.
Contribution
It introduces a simple gradual pruning technique applicable across various models and datasets, demonstrating its effectiveness in energy-efficient inference.
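The summary above does not spell out the schedule, so the following is only a minimal sketch of how gradual magnitude pruning could look. It assumes a cubic ramp of the target sparsity and a per-tensor magnitude threshold; the function names, default values, and the toy loop are illustrative and not taken from this page.

```python
import numpy as np

def sparsity_at_step(step, s_init=0.0, s_final=0.9,
                     start_step=0, n_updates=100, update_every=10):
    """Target sparsity at a training step (assumed cubic ramp from s_init to s_final)."""
    span = n_updates * update_every
    if step <= start_step:
        return s_init
    if step >= start_step + span:
        return s_final
    progress = (step - start_step) / span
    # Sparsity rises quickly at first, then levels off as it nears s_final.
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

def magnitude_mask(weights, sparsity):
    """Boolean mask that drops the smallest-magnitude fraction `sparsity` of weights."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest absolute value; everything at or below it is pruned.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

# Toy loop: a real training loop would also apply gradient updates between
# mask refreshes; here only the pruning side is shown.
rng = np.random.default_rng(0)
weights = rng.normal(size=(20, 20))
for step in range(0, 1001, 10):
    mask = magnitude_mask(weights, sparsity_at_step(step))
    weights = weights * mask
```

Raising sparsity in small increments, rather than in one shot, gives the remaining weights a chance to adapt during training between mask updates.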
Findings
Large-sparse models outperform small-dense models at the same memory footprint.
Pruning achieves up to 10x reduction in non-zero parameters with minimal accuracy loss.
Pruning is effective across CNNs, LSTMs, and seq2seq models.
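To make "the same memory footprint" concrete, here is a rough sketch of the bookkeeping involved. It assumes a simple storage model in which each nonzero weight costs its value plus a fixed-size index; the byte sizes and function names are hypothetical, and real sparse formats (and the accounting used in the paper) may differ.

```python
def dense_bytes(n_params, bytes_per_value=4):
    """Storage for a dense model: every parameter is stored."""
    return n_params * bytes_per_value

def sparse_bytes(n_params, sparsity, bytes_per_value=4, bytes_per_index=2):
    """Storage for a pruned model under an assumed value-plus-index layout."""
    nnz = round(n_params * (1.0 - sparsity))
    return nnz * (bytes_per_value + bytes_per_index)

# A 1M-parameter model pruned to 90% sparsity versus a 150k-parameter dense model.
large_sparse = sparse_bytes(1_000_000, 0.9)
small_dense = dense_bytes(150_000)
```

Under these assumed sizes both models occupy 600,000 bytes, so the comparison in the findings is between models like these two, with the index overhead counted against the sparse one.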
Abstract
Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and…
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Machine Learning and Data Classification
Methods: Pruning · Sigmoid Activation · Tanh Activation · Sequence to Sequence · Long Short-Term Memory
