To prune, or not to prune: exploring the efficacy of pruning for model compression

Michael Zhu; Suyog Gupta

arXiv:1710.01878 · stat.ML · November 15, 2017 · 663 cites

TL;DR

This paper compares two routes to neural network compression: pruning a large model, and shrinking a dense model by reducing its number of hidden units. It finds that large models pruned to high sparsity often reach better accuracy than small dense models of comparable size, which matters most in resource-limited settings.

Contribution

The paper introduces a simple gradual pruning technique that applies across a variety of models and datasets, studied in the context of energy-efficient inference in resource-constrained environments.
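
The gradual pruning technique is not spelled out on this page. As a minimal sketch, assume the target sparsity ramps from an initial value to a final value along a cubic curve, so that pruning is aggressive early in training and tapers off as fewer weights remain. The function name, parameter names, and default values below are illustrative assumptions, not taken from the source.

```python
def gradual_sparsity(step, initial_sparsity=0.0, final_sparsity=0.9,
                     begin_step=0, end_step=10_000):
    """Target sparsity at a training step under an assumed cubic ramp.

    Before begin_step the target stays at initial_sparsity; after
    end_step it stays at final_sparsity. All defaults are hypothetical.
    """
    if end_step <= begin_step:
        raise ValueError("end_step must be greater than begin_step")
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    # Fraction of the pruning window that has elapsed, in (0, 1).
    progress = (step - begin_step) / (end_step - begin_step)
    # Cubic interpolation: steep at first, flattening near the end.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Print the target sparsity at a few points in the schedule.
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, round(gradual_sparsity(step), 4))
```

A training loop would typically call such a function every few steps and re-derive the pruning mask so that the fraction of zeroed weights matches the returned target.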

Findings

01. Large-sparse models outperform small-dense models at the same memory footprint.

02. Pruning achieves up to 10x reduction in non-zero parameters with minimal accuracy loss.

03. Pruning is effective across CNNs, LSTMs, and seq2seq models.
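
To make the large-sparse versus small-dense comparison concrete, the arithmetic can be sketched as follows. The 1024 x 1024 layer is a hypothetical size chosen for illustration, the 90% sparsity corresponds to the 10x reduction quoted above, and the helper names are assumptions rather than anything from the paper.

```python
import math


def nonzero_params(n_in, n_out, sparsity):
    """Non-zero weights left in an n_in x n_out layer at the given sparsity (biases ignored)."""
    total = n_in * n_out
    return total - round(total * sparsity)


def equivalent_dense_width(nonzero_budget):
    """Width of the largest square dense layer whose weight count fits the budget."""
    return math.isqrt(nonzero_budget)


# Hypothetical layer size, for illustration only.
large_sparse = nonzero_params(1024, 1024, sparsity=0.9)
small_dense_width = equivalent_dense_width(large_sparse)
reduction = (1024 * 1024) / large_sparse

print(large_sparse, small_dense_width, round(reduction, 2))
```

This count covers non-zero weights only. A comparison at equal memory footprint would also have to charge the sparse model for storing the positions of its non-zero weights, which the sketch ignores.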

Abstract

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and…
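
The idea of inducing sparsity in a connection matrix can be illustrated with a small sketch. It assumes magnitude-based pruning, where the weights with the smallest absolute values are set to zero; the abstract as shown here does not state the selection criterion, so treat that choice, the function name, and the example values as assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries of a weight matrix.

    weights:  list of rows, each a list of floats.
    sparsity: fraction of entries to zero, in [0, 1].
    Returns (pruned_weights, mask); the input is not modified.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    magnitudes = [abs(w) for row in weights for w in row]
    num_to_prune = int(sparsity * len(magnitudes))
    # Flat indices of the entries with the smallest absolute values.
    by_magnitude = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    pruned_indices = set(by_magnitude[:num_to_prune])

    pruned, mask, flat_index = [], [], 0
    for row in weights:
        mask_row = []
        for _ in row:
            mask_row.append(0 if flat_index in pruned_indices else 1)
            flat_index += 1
        mask.append(mask_row)
        pruned.append([w if keep else 0.0 for w, keep in zip(row, mask_row)])
    return pruned, mask


# Made-up example matrix; half of its six entries are pruned.
example = [[0.5, -0.1, 0.03], [-0.9, 0.2, -0.04]]
pruned, mask = magnitude_prune(example, sparsity=0.5)
print(pruned)
print(mask)
```

The returned mask records which connections survive, so it can be reapplied after each weight update to keep pruned connections at zero.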

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Machine Learning and Data Classification

Methods: Pruning · Sigmoid Activation · Tanh Activation · Sequence to Sequence · Long Short-Term Memory