Rethinking the Value of Network Pruning

Zhuang Liu; Mingjie Sun; Tinghui Zhou; Gao Huang; Trevor Darrell

arXiv:1810.05270·cs.LG·March 6, 2019·742 cites

TL;DR

This paper challenges the effectiveness of traditional network pruning by showing that training a smaller network from scratch can match or outperform pruned models, questioning the importance of learned weights and the pruning process.

Contribution

It reveals that fine-tuning pruned models often does not outperform training small models from scratch, and highlights the importance of architecture over learned weights in pruning.

Findings

01

Fine-tuning pruned models often yields similar or worse performance than training from scratch.

02

The pruned architecture itself matters more than the "important" weights inherited from the large model.

03

With an optimal learning rate, the "winning ticket" initialization from the Lottery Ticket Hypothesis does not necessarily improve on random initialization.

Abstract

Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and…
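The contrast drawn in the abstract, fine-tuning inherited weights versus training the same pruned architecture from random initialization, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code. It assumes L1-norm filter ranking as the pruning criterion (the abstract does not name one), represents a layer as a list of flattened filters, and omits the training and fine-tuning loops. All function names are hypothetical.

```python
import random

def l1_norm(filt):
    """Sum of absolute weights in one flattened filter."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio):
    """Return the sorted indices of the filters with the largest L1 norm.

    Assumption: L1-norm ranking stands in for 'a certain criterion'.
    """
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])

def inherit_weights(filters, kept):
    """Starting point for fine-tuning: copy the surviving filters."""
    return [list(filters[i]) for i in kept]

def reinitialize(filters, kept, rng):
    """Starting point for training from scratch: same shape, fresh random weights."""
    return [[rng.gauss(0.0, 0.1) for _ in filters[i]] for i in kept]

rng = random.Random(0)
# Stand-in for a trained large layer: 8 filters of 9 weights each.
large_layer = [[rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(8)]

kept = prune_filters(large_layer, keep_ratio=0.5)
fine_tune_init = inherit_weights(large_layer, kept)   # three-stage pipeline
scratch_init = reinitialize(large_layer, kept, rng)   # architecture only

print(len(fine_tune_init), len(scratch_init))
```

Both starting points share the pruned architecture (the same number and shape of filters) and differ only in whether the weights are inherited or randomly drawn, which is the comparison the paper evaluates.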

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Pruning