Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu

TL;DR
This paper introduces a global sparse momentum SGD method for pruning deep neural networks, enabling automatic, end-to-end model compression without extensive manual tuning or post-pruning retraining.
Contribution
The proposed method determines layer-wise sparsity automatically, simplifies the pruning process, and improves the discovery of effective subnetworks compared to prior techniques.
Findings
Layer-wise sparsity ratios are found automatically from a single global compression ratio.
No need for post-pruning retraining.
Better identification of winning subnetworks.
Abstract
Deep Neural Network (DNN) is powerful but computationally expensive and memory intensive, thus impeding its practical usage on resource-constrained front-end devices. DNN pruning is an approach for deep model compression, which aims at eliminating some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration which are updated using different rules. In this way, we gradually zero out the redundant parameters, as we update them using only the ordinary weight decay but no gradients derived from the objective function. As a departure from prior methods that require heavy human works to tune the layer-wise sparsity ratios, prune by solving complicated…
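The two-rule update described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `gsm_step`, the argument `keep_count`, and the hyperparameter defaults are hypothetical, and the saliency score |w * g| used to split the parameters is an assumption, since the excerpt above does not state how the two groups are chosen. Parameters in the "active" group receive the objective gradient plus weight decay; the rest receive weight decay only, so they shrink toward zero over iterations.

```python
import numpy as np


def gsm_step(params, grads, velocities, keep_count,
             lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One illustrative sparse-momentum update; arrays are modified in place.

    params, grads, velocities: lists of float arrays, one entry per layer.
    keep_count: number of parameters (across ALL layers) that still receive
                the objective gradient in this iteration (hypothetical name).
    """
    # Assumed saliency score, ranked globally rather than per layer.
    scores = np.concatenate(
        [np.abs(w * g).ravel() for w, g in zip(params, grads)]
    )
    # The keep_count-th largest score is the global threshold.
    threshold = np.partition(scores, -keep_count)[-keep_count]

    for w, g, v in zip(params, grads, velocities):
        # 1.0 where the parameter is in the active group, 0.0 elsewhere.
        # Ties at the threshold may admit slightly more than keep_count.
        active = (np.abs(w * g) >= threshold).astype(w.dtype)
        # Momentum buffer: weight decay for everyone, gradient only if active.
        v *= momentum
        v += weight_decay * w + active * g
        w -= lr * v


# Toy usage: two "layers" with two parameters each.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
grads = [np.array([0.1, 0.1]), np.array([0.1, 0.1])]
velocities = [np.zeros(2), np.zeros(2)]
gsm_step(params, grads, velocities, keep_count=2, weight_decay=0.1)
```

In this toy run the two highest-scoring parameters both sit in the second layer, so the first layer is updated by weight decay alone. Because the ranking is global, the per-layer split falls out of the scores instead of being set by hand. How `keep_count` would be derived from the global compression ratio (for example, total parameter count divided by the ratio) is likewise an assumption here.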
Taxonomy
Topics: Speech and Audio Processing · Neural Networks and Applications
Methods: Pruning · Weight Decay
