Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM
Shaokai Ye, Xiaoyu Feng, Tianyun Zhang, Xiaolong Ma, Sheng Lin, Zhengang Li, Kaidi Xu, Wujie Wen, Sijia Liu, Jian Tang, Makan Fardad, Xue Lin, Yongpan Liu, Yanzhi Wang

TL;DR
This paper introduces a progressive DNN compression framework using ADMM that significantly improves weight pruning and quantization, enabling ultra-high compression rates with minimal accuracy loss.
Contribution
It extends ADMM-based weight pruning to guarantee feasibility, accelerates convergence, generalizes to quantization, and develops a multi-step progressive approach for superior compression.
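To make the quantization side of this contribution concrete, here is a minimal sketch of the projection an ADMM-style method could use for quantization: each weight is mapped to the nearest value in a fixed set of levels. The function name `project_to_levels` and the 5-level codebook are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import numpy as np

def project_to_levels(weights, levels):
    """Euclidean projection: map each weight to its nearest allowed level."""
    levels = np.asarray(levels, dtype=float)
    # Distance from every weight to every level, shape (n_weights, n_levels).
    dist = np.abs(weights.reshape(-1, 1) - levels.reshape(1, -1))
    nearest = levels[np.argmin(dist, axis=1)]
    return nearest.reshape(weights.shape)

w = np.array([[0.12, -0.48], [0.31, -0.05]])
levels = [-0.5, -0.25, 0.0, 0.25, 0.5]  # hypothetical equal-distance codebook
q = project_to_levels(w, levels)
print(q)
```

The projection is the only piece that changes between pruning and quantization in this kind of formulation: pruning projects onto a set of sparse tensors, quantization onto a set of tensors whose entries lie on the allowed levels.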
Findings
Achieved 246x, 36x, and 8x weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively.
Demonstrated 61x weight pruning on AlexNet with minor accuracy loss.
First to derive notable pruning results for ResNet and MobileNet.
Abstract
Weight pruning and weight quantization are two important categories of DNN model compression. Prior work on these techniques is mainly based on heuristics. A recent work developed a systematic framework of DNN weight pruning using the advanced optimization technique ADMM (Alternating Direction Method of Multipliers), achieving one of the state-of-the-art weight pruning results. In this work, we first extend this one-shot ADMM-based framework to guarantee solution feasibility and provide a fast convergence rate, and generalize it to weight quantization as well. We have further developed a multi-step, progressive DNN weight pruning and quantization framework, with dual benefits of (i) achieving further weight pruning/quantization thanks to the special property of ADMM regularization, and (ii) reducing the search space within each step. Extensive experimental results demonstrate the superior…
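The ADMM pruning loop and the multi-step schedule described in the abstract can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: the loss is a small least-squares problem instead of a DNN, so the weight update has a closed form where a real network would use SGD; the names `project_sparse` and `admm_prune`, the penalty `rho`, and the budget schedule `(8, 4, 3)` are all hypothetical. Feasibility is enforced here by hard-pruning after ADMM and refitting only the surviving weights, which is one common way to do it.

```python
import numpy as np

def project_sparse(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    z = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    z[keep] = v[keep]
    return z

def admm_prune(A, b, w0, k, rho=1.0, iters=50):
    """One ADMM pruning step for the toy loss 0.5 * ||A w - b||^2."""
    n = A.shape[1]
    w, z, u = w0.copy(), project_sparse(w0, k), np.zeros(n)
    lhs = A.T @ A + rho * np.eye(n)
    for _ in range(iters):
        # W-update: minimize loss + (rho / 2) * ||w - z + u||^2 (closed form here).
        w = np.linalg.solve(lhs, A.T @ b + rho * (z - u))
        # Z-update: Euclidean projection onto the set of k-sparse vectors.
        z = project_sparse(w + u, k)
        # Dual update.
        u = u + w - z
    # Hard-prune, then refit only the surviving weights so the result is feasible.
    mask = z != 0
    w_final = np.zeros(n)
    w_final[mask] = np.linalg.lstsq(A[:, mask], b, rcond=None)[0]
    return w_final

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 16))
w_true = np.zeros(16)
w_true[[1, 5, 9]] = [2.0, -3.0, 1.5]
b = A @ w_true

w = np.linalg.lstsq(A, b, rcond=None)[0]  # dense starting point ("pretrained" model)
for k in (8, 4, 3):                       # progressive: tighten the budget step by step
    w = admm_prune(A, b, w, k)
    print(k, np.count_nonzero(w))
```

Each pass warm-starts from the previous, already sparser solution, so later passes search over fewer candidate weights. That mirrors benefit (ii) in the abstract, although on a far simpler problem.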
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Pruning · Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Softmax · 1x1 Convolution
