Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach
Haichuan Yang, Shupeng Gui, Yuhao Zhu, Ji Liu

TL;DR
This paper introduces an automatic framework for neural network compression that jointly prunes and quantizes DNNs without manual hyper-parameter tuning, achieving significant size reduction without accuracy loss.
Contribution
It presents a constrained optimization-based method for automatic joint pruning and quantization, eliminating the need for manual hyper-parameter setting for each layer.
Findings
ResNet-50 compressed 836× smaller on CIFAR-10 with no accuracy loss.
AlexNet compressed 205× smaller on ImageNet with no accuracy loss.
Abstract
Deep Neural Networks (DNNs) are applied in a wide range of usecases. There is an increased demand for deploying DNNs on devices that do not have abundant resources such as memory and computation units. Recently, network compression through a variety of techniques such as pruning and quantization have been proposed to reduce the resource requirement. A key parameter that all existing compression techniques are sensitive to is the compression ratio (e.g., pruning sparsity, quantization bitwidth) of each layer. Traditional solutions treat the compression ratios of each layer as hyper-parameters, and tune them using human heuristic. Recent researchers start using black-box hyper-parameter optimizations, but they will introduce new hyper-parameters and have efficiency issue. In this paper, we propose a framework to jointly prune and quantize the DNNs automatically according to a target model…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification
MethodsPruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · *Communicated@Fast*How Do I Communicate to Expedia? · Dropout · Dense Connections · Max Pooling · Softmax
