Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach

Haichuan Yang; Shupeng Gui; Yuhao Zhu; Ji Liu

arXiv:1910.05897 · cs.LG · May 19, 2020 · 5 cites

TL;DR

This paper introduces an automatic framework for neural network compression that jointly prunes and quantizes DNNs without manual hyper-parameter tuning, achieving significant size reduction without accuracy loss.

Contribution

It presents a constrained optimization-based method for automatic joint pruning and quantization, eliminating the need for manual hyper-parameter setting for each layer.

Findings

01. ResNet-50 compressed 836× smaller on CIFAR-10 with no accuracy loss.

02. AlexNet compressed 205× smaller on ImageNet with no accuracy loss.

Abstract

Deep Neural Networks (DNNs) are applied in a wide range of use cases. There is an increasing demand for deploying DNNs on devices that lack abundant resources such as memory and computation units. Recently, network compression through a variety of techniques, such as pruning and quantization, has been proposed to reduce the resource requirements. A key parameter that all existing compression techniques are sensitive to is the compression ratio (e.g., pruning sparsity, quantization bitwidth) of each layer. Traditional solutions treat the compression ratio of each layer as a hyper-parameter and tune these values using human heuristics. Recent work has started to use black-box hyper-parameter optimization, but this introduces new hyper-parameters and suffers from efficiency issues. In this paper, we propose a framework to jointly prune and quantize the DNNs automatically according to a target model…
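To make the two per-layer knobs named in the abstract concrete (pruning sparsity and quantization bitwidth), here is a minimal sketch of how they combine into a compression ratio. It is a generic illustration, not the paper's constrained-optimization method: magnitude pruning and a symmetric uniform quantizer are stand-ins chosen for simplicity, the function names are hypothetical, and the ratio assumes a 32-bit baseline and ignores the storage cost of sparse indices.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, bits):
    """Snap weights to a symmetric uniform grid representable in `bits` bits."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign + magnitude)")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1      # number of positive grid points
    step = max_abs / levels
    # Zeros stay exactly zero, so pruning survives quantization.
    return [round(w / step) * step for w in weights]


def compression_ratio(layers, full_bits=32):
    """Ratio of dense full-precision size to compressed size.

    `layers` is a list of (total_weights, nonzero_weights, bits) tuples.
    Assumption: only nonzero weights are stored, index overhead is ignored.
    """
    dense = sum(total * full_bits for total, _, _ in layers)
    compressed = sum(nonzero * bits for _, nonzero, bits in layers)
    return dense / compressed


# Toy single-layer example with made-up weights.
weights = [0.5, -0.1, 0.05, -0.8, 0.2, 0.0, 0.3, -0.02, 0.9, -0.4]
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized = quantize_uniform(pruned, bits=4)
nonzero = sum(1 for q in quantized if q != 0.0)
ratio = compression_ratio([(len(weights), nonzero, 4)])
print(ratio)
```

In this toy setup the sparsity and bitwidth are fixed by hand for one layer, which is exactly the per-layer tuning the paper aims to automate across all layers given a target model size.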

Code & Models

Repositories

hyang1990/sparsity_quantization_joint
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification

Methods: Pruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax