Towards the Limit of Network Quantization
Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee

TL;DR
This paper develops a theoretically grounded approach to neural network quantization that minimizes the performance loss under a compression-ratio constraint. It combines Hessian-weighted clustering with entropy coding and achieves high compression ratios.
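Entropy coding pays off because quantized weights take only a few distinct values with very uneven frequencies. The sketch below is a minimal illustration, assuming 32-bit floats as the uncompressed baseline and counting only the Huffman-coded cluster indices plus a float codebook. The function names are hypothetical, and the accounting deliberately ignores other overheads (such as storing the Huffman table), so it is not the paper's exact compression-ratio computation.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a binary Huffman code."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs 1 bit per occurrence in practice.
        return {next(iter(counts)): 1}
    # Heap entries: (subtree count, unique tie-breaker, {symbol: depth})
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def compression_ratio(cluster_ids, num_clusters, bits_per_float=32):
    """Float storage / (Huffman-coded indices + float codebook).

    Simplified, hypothetical accounting: Huffman table and other
    side information are not counted.
    """
    lengths = huffman_code_lengths(cluster_ids)
    counts = Counter(cluster_ids)
    coded_bits = sum(counts[s] * lengths[s] for s in counts)
    codebook_bits = num_clusters * bits_per_float
    original_bits = len(cluster_ids) * bits_per_float
    return original_bits / (coded_bits + codebook_bits)


ids = [0] * 900 + [1] * 60 + [2] * 40   # skewed toy cluster assignment
print(huffman_code_lengths(ids))         # cluster 0 gets a 1-bit code
print(compression_ratio(ids, 3))         # roughly 26.76
```

With fixed-length 2-bit indices the same toy assignment would need 2000 index bits instead of 1100, which is why variable-length codes favor quantizers that produce skewed cluster sizes.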
Contribution
It introduces a Hessian-weighted quantization scheme and links network quantization to entropy-constrained scalar quantization, providing new methods for efficient neural network compression.
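The intuition behind a Hessian-weighted objective can be sketched with a second-order Taylor expansion of the loss around the trained weights. This is a sketch under stated assumptions, not a transcription of the paper's derivation: the gradient is taken to be approximately zero at a trained local minimum, off-diagonal Hessian terms are ignored, and the diagonal entries h_ii are assumed nonnegative with a positive sum inside each cluster. The symbols g, H, h_ii, C_j and c_j are notation chosen here. The last line follows from setting the derivative of D with respect to c_j to zero.

```latex
\begin{aligned}
\delta L(w) &\approx g(w)^{\top}\delta w + \tfrac{1}{2}\,\delta w^{\top} H(w)\,\delta w,
  && \delta w = \hat{w} - w \\
\delta L(w) &\approx \tfrac{1}{2}\sum_{i=1}^{N} h_{ii}\,(\hat{w}_i - w_i)^2,
  && g(w) \approx 0,\; H(w) \approx \operatorname{diag}(h_{11},\dots,h_{NN}) \\
D &= \sum_{j=1}^{k}\sum_{w_i \in C_j} h_{ii}\,(w_i - c_j)^2,
  && c_j = \frac{\sum_{w_i \in C_j} h_{ii}\, w_i}{\sum_{w_i \in C_j} h_{ii}}
\end{aligned}
```

Weights with large h_ii, where the loss is most sensitive, pull their cluster center toward themselves, while insensitive weights tolerate larger quantization errors.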
Findings
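Achieved compression ratios of 51.25, 22.17, and 40.65 for LeNet, ResNet, and AlexNet, respectively.
Proposed Hessian-weighted k-means clustering, which minimizes a Hessian-weighted distortion that locally approximates the loss increase caused by quantizing the parameters.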
Linked network quantization to entropy-constrained scalar quantization in information theory.
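Entropy-constrained scalar quantization (ECSQ) chooses quantization levels to minimize distortion plus λ times the rate, rather than distortion for a fixed number of levels. The sketch below is a generic Lloyd-style ECSQ iteration in the spirit of classic entropy-constrained quantizer design, not the paper's own solution. It assumes unweighted squared error, uses the ideal code length -log2 p_j as the rate of index j, and the function names and the parameter `lam` are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))


def ecsq_1d(w, k, lam, iters=50):
    """Lloyd-style entropy-constrained scalar quantization of the values in w.

    Each value is assigned to the cluster j minimising
        (w_i - c_j)**2 + lam * (-log2 p_j),
    i.e. squared error plus lam times the ideal code length of index j.
    lam = 0 reduces to plain 1-D k-means.
    """
    c = np.linspace(w.min(), w.max(), k)   # deterministic uniform initialisation
    p = np.full(k, 1.0 / k)                # initial cluster probabilities
    for _ in range(iters):
        rate = -np.log2(np.maximum(p, 1e-12))   # empty clusters get a huge rate
        cost = (w[:, None] - c[None, :]) ** 2 + lam * rate[None, :]
        assign = cost.argmin(axis=1)
        for j in range(k):
            members = w[assign == j]
            if members.size:
                c[j] = members.mean()           # centroid update
        p = np.bincount(assign, minlength=k) / w.size
    return c, assign, p


w = np.array([-1.0] * 10 + [1.0] * 190)
for lam in (0.0, 2.0):
    c, assign, p = ecsq_1d(w, k=2, lam=lam)
    print(lam, c, p, entropy_bits(p))
```

With `lam=0` both levels survive; with `lam=2` the rare level becomes too expensive to encode and all values merge into one cluster, trading extra distortion for zero rate. Since Huffman code lengths average within one bit of the entropy, the entropy term is a reasonable proxy for the coded size.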
Abstract
Network quantization is one of the network compression techniques used to reduce the redundancy of deep neural networks. It reduces the number of distinct network parameter values by quantization in order to save storage. In this paper, we design network quantization schemes that minimize the performance loss due to quantization given a compression ratio constraint. We analyze the quantitative relation of quantization errors to the neural network loss function and identify that the Hessian-weighted distortion measure is locally the right objective function for the optimization of network quantization. As a result, Hessian-weighted k-means clustering is proposed for clustering network parameters to quantize. When optimal variable-length binary codes, e.g., Huffman codes, are employed for further compression, we derive that the network quantization problem can be related to the…
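Hessian-weighted k-means clustering can be sketched as Lloyd's algorithm on scalar weights where each weight carries an importance h_i. This is a minimal sketch assuming a diagonal Hessian approximation with nonnegative entries; the h values below are made-up numbers rather than Hessians of a real network, and the function names are hypothetical. Because h_i scales all of weight i's distances equally, assignment is still to the nearest center; only the center update changes, becoming the h-weighted mean.

```python
import numpy as np


def weighted_kmeans_1d(w, h, k, iters=100):
    """Lloyd's algorithm on scalars with per-point weights h (h = 1 gives plain k-means)."""
    c = np.linspace(w.min(), w.max(), k)  # deterministic initialisation
    for _ in range(iters):
        # Assignment: nearest center. h[i] scales every distance of point i
        # by the same factor, so it does not change which center is nearest.
        assign = np.abs(w[:, None] - c[None, :]).argmin(axis=1)
        for j in range(k):
            m = assign == j
            if m.any():
                # Update: the h-weighted mean minimises sum h_i (w_i - c_j)^2.
                c[j] = np.sum(h[m] * w[m]) / np.sum(h[m])
    return c, assign


def quadratic_loss_increase(w, h, c, assign):
    """Second-order loss change 0.5 * sum h_i (w_i - c_a(i))^2 (zero gradient, diagonal Hessian)."""
    return 0.5 * np.sum(h * (w - c[assign]) ** 2)


w = np.array([0.0, 0.2, 1.0, 1.2])   # toy weights
h = np.array([10.0, 1.0, 1.0, 10.0])  # toy diagonal Hessian entries
c_hw, a_hw = weighted_kmeans_1d(w, h, k=2)
c_pl, a_pl = weighted_kmeans_1d(w, np.ones_like(w), k=2)
print(quadratic_loss_increase(w, h, c_hw, a_hw))  # about 0.036
print(quadratic_loss_increase(w, h, c_pl, a_pl))  # about 0.11
```

Both runs find the same clusters, but plain k-means places centers at the unweighted means, which sit farther from the sensitive weights and roughly triple the approximate loss increase in this toy case.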
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · LeNet · Softmax · 1x1 Convolution
