Transform Quantization for CNN (Convolutional Neural Network) Compression
Sean I. Young, Wang Zhe, David Taubman, and Bernd Girod

TL;DR
This paper introduces a transform quantization method for CNN weight compression that optimally decorrelates and quantizes weights post-training, improving compression efficiency and performance at low bit-rates.
Contribution
It develops a rate-distortion framework and an optimal end-to-end learned transform for CNN weight quantization, unifying decorrelation and quantization in a single approach.
Findings
Achieves state-of-the-art CNN compression at 1-2 bits.
Effective in both retrained and non-retrained scenarios.
Improves compression without significant loss of accuracy.
Abstract
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum…
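The "decorrelate, then quantize" idea in the abstract can be illustrated with a toy sketch. This is not the paper's method: it swaps in a classical Karhunen-Loève (PCA) rotation on synthetic, zero-mean 2-D "weight pairs" and a plain uniform quantizer, and the bit depths (3 and 1 versus 2 and 2) are picked by hand rather than by any rate-distortion optimization. All function names here (`decorrelating_angle`, `rotate`, `quantize`, `transform_quantize`, `mse`) are hypothetical and chosen for illustration only.

```python
import math
import random


def decorrelating_angle(pairs):
    """Angle of the rotation that zeroes the cross second moment of 2-D samples.

    Assumes the samples are (approximately) zero-mean.
    """
    n = len(pairs)
    cxx = sum(x * x for x, _ in pairs) / n
    cyy = sum(y * y for _, y in pairs) / n
    cxy = sum(x * y for x, y in pairs) / n
    return 0.5 * math.atan2(2.0 * cxy, cxx - cyy)


def rotate(pairs, theta):
    """Apply an orthonormal rotation; rotate(..., -theta) is the inverse."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in pairs]


def quantize(values, bits):
    """Uniform mid-rise quantizer with 2**bits levels over [-peak, peak]."""
    if bits == 0:
        return [0.0] * len(values)  # zero bits: the component is dropped
    peak = max(abs(v) for v in values) or 1.0
    levels = 2 ** bits
    step = 2.0 * peak / levels
    out = []
    for v in values:
        idx = min(int((v + peak) / step), levels - 1)  # bin index in 0..levels-1
        out.append(-peak + (idx + 0.5) * step)         # bin midpoint
    return out


def transform_quantize(pairs, bits_u, bits_v):
    """Rotate to decorrelated coordinates, quantize each, rotate back."""
    theta = decorrelating_angle(pairs)
    coeffs = rotate(pairs, theta)
    u_hat = quantize([u for u, _ in coeffs], bits_u)
    v_hat = quantize([v for _, v in coeffs], bits_v)
    return rotate(list(zip(u_hat, v_hat)), -theta)


def mse(a, b):
    """Mean squared error per pair between two lists of 2-D points."""
    return sum((x1 - x2) ** 2 + (y1 - y2) ** 2
               for (x1, y1), (x2, y2) in zip(a, b)) / len(a)


# Synthetic, strongly correlated pairs standing in for trained weights.
random.seed(0)
pairs = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    pairs.append((x, 0.9 * x + 0.3 * random.gauss(0.0, 1.0)))

# Same total budget (4 bits per pair), spent directly vs. in the transform domain.
direct = list(zip(quantize([x for x, _ in pairs], 2),
                  quantize([y for _, y in pairs], 2)))
transformed = transform_quantize(pairs, 3, 1)
print("direct MSE:          ", mse(pairs, direct))
print("transform-domain MSE:", mse(pairs, transformed))
```

After the rotation, most of the energy sits in the first coefficient, so giving it more bits than the second is a natural choice under a fixed budget. The sketch measures error on the weights only; the abstract's point about the joint statistics of weights and activations is not modeled here.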
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Image and Signal Denoising Methods
Methods: Dense Connections · Dropout · Global Average Pooling · Concatenated Skip Connection · Kaiming Initialization · Dense Block · Batch Normalization · Residual Connection · Max Pooling
