Transform Quantization for CNN (Convolutional Neural Network) Compression

Sean I. Young, Wang Zhe, David Taubman, and Bernd Girod

arXiv:2009.01174·cs.CV·November 9, 2021

TL;DR

This paper introduces a transform quantization method for CNN weight compression that optimally decorrelates and quantizes weights post-training, improving compression efficiency and performance at low bit-rates.

Contribution

It develops a rate-distortion framework and an optimal end-to-end learned transform for CNN weight quantization, unifying decorrelation and quantization in a single approach.

Findings

01. Achieves state-of-the-art CNN compression at 1-2 bits.

02. Effective in both retrained and non-retrained scenarios.

03. Improves compression without significant loss of accuracy.

Abstract

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum…
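The idea of transforming (decorrelating) weights and then quantizing them can be illustrated with a minimal sketch. This is not the paper's method: it assumes a plain Karhunen-Loève transform computed from the weights alone and a single fixed step size, whereas the abstract describes a transform and bit allocation derived from a rate-distortion framework that also accounts for activation statistics. The function name `transform_quantize`, the `step` parameter, and the random test matrix are all hypothetical.

```python
import numpy as np

def transform_quantize(W, step):
    """Decorrelate the rows of W with a KLT, then uniformly quantize.

    Assumption: a weights-only KLT with one shared step size, used here
    purely for illustration (not the transform derived in the paper).
    """
    # Covariance across the rows (e.g. output channels) of W
    C = W @ W.T / W.shape[1]
    # Eigenvectors of a symmetric matrix form an orthonormal basis
    _, U = np.linalg.eigh(C)
    # Transform-domain coefficients; their row covariance is diagonal
    Z = U.T @ W
    # Uniform scalar quantization to integer indices
    Q = np.round(Z / step)
    # Dequantize and invert the transform to reconstruct the weights
    W_hat = U @ (Q * step)
    return U, Q.astype(np.int32), W_hat

# Hypothetical correlated "weight" matrix: 8 rows, 256 columns
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) @ rng.standard_normal((8, 256))

U, Q, W_hat = transform_quantize(W, step=0.1)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because `U` is orthonormal, the reconstruction error in the weight domain equals the quantization error in the transform domain, which is what lets distortion be reasoned about per coefficient in this simplified setting.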

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Image and Signal Denoising Methods

Methods: Dense Connections · Dropout · Global Average Pooling · Concatenated Skip Connection · Kaiming Initialization · Dense Block · Batch Normalization · Residual Connection · Max Pooling