Model compression as constrained optimization, with application to neural nets. Part II: quantization
Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

TL;DR
This paper introduces an iterative method for quantizing neural network weights that is guaranteed to converge to a local optimum of the loss, enabling higher compression rates with minimal loss degradation.
Contribution
It applies the recently proposed framework of model compression as constrained optimization to quantization, yielding an iterative learning-compression algorithm that converges to a local optimum and supports both learned and fixed codebooks.
Findings
Achieves higher compression rates than previous methods.
Maintains negligible loss degradation at 1-bit quantization.
Supports adaptive and fixed codebook schemes.
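As a rough illustration of what 1-bit quantization means for storage, the sketch below estimates the compression ratio of a codebook-quantized net. It assumes 32-bit floats for the reference weights and the codebook entries, one fixed-length index per weight, and no further entropy coding; these assumptions and the function name `compression_ratio` are illustrative and not taken from the paper.

```python
import math


def compression_ratio(num_weights, K, float_bits=32):
    """Estimate storage ratio: reference net vs. net quantized with K codebook entries."""
    # Each weight is replaced by an index into the codebook.
    bits_per_index = math.ceil(math.log2(K))
    original_bits = num_weights * float_bits
    # Quantized storage = one index per weight + the codebook itself.
    quantized_bits = num_weights * bits_per_index + K * float_bits
    return original_bits / quantized_bits


# Hypothetical net with one million weights, binarized (K = 2, i.e. 1 bit per weight).
ratio = compression_ratio(1_000_000, K=2)
print(f"compression ratio: {ratio:.2f}")
```

Under these assumptions the ratio approaches 32 for a large net at K = 2, since the codebook overhead becomes negligible.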
Abstract
We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with K entries so that the training loss of the quantized net is minimal. The codebook can be optimally learned jointly with the net, or fixed, as for binarization or ternarization approaches. Previous work has quantized the weights of the reference net, or incorporated rounding operations in the backpropagation algorithm, but this has no guarantee of converging to a loss-optimal, quantized net. We describe a new approach based on the recently proposed framework of model compression as constrained optimization (Carreira-Perpiñán, 2017). This results in a simple iterative "learning-compression" algorithm, which alternates a step that learns a net of continuous weights with a step that quantizes (or binarizes/ternarizes) the weights,…
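The alternation described in the abstract can be sketched on a toy problem. The example below is a minimal sketch, not the authors' implementation: it assumes a least-squares loss in place of a neural net (so the learning step has a closed form), a quadratic penalty with an increasing weight `mu` to tie the continuous weights to their quantized version, and plain 1-D k-means re-initialised at every compression step. The names `kmeans_1d` and `lc_quantize` and all hyperparameters are hypothetical.

```python
import numpy as np


def kmeans_1d(w, K, iters=20):
    """C step: fit a K-entry codebook to the weights w with 1-D k-means."""
    # Initialise the codebook at evenly spaced quantiles of the weights.
    codebook = np.quantile(w, (np.arange(K) + 0.5) / K)
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        for k in range(K):
            if np.any(assign == k):
                codebook[k] = w[assign == k].mean()
    assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
    return codebook, assign


def lc_quantize(X, y, K=2, mu0=1e-3, growth=1.5, outer=40):
    """Alternate learning (L) and compression (C) steps on a least-squares loss."""
    n, d = X.shape
    A = X.T @ X / n
    b = X.T @ y / n
    # Reference (uncompressed) weights: ordinary least squares.
    w = np.linalg.solve(A + 1e-8 * np.eye(d), b)
    codebook, assign = kmeans_1d(w, K)
    w_q = codebook[assign]
    mu = mu0
    for _ in range(outer):
        # L step: minimise loss(w) + (mu / 2) * ||w - w_q||^2 over continuous w.
        # For least squares this is the linear system (A + mu I) w = b + mu w_q.
        w = np.linalg.solve(A + mu * np.eye(d), b + mu * w_q)
        # C step: quantize the current continuous weights.
        codebook, assign = kmeans_1d(w, K)
        w_q = codebook[assign]
        # Increase the penalty so w is pulled towards its quantized version.
        mu *= growth
    return w_q, codebook


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10)
w_q, codebook = lc_quantize(X, y, K=2)
print("codebook:", codebook)
print("distinct quantized values:", np.unique(w_q).size)
```

For a fixed codebook such as binarization, the C step would reduce to assigning each weight to its nearest fixed entry instead of running k-means.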
Taxonomy
Topics: Neural Networks and Applications · Advanced Data Compression Techniques · Generative Adversarial Networks and Image Synthesis
