Post-training 4-bit quantization of convolution networks for rapid-deployment

Ron Banner; Yury Nahshan; Elad Hoffer; Daniel Soudry

arXiv:1810.05723 · cs.CV · May 30, 2019 · 127 cites

TL;DR

This paper presents a practical 4-bit post-training quantization method for convolutional neural networks that reduces memory and computational requirements without full dataset access or fine-tuning.
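For readers new to the topic, here is a minimal sketch of what plain 4-bit storage involves: symmetric uniform quantization with one scale set by the largest magnitude, then two signed 4-bit codes packed per byte, which is where the 8× storage saving over float32 comes from. This is a generic baseline under stated assumptions (a symmetric code range of [-7, 7], max-abs scaling, and the hypothetical helper names below), not the method proposed in the paper.

```python
from typing import List, Tuple

def quantize_4bit(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit codes in [-7, 7] with one shared scale.

    The scale puts the largest magnitude exactly on code +-7, so nothing is
    clipped and each element's rounding error is at most scale / 2.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

def pack_nibbles(codes: List[int]) -> bytes:
    """Store two signed 4-bit codes per byte as two's-complement nibbles."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length input with a zero code
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)

def unpack_nibbles(data: bytes, n: int) -> List[int]:
    """Inverse of pack_nibbles: sign-extend each nibble and drop padding."""
    codes = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            codes.append(nib - 16 if nib >= 8 else nib)
    return codes[:n]

if __name__ == "__main__":
    weights = [0.5, -1.2, 0.03, 2.1, -0.7]
    codes, scale = quantize_4bit(weights)
    packed = pack_nibbles(codes)
    # 5 float32 values take 20 bytes; the packed codes take 3 bytes (plus one scale per tensor).
    print(codes, len(packed))
```

With max-abs scaling no value is clipped, so each element's rounding error is bounded by half a step; the trade-off is that a single outlier stretches the step for every other value in the tensor, which is why the choice of quantization range matters so much at 4 bits.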

Contribution

It introduces a 4-bit quantization approach that requires neither fine-tuning nor access to the full dataset, and proposes three complementary methods for minimizing quantization error at the tensor level.
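One simple way to make "minimizing quantization error at the tensor level" concrete is to pick the clipping value α that minimizes the mean squared error between a tensor and its 4-bit quantized copy. The sketch below does this by brute-force search over candidate values of α on synthetic heavy-tailed data. The search strategy, the synthetic sample, and the helper names are illustrative assumptions; the paper's own three methods are not reproduced here.

```python
import random
from typing import List

def quant_mse(values: List[float], alpha: float, bits: int = 4) -> float:
    """MSE after clipping to [-alpha, alpha] and rounding to a symmetric signed grid."""
    qmax = 2 ** (bits - 1) - 1  # 7 code levels on each side for 4 bits
    step = alpha / qmax
    total = 0.0
    for v in values:
        # Clamping the integer code is the same as clipping v to [-alpha, alpha] before rounding.
        code = max(-qmax, min(qmax, round(v / step)))
        total += (v - code * step) ** 2
    return total / len(values)

def search_clip(values: List[float], bits: int = 4, num_candidates: int = 50) -> float:
    """Return the candidate alpha in (0, max|v|] with the lowest quantization MSE."""
    max_abs = max(abs(v) for v in values)
    candidates = [max_abs * k / num_candidates for k in range(1, num_candidates + 1)]
    return min(candidates, key=lambda a: quant_mse(values, a, bits))

if __name__ == "__main__":
    rng = random.Random(0)
    # The difference of two Exp(1) draws is Laplace(0, 1): synthetic heavy-tailed data.
    data = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
    max_abs = max(abs(v) for v in data)
    best = search_clip(data)
    print(f"max|v| = {max_abs:.2f}, best alpha = {best:.2f}")
    print(f"MSE at max|v|: {quant_mse(data, max_abs):.4f}, at best alpha: {quant_mse(data, best):.4f}")
```

Because max|v| is itself one of the candidates, the searched α can never do worse than max-abs scaling. Clipping accepts a larger error on the few extreme values in exchange for a finer step everywhere else.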

Findings

01

Achieves accuracy within a few percent of the full-precision state-of-the-art baseline

02

Does not require fine-tuning or full dataset access

03

Applicable to a wide range of convolutional models

Abstract

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of intermediate results, but it often requires the full dataset and time-consuming fine-tuning to recover the accuracy lost after quantization. This paper introduces the first practical 4-bit post-training quantization approach: it does not involve training the quantized model (fine-tuning), nor does it require the availability of the full dataset. We target the quantization of both activations and weights and suggest three complementary methods for minimizing quantization error at the tensor level, two of which obtain a closed-form analytical solution. Combining these methods, our approach achieves accuracy that is just a few percent below the state-of-the-art…
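To illustrate what a closed-form solution at the tensor level can look like, the sketch below assumes the values follow a Laplace(0, b) distribution, are clipped to [-α, α], and are then rounded into 2^M equal bins whose in-range error is treated as uniform. Under those assumptions the expected MSE is approximately 2b²·e^(-α/b) + α²/(3·4^M), and setting its derivative in α to zero gives α = x·b with x·e^x = 3·4^M, so the optimal ratio α/b depends only on the bit width M. This is a derivation under stated assumptions, not a transcription of the paper's equations; the function names, including the small-sample scale estimator, are hypothetical.

```python
import math
from typing import List

def laplace_clip_mse(alpha: float, b: float, bits: int) -> float:
    """Approximate expected MSE for Laplace(0, b) data clipped to [-alpha, alpha]
    and uniformly quantized into 2**bits bins."""
    clip_noise = 2 * b * b * math.exp(-alpha / b)  # expected squared overshoot beyond +-alpha
    round_noise = alpha ** 2 / (3 * 4 ** bits)     # width^2 / 12 with width = 2*alpha / 2**bits
    return clip_noise + round_noise

def optimal_clip_ratio(bits: int, iters: int = 30) -> float:
    """Solve x * exp(x) = 3 * 4**bits for x = alpha / b with Newton's method."""
    target = 3 * 4 ** bits
    x = math.log(target)  # starts right of the root; Newton then decreases x monotonically
    for _ in range(iters):
        ex = math.exp(x)
        x -= (x * ex - target) / ((x + 1) * ex)
    return x

def estimate_laplace_scale(sample: List[float]) -> float:
    """Maximum-likelihood estimate of b: mean absolute deviation from the median."""
    s = sorted(sample)
    n = len(s)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return sum(abs(v - median) for v in sample) / n

if __name__ == "__main__":
    ratio = optimal_clip_ratio(4)
    print(round(ratio, 2))  # 5.03
```

Under these assumptions, a tensor whose small calibration sample gives a scale estimate b̂ would be clipped at roughly 5.03·b̂ for 4 bits. Since the ratio depends only on M, it can be computed once per bit width rather than searched per tensor.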

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques