Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Itay Hubara; Yury Nahshan; Yair Hanani; Ron Banner; Daniel Soudry

arXiv:2006.10518·cs.LG·December 15, 2020·76 cites

TL;DR

This paper introduces a novel layer-wise calibration and integer programming approach to improve post-training neural quantization, enabling effective 4-bit quantization with minimal accuracy loss on vision and text models.

Contribution

The paper proposes a layer-wise optimization and integer programming method for post-training quantization. It surpasses previous techniques, which only set the dynamic ranges, and demonstrates state-of-the-art results.

Findings

01. Achieves less than 1% accuracy degradation with 4-bit quantization on ResNet50.

02. Is much less susceptible to over-fitting than standard fine-tuning and remains effective on very small calibration sets.

03. Reports state-of-the-art results on vision and text models.

Abstract

Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the…
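The layer-wise idea in the abstract (minimize each layer's quantization error separately over the calibration set) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single linear layer, symmetric uniform 4-bit weight quantization, and a plain grid search over one scale parameter in place of the optimization used in the paper. The names `quantize` and `calibrate_layer` are hypothetical.

```python
import numpy as np

def quantize(w, scale, bits=4):
    """Symmetric uniform quantization: round to an integer grid, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_layer(w, x, bits=4, num_candidates=100):
    """Pick the weight scale that minimizes this layer's output error on x.

    w: (out_features, in_features) full-precision weights
    x: (num_samples, in_features) unlabeled calibration inputs
    Assumption: a grid search over one scale stands in for the paper's optimizer.
    """
    qmax = 2 ** (bits - 1) - 1
    max_scale = np.abs(w).max() / qmax  # scale that covers the full weight range
    reference = x @ w.T                 # full-precision layer output
    best_scale, best_err = max_scale, np.inf
    for frac in np.linspace(0.2, 1.0, num_candidates):
        scale = frac * max_scale
        err = np.mean((reference - x @ quantize(w, scale, bits).T) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

# Synthetic stand-ins for one layer and a small calibration set.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 32))
x = rng.normal(size=(64, 32))

scale, err = calibrate_layer(w, x)

# Baseline: set the scale from the weight range alone, without using x.
naive_scale = np.abs(w).max() / 7
naive_err = np.mean((x @ w.T - x @ quantize(w, naive_scale, 4).T) ** 2)
```

Because the range-based scale is one of the candidates, the calibrated error `err` can never exceed `naive_err`. The objective is computed per layer and needs no labels, which is what lets a small unlabeled calibration set be used.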

Code & Models

Repositories

itayhubara/CalibTIP · PyTorch · Official

Models

🤗 compressa-ai/Saiga-Llama-3-8B-AdaQRound · model · 4 dl

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications