Training with Quantization Noise for Extreme Model Compression

Angela Fan; Pierre Stock; Benjamin Graham; Edouard Grave; Remi Gribonval; Herve Jegou; Armand Joulin

arXiv:2004.07320 · cs.LG · March 2, 2021 · 115 cites

TL;DR

This paper introduces training with quantization noise: only a random subset of weights is quantized on each forward pass, which enables extreme model compression while largely preserving accuracy on both NLP and image classification tasks.

Contribution

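The authors extend Quantization Aware Training beyond int8 fixed-point quantization to extreme compression methods such as Product Quantization. Only a different random subset of weights is quantized on each forward pass, so unbiased gradients flow through the remaining weights.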

Findings

1. Achieved 82.5% accuracy on MNLI with a 14MB RoBERTa model.

2. Reached 80.0% top-1 accuracy on ImageNet with a 3.3MB EfficientNet-B3.

3. Established new state-of-the-art accuracy-compression trade-offs.

Abstract

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods where the approximations introduced by STE are severe, such as Product Quantization. Our proposal is to only quantize a different random subset of weights during each forward, allowing for unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result we establish new state-of-the-art compromises between accuracy and model size both in natural…
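The core idea in the abstract, quantizing a different random subset of weights on each forward pass, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple int8-style scalar quantizer as a stand-in for the schemes the paper covers (Product Quantization would operate on blocks of weights instead), and the names `quantize_int8`, `quant_noise_forward`, `p`, and `scale` are hypothetical. Only the forward-pass noise is shown; in real training, the quantized weights would receive Straight-Through Estimator gradients while the untouched weights receive exact gradients.

```python
import random


def quantize_int8(w, scale):
    """Snap a weight to a uniform grid of step `scale`, clamped to the int8 range."""
    q = round(w / scale)          # nearest integer level
    q = max(-128, min(127, q))    # clamp to the signed 8-bit range
    return q * scale


def quant_noise_forward(weights, p, scale, rng):
    """Return a noisy copy of `weights` for one forward pass.

    Each weight is independently replaced by its quantized value with
    probability `p` (an assumed noise rate); all other weights are left
    unchanged. The returned mask records which weights were quantized.
    """
    noisy, mask = [], []
    for w in weights:
        hit = rng.random() < p
        mask.append(hit)
        noisy.append(quantize_int8(w, scale) if hit else w)
    return noisy, mask


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.013, -0.42, 0.77, 0.105]
    # A fresh random subset is drawn on every call, i.e. on every forward pass.
    for step in range(3):
        noisy, mask = quant_noise_forward(weights, p=0.5, scale=0.01, rng=rng)
        print(step, mask, noisy)
```

Setting `p` to 1.0 quantizes every weight on every pass, which corresponds to standard Quantization Aware Training in this sketch, while `p` at 0.0 leaves the weights untouched. Intermediate values control the amount of noise.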

Code & Models

Videos

Training with Quantization Noise for Extreme Model Compression · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Weight Decay · BERT