Value-aware Quantization for Training and Inference of Neural Networks
Eunhyeok Park, Sungjoo Yoo, Peter Vajda

TL;DR
This paper introduces value-aware quantization, which quantizes the majority of neural network data to very low precision while keeping a small fraction of large values in high precision, maintaining accuracy and reducing memory usage during training and inference.
Contribution
It presents a quantization technique that handles a small amount of large values separately in high precision and quantizes the rest aggressively, enabling low-precision training and inference with minimal accuracy loss.
Findings
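3-bit activations, with 2% of large values kept in high precision, match the training accuracy of full precision
Activation memory cost reduced by 41.6% and 53.7% in ResNet-152 and Inception-v3 compared with the state-of-the-art method
Inception-v3, ResNet-101 and DenseNet-121 quantized for inference to 4-bit weights and activations (with 1% 16-bit data) within a 1% top-1 accuracy drop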
Abstract
We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which reduces total quantization errors under very low precision. We present new techniques to apply the proposed quantization to training and inference. The experiments show that our method with 3-bit activations (with 2% of large ones) can give the same training accuracy as full-precision one while offering significant (41.6% and 53.7%) reductions in the memory cost of activations in ResNet-152 and Inception-v3 compared with the state-of-the-art method. Our experiments also show that deep networks such as Inception-v3, ResNet-101 and DenseNet-121 can be quantized for inference with 4-bit weights and activations (with 1% 16-bit data) within 1% top-1 accuracy drop.
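The core idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `value_aware_quantize`, the symmetric signed uniform quantizer, and the rule for how many values count as "large" are all assumptions made for illustration. The sketch keeps the largest-magnitude values untouched and sizes the low-precision quantizer to the remaining, smaller values.

```python
def value_aware_quantize(values, bits=3, large_ratio=0.02):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    Keeps the `large_ratio` fraction of largest-magnitude values as-is and
    quantizes the rest with a symmetric uniform `bits`-bit quantizer.
    Returns (dequantized_values, sorted_indices_of_large_values).
    """
    n = len(values)
    if n == 0:
        return [], []
    # Assumed rule: round the requested fraction to a whole number of values.
    n_large = int(round(n * large_ratio))
    # Indices ordered by magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(values[i]), reverse=True)
    large_idx = set(order[:n_large])
    # The quantization range is set by the largest *remaining* magnitude.
    levels = 2 ** (bits - 1) - 1  # e.g. 3 positive levels for signed 3-bit
    max_small = max((abs(values[i]) for i in range(n) if i not in large_idx),
                    default=0.0)
    scale = max_small / levels if max_small > 0 else 1.0
    out = []
    for i, v in enumerate(values):
        if i in large_idx:
            out.append(v)  # large value kept in high precision
        else:
            q = max(-levels, min(levels, round(v / scale)))
            out.append(q * scale)  # low-precision value, dequantized
    return out, sorted(large_idx)


def total_abs_error(original, quantized):
    """Sum of absolute quantization errors."""
    return sum(abs(a - b) for a, b in zip(original, quantized))


data = [0.1, -0.2, 0.3, 0.05, 10.0]
aware, kept = value_aware_quantize(data, bits=3, large_ratio=0.2)
plain, _ = value_aware_quantize(data, bits=3, large_ratio=0.0)
print(kept)
print(total_abs_error(data, aware) < total_abs_error(data, plain))
```

In this toy input the single outlier would otherwise stretch the quantization range so far that every small value collapses to zero. Excluding it lets the same 3 bits cover the small values much more finely, which is the effect the abstract describes as reducing total quantization error.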
Taxonomy
Methods: RMSProp · Convolution · Average Pooling · Auxiliary Classifier · 1x1 Convolution · Inception-v3 Module · Max Pooling · Softmax · Dropout · Dense Connections
