Scalable Methods for 8-bit Training of Neural Networks
Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

TL;DR
This paper introduces a scalable 8-bit training method for neural networks, including a new Range Batch-Normalization, enabling efficient quantization of weights, activations, and gradients with minimal accuracy loss.
Contribution
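It presents an approach to 8-bit quantization of nearly all training components (weights, activations, and layer gradients), keeping only the final step of the weight-gradient computation at higher precision. The approach is supported by theoretical analysis and a new normalization technique, Range Batch-Normalization.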
Findings
Achieves state-of-the-art results on ImageNet-1K with 8-bit training.
Introduces Range Batch-Normalization with high quantization noise tolerance.
Demonstrates robustness of training to significant precision reduction.
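To make the Range Batch-Normalization finding concrete, here is a minimal sketch for a single feature over one batch. This summary does not give the formula, so the form used here is an assumption: the centered values are divided by C(n) times their range (max minus min), with C(n) = 1/sqrt(2 ln n) for batch size n. The function name `range_batch_norm` and the `eps` parameter are hypothetical, and the learnable scale and shift are omitted.

```python
import math


def range_batch_norm(batch, eps=1e-5):
    """Normalize one feature over a batch using its range instead of its variance.

    Assumed form (not stated in the summary above):
        x_hat = (x - mean) / (C(n) * range(x - mean)),  C(n) = 1 / sqrt(2 * ln(n))
    """
    n = len(batch)
    if n < 2:
        raise ValueError("range normalization needs at least two values")

    mean = sum(batch) / n
    centered = [v - mean for v in batch]

    # The range needs only max and min, so no squares or square roots of the data.
    value_range = max(centered) - min(centered)

    # Scale adjustment that depends only on the batch size.
    c_n = 1.0 / math.sqrt(2.0 * math.log(n))

    # eps (hypothetical) guards against a zero range when all values are equal.
    scale = c_n * value_range + eps
    return [v / scale for v in centered]
```

Because the statistic is a max and a min rather than a sum of squares, it avoids the squaring step in which small quantization errors are amplified; that is one plausible reading of the noise-tolerance claim, not a statement of the authors' argument.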
Abstract
Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the number of bits required, as well as the best quantization scheme, are yet unknown. Our theoretical analysis suggests that most of the training process is robust to substantial precision reduction, and points to only a few specific operations that require higher precision. Armed with this knowledge, we quantize the model parameters, activations and layer gradients to 8-bit, leaving at a higher precision only the final step in the computation of the weight gradients. Additionally, as QNNs require batch-normalization to be trained at high precision, we introduce Range Batch-Normalization (BN) which has significantly higher tolerance to quantization noise and…
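The split described in the abstract, 8-bit operands with a small part of the computation kept at higher precision, can be sketched as follows. This is an illustration under stated assumptions, not the authors' scheme: it uses symmetric max-abs scaling to the range [-127, 127] with round-to-nearest, and the names `quantize_int8`, `dequantize`, and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [scale * q for q in quantized]


def int8_dot(a, b):
    """Dot product with 8-bit operands and a wider accumulator.

    The products of two 8-bit values do not fit in 8 bits, so the running
    sum is kept at higher precision and rescaled to float once at the end.
    """
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    accumulator = sum(x * y for x, y in zip(qa, qb))  # plain Python int, not 8-bit
    return accumulator * scale_a * scale_b
```

With round-to-nearest, each dequantized value differs from its original by at most about half a quantization step (`scale / 2`), which is the kind of bounded noise the robustness claim is about.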
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
