In-Hindsight Quantization Range Estimation for Quantized Training
Marios Fournarakis, Markus Nagel

TL;DR
This paper introduces in-hindsight range estimation for quantized training, offering a simple, fast, and hardware-friendly alternative to dynamic quantization that improves gradient and activation quantization during neural network training.
Contribution
It proposes a static range estimation method that quantizes the current iteration using ranges estimated on previous iterations, reducing the memory overhead and complexity that dynamic quantization incurs in fully quantized training.
Findings
Effective across various architectures including MobileNetV2
Achieves comparable or better accuracy than existing methods
Reduces memory and computational overhead during training
Abstract
Quantization techniques applied to the inference of deep neural networks have enabled fast and efficient execution on resource-constrained devices. The success of quantization during inference has motivated the academic community to explore fully quantized training, i.e., quantizing back-propagation as well. However, effective gradient quantization is still an open problem. Gradients are unbounded and their distribution changes significantly during training, which leads to the need for dynamic quantization. As we show, dynamic quantization can lead to significant memory overhead and additional data traffic, slowing down training. We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present. Our approach enables fast static quantization of gradients and activations while…
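The idea in the abstract, quantizing the current tensor with a range estimated on earlier iterations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `InHindsightRange`, the helper `fake_quantize`, the exponential-moving-average update with a `momentum` parameter, and the first-iteration fallback to the current tensor's min/max are all assumptions made for illustration.

```python
import numpy as np


def fake_quantize(x, lo, hi, num_bits=8):
    """Uniform quantize-dequantize of x onto the fixed range [lo, hi]."""
    levels = 2 ** num_bits - 1
    scale = max(hi - lo, 1e-12) / levels  # guard against a zero-width range
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo


class InHindsightRange:
    """Hypothetical helper: quantize with the range from past iterations.

    The range used at step t is fixed before the tensor at step t is seen,
    so quantization is static. The tensor's own min/max are only used
    afterwards to update the range for step t + 1.
    """

    def __init__(self, momentum=0.9, num_bits=8):
        self.momentum = momentum  # assumed EMA smoothing factor
        self.num_bits = num_bits
        self.lo = None
        self.hi = None

    def __call__(self, x):
        cur_lo, cur_hi = float(x.min()), float(x.max())
        if self.lo is None:
            # Assumption: with no history yet, start from the current stats.
            self.lo, self.hi = cur_lo, cur_hi
        # Quantize with the range estimated on previous iterations.
        out = fake_quantize(x, self.lo, self.hi, self.num_bits)
        # Update the stored range in hindsight, for the next iteration.
        m = self.momentum
        self.lo = m * self.lo + (1 - m) * cur_lo
        self.hi = m * self.hi + (1 - m) * cur_hi
        return out


quantizer = InHindsightRange(momentum=0.5)
first = quantizer(np.array([-1.0, 0.0, 1.0]))
second = quantizer(np.array([-4.0, 0.0, 4.0]))  # clipped to the older range
```

In this sketch the second tensor is clipped to the range learned from the first one, which is the cost of not inspecting the tensor before quantizing it; the stored range then widens so later iterations clip less.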
Taxonomy
Methods: Depthwise Convolution · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Inverted Residual Block · Convolution · 1x1 Convolution · Average Pooling
