Learning low-precision neural networks without Straight-Through Estimator (STE)
Zhi-Gang Liu, Matthew Mattina

TL;DR
This paper introduces alpha-blending, a gradient-based method for training low-precision neural networks that avoids the theoretical issues of STE by gradually transitioning from full-precision to quantized weights during training.
Contribution
The paper proposes alpha-blending, a novel approach that trains low-precision neural networks with stochastic gradient descent without relying on the STE gradient approximation.
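Here is a short derivation of why no STE is needed. The notation is ours: Q is the quantizer, 𝓛 is the loss, and w_ab is the blended weight, following the affine combination of w and w_q described in the abstract. Q is piecewise constant, so its derivative is zero almost everywhere. The chain rule through the blended weight is therefore exact without any surrogate gradient:

```latex
\begin{aligned}
w_{ab} &= (1-\alpha)\,w + \alpha\,w_q, \qquad w_q = Q(w),\\
\frac{\partial Q(w)}{\partial w} &= 0 \quad \text{almost everywhere (} Q \text{ is piecewise constant)},\\
\frac{\partial \mathcal{L}}{\partial w}
  &= \frac{\partial \mathcal{L}}{\partial w_{ab}}
     \left[(1-\alpha) + \alpha\,\frac{\partial Q(w)}{\partial w}\right]
   = (1-\alpha)\,\frac{\partial \mathcal{L}}{\partial w_{ab}}.
\end{aligned}
```

At α = 0 this reduces to ordinary full-precision SGD. At α = 1 the forward pass uses w_q alone and the update to w vanishes.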
Findings
Alpha-blending (AB) improves top-1 accuracy on ImageNet by up to 2.93% compared with STE-based quantization.
AB outperforms STE-based quantization on CIFAR10 and ImageNet.
The method enables progressive transition from full-precision to low-precision models.
Abstract
The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our method (AB) avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, with non-trainable scalar coefficients α and (1 − α). During training, α is gradually increased from 0 to 1. The gradient updates to the weights flow through the full-precision term, (1 − α)w, of the affine combination, and the model is converted from full precision to low precision progressively. To evaluate the method, a 1-bit BinaryNet on…
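The blending idea can be sketched in pure Python. This is a minimal illustration, not the authors' implementation, and it rests on three assumptions:

* a BinaryNet-style sign quantizer;
* a linear ramp for α (the hypothetical `alpha_schedule`);
* a toy 3-weight linear regression whose target weights are already binary.

The names `quantize`, `blend`, `alpha_schedule` and `train` are illustrative only. The forward pass uses the blended weight, and only the (1 − α)w term sends gradient back to the full-precision weights.

```python
from typing import List

Vector = List[float]


def quantize(w: Vector) -> Vector:
    """Assumed 1-bit quantizer (BinaryNet-style sign); 0 maps to +1."""
    return [1.0 if wi >= 0.0 else -1.0 for wi in w]


def blend(w: Vector, alpha: float) -> Vector:
    """Affine combination (1 - alpha) * w + alpha * w_q used in the forward pass."""
    wq = quantize(w)
    return [(1.0 - alpha) * wi + alpha * qi for wi, qi in zip(w, wq)]


def alpha_schedule(step: int, start: int, end: int) -> float:
    """Hypothetical linear ramp: alpha = 0 up to `start`, 1 from `end` on."""
    if step <= start:
        return 0.0
    if step >= end:
        return 1.0
    return (step - start) / (end - start)


def train(steps: int = 1000, lr: float = 0.1) -> Vector:
    """Toy alpha-blending run on a 3-weight linear regression (illustrative)."""
    target = [1.0, -1.0, 1.0]  # binary-representable ground-truth weights
    xs = [[1.0, 0.5, -0.5], [0.2, 1.0, 0.3], [-0.4, 0.1, 1.0], [0.7, -0.6, 0.2]]
    ys = [sum(t * xi for t, xi in zip(target, x)) for x in xs]
    w = [0.1, 0.1, 0.1]  # full-precision (latent) weights updated by SGD

    for step in range(steps):
        alpha = alpha_schedule(step, start=400, end=800)
        x, y = xs[step % len(xs)], ys[step % len(xs)]
        wb = blend(w, alpha)
        err = sum(wi * xi for wi, xi in zip(wb, x)) - y
        grad_wb = [err * xi for xi in x]  # dL/dw_ab for L = err**2 / 2
        # quantize() is piecewise constant, so its derivative is 0 almost
        # everywhere; only the (1 - alpha) * w term sends gradient to w.
        w = [wi - lr * (1.0 - alpha) * g for wi, g in zip(w, grad_wb)]

    return quantize(w)  # final low-precision weights


print(train())
```

For this toy problem, `train()` returns the binary weights [1.0, -1.0, 1.0]. While α = 0, plain full-precision SGD fits the target. The ramp then shifts the forward pass onto the quantized weights. Once α reaches 1, the (1 − α) factor zeroes every update, so the latent weights stop changing.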
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
