Learning low-precision neural networks without Straight-Through Estimator (STE)

Zhi-Gang Liu; Matthew Mattina

arXiv:1903.01061 · cs.LG · May 22, 2019 · 5 cites

TL;DR

This paper introduces alpha-blending, a gradient-based method for training low-precision neural networks that avoids the theoretical issues of STE by gradually transitioning from full-precision to quantized weights during training.

Contribution

The paper proposes alpha-blending, an approach that removes the need for STE. The loss is evaluated on an affine blend of the full-precision and quantized weights, so low-precision networks can be trained with ordinary stochastic gradient descent.
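In symbols, the idea can be sketched as follows. This summary does not fix any notation, so the choices here are assumptions: $Q(\cdot)$ is the quantization function, $w_q = Q(w)$, and $w_{AB}$ is the blended weight. STE evaluates the loss at $w_q$ and treats $\mathrm{d}Q/\mathrm{d}w$ as 1. Alpha-blending instead treats $Q(w)$ as a constant in the backward pass, so gradient reaches $w$ only through the full-precision term:

$$
w_{AB} = \alpha\, w_q + (1 - \alpha)\, w, \qquad w_q = Q(w), \qquad \alpha \in [0, 1]
$$

$$
\text{STE: } \frac{\partial \mathcal{L}}{\partial w} \approx \frac{\partial \mathcal{L}}{\partial w_q}, \qquad \text{AB: } \frac{\partial \mathcal{L}}{\partial w} = (1 - \alpha)\, \frac{\partial \mathcal{L}}{\partial w_{AB}}
$$

At $\alpha = 0$ this is ordinary full-precision training. At $\alpha = 1$ the forward pass uses only $w_q$, and the factor $(1 - \alpha)$ is zero.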

Findings

01

AB improves top-1 accuracy on ImageNet by up to 2.93% relative to STE-based quantization.

02

AB outperforms STE-based quantization on CIFAR10 and ImageNet.

03

The method enables a progressive transition from full-precision to low-precision models.

Abstract

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our method (AB) avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight $w_q$ and the corresponding full-precision weight $w$, with non-trainable scalar coefficients $\alpha$ and $1 - \alpha$. During training, $\alpha$ is gradually increased from 0 to 1. The gradient updates to the weights flow through the full-precision term, $(1 - \alpha) w$, of the affine combination, and the model is thereby converted from full precision to low precision progressively. To evaluate the method, a 1-bit BinaryNet on…
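The training loop described in the abstract can be sketched in pure Python on a toy linear-regression problem. This is a minimal illustration under stated assumptions, not the authors' implementation. The quantizer `quantize` (a symmetric uniform 3-bit grid), the linear ramp `alpha_schedule`, the toy data and every hyperparameter are hypothetical choices. Full-batch gradient descent stands in for SGD to keep the run deterministic. The one part taken from the abstract is the mechanism itself: the forward pass uses $\alpha w_q + (1 - \alpha) w$, and gradient reaches $w$ only through the $(1 - \alpha) w$ term.

```python
import random


def alpha_schedule(step: int, start: int, end: int) -> float:
    """Hypothetical linear ramp: alpha is 0 up to `start`, 1 from `end` on."""
    if step <= start:
        return 0.0
    if step >= end:
        return 1.0
    return (step - start) / (end - start)


def quantize(w: float, bits: int = 3, max_abs: float = 1.0) -> float:
    """Hypothetical symmetric uniform quantizer: clip to [-max_abs, max_abs],
    then round to the nearest multiple of max_abs / (2**(bits - 1) - 1)."""
    step = max_abs / (2 ** (bits - 1) - 1)
    clipped = max(-max_abs, min(max_abs, w))
    return round(clipped / step) * step


def train_alpha_blending(X, y, steps=800, ramp=(200, 600), lr=0.1, bits=3):
    """Least-squares fit of y ~ X @ w where the forward pass uses the blended
    weights w_ab = alpha * Q(w) + (1 - alpha) * w. Q(w) is treated as a
    constant in the backward pass, so dL/dw = (1 - alpha) * dL/dw_ab."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for step in range(steps):
        alpha = alpha_schedule(step, *ramp)
        w_ab = [alpha * quantize(wi, bits) + (1 - alpha) * wi for wi in w]
        # Residuals of the blended model: r = X @ w_ab - y.
        r = [sum(xj * wj for xj, wj in zip(x, w_ab)) - yi for x, yi in zip(X, y)]
        # Gradient of L = mean(r**2) with respect to w_ab.
        g_ab = [2.0 / n * sum(r[i] * X[i][j] for i in range(n)) for j in range(d)]
        # Only the full-precision term (1 - alpha) * w passes gradient to w.
        w = [wi - lr * (1 - alpha) * gj for wi, gj in zip(w, g_ab)]
    return w


# Toy problem whose true weights lie exactly on the 3-bit grid {k/3 : k = -3..3}.
rng = random.Random(0)
w_true = [2 / 3, -1 / 3, 1.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(64)]
y = [sum(xj * wj for xj, wj in zip(x, w_true)) for x in X]

w = train_alpha_blending(X, y)
w_deployed = [quantize(wi) for wi in w]  # the low-precision model that is shipped
print(w_deployed)
```

Once `alpha_schedule` reaches 1 at step 600, the factor `(1 - alpha)` is zero, so `w` stops changing and the deployed model is simply `Q(w)`. The targets were placed on the quantization grid so that the expected result, `w_deployed` matching `w_true`, is easy to check.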