Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets

Penghang Yin; Jiancheng Lyu; Shuai Zhang; Stanley Osher; Yingyong Qi; Jack Xin

arXiv:1903.05662·cs.LG·September 26, 2019·62 cites

Open Access

TL;DR

This paper provides a theoretical understanding of the straight-through estimator (STE) in training quantized neural networks, showing how proper STE choices lead to effective descent directions and convergence, while poor choices cause instability.

Contribution

It offers a theoretical justification for STE by analyzing its correlation with the true gradient and demonstrating convergence properties in a simplified neural network model.

Findings

1. Proper STE choice ensures positive correlation with the true gradient.

2. Negation of the coarse gradient acts as a descent direction.

3. Poor STE choices cause training instability near local minima.

Abstract

Training activation quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for the standard back-propagation or chain rule. An empirical way around this issue is to use a straight-through estimator (STE) (Bengio et al., 2013) in the backward pass only, so that the "gradient" through the modified chain rule becomes non-trivial. Since this unusual "gradient" is certainly not the gradient of the loss function, the following question arises: why does searching in its negative direction minimize the training loss? In this paper, we provide a theoretical justification for the concept of STE by answering this question. We consider the problem of learning a two-linear-layer network with binarized ReLU activation and Gaussian input data. We shall refer to the unusual "gradient" given by the STE-modified chain rule as…
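To make the setup in the abstract concrete, here is a minimal NumPy sketch of a two-linear-layer network with a binarized ReLU activation, where the backward pass swaps in a surrogate derivative. It rests on several assumptions that are not stated in the text above: a squared loss against a scalar target, the derivative of the ordinary ReLU as the STE, and the hypothetical names `binarized_relu`, `relu_ste_derivative`, and `loss_and_coarse_grad`. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def binarized_relu(x):
    """Forward activation: 1 where x > 0, else 0 (piecewise constant)."""
    return (x > 0).astype(x.dtype)


def relu_ste_derivative(x):
    """Assumed surrogate for the backward pass: the derivative of ReLU.

    The true derivative of binarized_relu is zero almost everywhere,
    so using it would give no training signal for w.
    """
    return (x > 0).astype(x.dtype)


def loss_and_coarse_grad(w, v, Z, y_true):
    """Squared loss (an assumption) and gradients for y = v . sigma(Z w).

    Z has shape (m, n), w has shape (n,), v has shape (m,).
    grad_v is the ordinary gradient; grad_w is the STE-based
    "gradient" obtained from the modified chain rule.
    """
    pre = Z @ w                      # first linear layer, shape (m,)
    act = binarized_relu(pre)        # quantized activation, shape (m,)
    residual = v @ act - y_true      # scalar prediction error
    loss = 0.5 * residual ** 2
    grad_v = residual * act
    # Modified chain rule: surrogate derivative replaces the zero derivative.
    grad_w = residual * (Z.T @ (relu_ste_derivative(pre) * v))
    return loss, grad_v, grad_w


# Usage on Gaussian input data, as in the setting the abstract describes.
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 3))
w = rng.standard_normal(3)
v = rng.standard_normal(4)
loss, grad_v, grad_w = loss_and_coarse_grad(w, v, Z, y_true=1.0)
w_next = w - 0.1 * grad_w            # one step along the negated STE gradient
```

Only the backward pass changes: the forward pass and the loss still use the binarized activation, which is why the resulting update direction is not the gradient of the loss.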

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
