Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification
Ziang Long, Penghang Yin, Jack Xin

TL;DR
This paper develops a theoretical framework for training quantized neural networks with the coarse gradient method, which substitutes straight-through estimators (STEs) for the a.e. zero derivatives of quantized activations. It proves convergence to optimal solutions and validates the results on synthetic data and MNIST.
Contribution
It provides the first theoretical analysis of straight-through estimators in quantized neural network training, establishing convergence guarantees for a class of STEs.
Findings
Coarse gradient methods converge to the global minimum.
Proposed STEs effectively train quantized networks.
Experimental results verify theoretical guarantees.
Abstract
Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss function has zero gradients almost everywhere (a.e.), which makes conventional gradient-based algorithms inapplicable. To this end, we study a novel class of biased first-order oracles, termed coarse gradients, for overcoming the vanishing gradient issue. A coarse gradient is generated by replacing the a.e. zero derivatives of the quantized (i.e., stair-case) ReLU activation composed in the chain rule with a heuristic proxy derivative called a straight-through estimator (STE). Although STEs have been widely used in training quantized networks empirically, fundamental questions such as when and why the ad-hoc STE trick works still lack theoretical…
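The coarse gradient idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's setup: the stair-case activation is taken to be a floor-and-clip function, the STE is taken to be the derivative of the clipped ReLU (one common choice), and the model is a single scalar weight with a squared loss. The function names are hypothetical.

```python
import math


def quantized_relu(x, bits=2):
    """Stair-case ReLU (assumed form): floor x, then clip to [0, 2**bits - 1]."""
    top = 2 ** bits - 1
    return float(min(max(math.floor(x), 0), top))


def ste_derivative(x, bits=2):
    """Proxy derivative (assumed STE): slope of min(max(x, 0), 2**bits - 1).

    The true derivative of quantized_relu is zero wherever it exists,
    so it carries no information for a gradient step.
    """
    top = 2 ** bits - 1
    return 1.0 if 0.0 < x < top else 0.0


def coarse_gradient(w, x, y, bits=2):
    """Coarse gradient of 0.5 * (quantized_relu(w * x) - y) ** 2 w.r.t. w.

    The chain rule is applied as usual, except that the a.e. zero
    derivative of the activation is replaced by the STE proxy.
    """
    z = w * x
    residual = quantized_relu(z, bits) - y
    return residual * ste_derivative(z, bits) * x


def train(w, x, y, lr=0.1, steps=50, bits=2):
    """Plain descent loop that uses the coarse gradient as its oracle."""
    for _ in range(steps):
        w -= lr * coarse_gradient(w, x, y, bits)
    return w


if __name__ == "__main__":
    w_final = train(w=0.5, x=1.0, y=2.0)
    print(w_final, quantized_relu(w_final * 1.0))
```

In this toy setting the true gradient is zero at every step, so ordinary gradient descent would leave the weight at its starting value. The coarse gradient keeps moving the weight until the quantized output matches the target, after which the residual and therefore the update are zero.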
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
