Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification

Ziang Long; Penghang Yin; Jack Xin

arXiv:2011.11256·cs.LG·June 15, 2021

Open Access

TL;DR

This paper introduces a theoretical framework for training quantized neural networks using a novel coarse gradient method with straight-through estimators, proving convergence to optimal solutions and validating results on synthetic and MNIST data.

Contribution

It provides the first theoretical analysis of straight-through estimators in quantized neural network training, establishing convergence guarantees for a class of STEs.

Findings

01. Coarse gradient methods converge to the global minimum.
02. Proposed STEs effectively train quantized networks.
03. Experimental results verify theoretical guarantees.

Abstract

Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss function has zero gradients almost everywhere (a.e.), which makes conventional gradient-based algorithms inapplicable. To this end, we study a novel class of biased first-order oracles, termed coarse gradients, for overcoming the vanishing gradient issue. A coarse gradient is generated by replacing the a.e. zero derivatives of the quantized (i.e., staircase) ReLU activation in the chain rule with a heuristic proxy derivative called a straight-through estimator (STE). Although STEs have been widely used empirically in training quantized networks, fundamental questions such as when and why the ad-hoc STE trick works still lack theoretical…
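The coarse-gradient idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the floor-based staircase, the clipped-ReLU proxy derivative, the one-parameter squared loss, and all function names.

```python
import math

def quantized_relu(x, levels=3):
    """Staircase ReLU (assumed form): floor(x) clipped to {0, 1, ..., levels}."""
    return float(min(max(math.floor(x), 0), levels))

def true_derivative(x, levels=3):
    """Exact derivative of the staircase: zero wherever it is defined."""
    return 0.0

def ste_clipped_relu(x, levels=3):
    """Proxy derivative (one possible STE): derivative of min(max(x, 0), levels)."""
    return 1.0 if 0.0 < x < levels else 0.0

def coarse_grad(w, x, y, levels=3):
    """Chain rule for (quantized_relu(w*x) - y)^2 with the STE substituted
    for the a.e.-zero derivative of the staircase."""
    z = w * x
    residual = quantized_relu(z, levels) - y
    return 2.0 * residual * ste_clipped_relu(z, levels) * x

def train(w, x, y, lr=0.05, steps=200):
    """Plain descent on the single weight w using the coarse gradient."""
    for _ in range(steps):
        w -= lr * coarse_grad(w, x, y)
    return w

if __name__ == "__main__":
    w = train(0.5, 1.0, 2.0)
    print(w, quantized_relu(w * 1.0))
```

With the exact derivative the update would always be zero and `w` would never move; with the proxy derivative the weight drifts until the quantized output matches the target, at which point the residual and therefore the coarse gradient vanish. This toy case says nothing about the convergence conditions analyzed in the paper.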

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM