Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm
Matteo Spallanzani, Gian Paolo Leonardi, Luca Benini

TL;DR
This paper analyses straight-through estimator (STE) variants for training quantised neural networks (QNNs), provides a theoretical framework for them, and introduces the additive noise annealing (ANA) algorithm, which improves training by synchronising regularisations across layers.
Contribution
The paper offers a rigorous analysis of STE variants, models them as stochastic regularisations of stair functions, and proposes the ANA algorithm for QNN training based on these insights.
Findings
ANA improves CIFAR-10 accuracy when the STE variants used across the network are properly synchronised
Most STE variants can be modelled as stochastic stair function regularisations
Synchronisation of layer regularisations is crucial for convergence
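The "stochastic regularisation of a stair function" idea can be illustrated with a small sketch. It assumes a two-level stair function (the sign function) and uniform additive noise of half-width `beta`; the noise distribution and all function names are illustrative choices, not taken from the paper. Under these assumptions, the expected value of the noisy stair is a clipped linear ramp, and shrinking `beta` recovers the original stair.

```python
import random


def stair(x):
    """Two-level stair function (sign), with stair(0) = +1 by convention."""
    return 1.0 if x >= 0.0 else -1.0


def smoothed_stair_mc(x, beta, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E[stair(x + nu)], nu ~ Uniform(-beta, beta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += stair(x + rng.uniform(-beta, beta))
    return total / n_samples


def smoothed_stair_exact(x, beta):
    """Closed form of the same expectation: x / beta clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x / beta))


if __name__ == "__main__":
    # Compare the sampled and closed-form regularisations at a few points.
    for x in (-2.0, -0.5, 0.0, 0.3, 2.0):
        print(x, smoothed_stair_mc(x, 1.0), smoothed_stair_exact(x, 1.0))
    # Annealing the noise width towards zero sharpens the ramp into the stair.
    for beta in (1.0, 0.5, 0.1):
        print(beta, smoothed_stair_exact(0.3, beta))
```

The derivative of the clipped ramp is constant inside the noise window and zero outside it, which is one way to read the backward-pass function that an STE variant substitutes for the stair.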
Abstract
Training quantised neural networks (QNNs) is a non-differentiable optimisation problem since weights and features are output by piecewise constant functions. The standard solution is to apply the straight-through estimator (STE), using different functions during the inference and gradient computation steps. Several STE variants have been proposed in the literature aiming to maximise the task accuracy of the trained network. In this paper, we analyse STE variants and study their impact on QNN training. We first observe that most such variants can be modelled as stochastic regularisations of stair functions; although this intuitive interpretation is not new, our rigorous discussion generalises to further variants. Then, we analyse QNNs mixing different regularisations, finding that some suitably synchronised smoothing of each layer map is required to guarantee pointwise compositional…
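As a minimal sketch of the standard STE described in the abstract, the toy example below trains a single real-valued weight whose quantised value is used in the forward pass, while the backward pass uses a different function. It assumes a sign quantiser and a hard-tanh surrogate derivative (gradient passed through where |w| <= 1, zero elsewhere); the one-weight model, the squared-error loss, and all names are hypothetical and chosen only for illustration.

```python
def quantise(w):
    """Forward pass: piecewise constant quantiser (sign, with sign(0) = +1)."""
    return 1.0 if w >= 0.0 else -1.0


def ste_backward(w, grad_output):
    """Backward pass: use the derivative of hard-tanh instead of the
    quantiser's true derivative, which is zero almost everywhere."""
    return grad_output if abs(w) <= 1.0 else 0.0


def train(w=0.5, x=1.0, target=-1.0, lr=0.1, steps=10):
    """Gradient descent on loss = (quantise(w) * x - target) ** 2."""
    for _ in range(steps):
        q = quantise(w)                   # inference uses the quantised weight
        error = q * x - target
        grad_q = 2.0 * error * x          # exact gradient w.r.t. q
        grad_w = ste_backward(w, grad_q)  # surrogate gradient w.r.t. w
        w -= lr * grad_w                  # update the latent real-valued weight
    return w


if __name__ == "__main__":
    final_w = train()
    print(final_w, quantise(final_w))
```

Without the surrogate in `ste_backward`, the gradient with respect to `w` would be zero at every step and the latent weight would never cross the quantisation threshold.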
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
