Recurrence of Optimum for Training Weight and Activation Quantized Networks
Ziang Long, Penghang Yin, Jack Xin

TL;DR
This paper presents a theoretical analysis showing that training quantized neural networks involves recurrent visits to the global optimum, supported by numerical evidence of weight recurrence during training.
Contribution
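The paper analyzes a simple projected gradient-like algorithm for quantizing two-linear-layer networks, driven by a heuristic coarse gradient evaluated at the quantized weights, and proves that the quantized weights recurrently reach the global optimum under mild conditions.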
Findings
Recurrent visitation of the global optimum by quantized weights.
Numerical evidence of weight recurrence in training deep quantized networks.
Theoretical validation of a projected gradient-like quantization method.
Abstract
Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for minimizing a stage-wise loss function subject to a discrete set-constraint. While numerous training methods have been proposed, existing studies for full quantization of DNNs are mostly empirical. From a theoretical point of view, we study practical techniques for overcoming the combinatorial nature of network quantization. Specifically, we investigate a simple yet powerful projected gradient-like algorithm for quantizing two-linear-layer networks, which proceeds by repeatedly moving one step at float weights in the negation of a heuristic "fake" gradient of the loss function (so-called coarse gradient) evaluated at quantized weights. For the…
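The update rule described in the abstract (keep float weights, project them onto the quantized set, evaluate a coarse gradient at the quantized weights, then step the float weights) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: binary weights in {-1, +1}, a 0/1 step activation, a fixed second-layer vector `v`, a squared loss, and a ReLU-derivative surrogate inside the coarse gradient. The function names are hypothetical.

```python
def quantize_binary(w_float):
    """Project float weights onto {-1, +1} (assumed quantizer; 0 maps to +1)."""
    return [1.0 if w >= 0 else -1.0 for w in w_float]


def forward(w, v, sample):
    """Toy two-linear-layer model: out = sum_j v[j] * step(<patch_j, w>).

    `sample` is a list of patches, each a list of floats with len(w) entries.
    Returns the pre-activations and the scalar output.
    """
    pre = [sum(wi * zi for wi, zi in zip(w, patch)) for patch in sample]
    hidden = [1.0 if p > 0 else 0.0 for p in pre]  # quantized (binary) activation
    out = sum(vj * hj for vj, hj in zip(v, hidden))
    return pre, out


def coarse_gradient(w_q, v, data, targets):
    """Heuristic 'fake' gradient of the mean squared loss, evaluated at w_q.

    The step activation has zero derivative almost everywhere, so its
    derivative is replaced by that of ReLU (1 where pre > 0). This surrogate
    is an assumption of this sketch.
    """
    d = len(w_q)
    grad = [0.0] * d
    for sample, y in zip(data, targets):
        pre, out = forward(w_q, v, sample)
        residual = out - y
        for vj, pj, patch in zip(v, pre, sample):
            if pj > 0:  # surrogate derivative is 1 here, 0 otherwise
                for k in range(d):
                    grad[k] += residual * vj * patch[k]
    return [g / len(data) for g in grad]


def train(data, targets, v, w_init, lr=0.1, steps=100):
    """Projected gradient-like loop: update float weights, record quantized ones."""
    w_float = list(w_init)
    history = []
    for _ in range(steps):
        w_q = quantize_binary(w_float)   # projection onto the discrete set
        history.append(w_q)
        g = coarse_gradient(w_q, v, data, targets)  # evaluated at quantized weights
        w_float = [w - lr * gk for w, gk in zip(w_float, g)]  # step on float weights
    return w_float, history
```

With targets generated by a known binary weight vector `w_star`, recurrence in this toy setting can be inspected by counting how often the quantized iterate coincides with it, for example `sum(h == w_star for h in history)`. The sketch makes no claim about how often that happens; the paper's guarantees apply to its own setting and assumptions.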
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
