Recurrence of Optimum for Training Weight and Activation Quantized Networks

Ziang Long; Penghang Yin; Jack Xin

arXiv:2012.05529 · cs.LG · May 25, 2021

TL;DR

This paper presents a theoretical analysis showing that training quantized neural networks involves recurrent visits to the global optimum, supported by numerical evidence of weight recurrence during training.

Contribution

It investigates a simple projected gradient-like algorithm for quantizing two-linear-layer networks and proves that the quantized weights recurrently visit the global optimum under mild conditions.

Findings

01. Recurrent visitation of the global optimum by quantized weights.
02. Numerical evidence of weight recurrence in training deep quantized networks.
03. Theoretical validation of a projected gradient-like quantization method.

Abstract

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for minimizing a stage-wise loss function subject to a discrete set-constraint. While numerous training methods have been proposed, existing studies for full quantization of DNNs are mostly empirical. From a theoretical point of view, we study practical techniques for overcoming the combinatorial nature of network quantization. Specifically, we investigate a simple yet powerful projected gradient-like algorithm for quantizing two-linear-layer networks, which proceeds by repeatedly moving one step at float weights in the negation of a heuristic "fake" gradient of the loss function (so-called coarse gradient) evaluated at quantized weights. For the…
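The update pattern described in the abstract (quantize the float weights, evaluate a gradient at the quantized point, then step the float weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's setup: the model is plain linear least squares rather than a two-linear-layer network with quantized activations, the quantizer is a hypothetical binary sign projection, and the gradient at the quantized weights is the exact least-squares gradient rather than a coarse gradient. The names `quantize` and `coarse_gradient_descent` are hypothetical.

```python
import numpy as np


def quantize(w):
    """Project float weights onto {-1, +1} (assumed binary quantizer)."""
    return np.where(w >= 0, 1.0, -1.0)


def coarse_gradient_descent(X, y, w0, lr=0.1, steps=20):
    """Step the float weights using a gradient evaluated at the quantized weights.

    Illustrative stand-in: the loss is 0.5 * ||X wq - y||^2, so the gradient
    used here is exact, not a heuristic coarse gradient.
    """
    w = np.asarray(w0, dtype=float).copy()
    history = []
    for _ in range(steps):
        wq = quantize(w)                # projection onto the discrete set
        grad = X.T @ (X @ wq - y)       # gradient taken at the quantized point
        w = w - lr * grad               # the update is applied to the float weights
        history.append(quantize(w))     # record the quantized iterate
    return w, history


if __name__ == "__main__":
    X = np.eye(4)
    w_star = np.array([-1.0, 1.0, 1.0, -1.0])   # assumed binary target weights
    y = X @ w_star
    w0 = np.array([0.5, -0.3, 0.2, -0.7])
    w, history = coarse_gradient_descent(X, y, w0)
    print(quantize(w))
```

In this toy problem the quantized iterate settles on the target and stays there, because the gradient vanishes at the target. It does not reproduce the recurrence behavior the paper analyzes; it only shows where the quantization and the float-weight update sit in the loop.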

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques