On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality

Lu Xia; Michiel E. Hochstenbach; Stefano Massei

arXiv:2301.09511·stat.ML·January 22, 2025

Open Access

TL;DR

This paper analyzes how stochastic rounding errors affect the convergence of gradient descent in neural network training under the Polyak-Lojasiewicz condition, revealing that biased errors can be beneficial and providing convergence bounds.

Contribution

It introduces a theoretical framework showing biased stochastic rounding can improve convergence and derives stricter convergence bounds for low-precision optimization.

Findings

1. Biased stochastic rounding can eliminate the vanishing gradient problem.

2. Proper rounding strategies can enhance convergence in low-precision training.

3. Experimental validation compares various rounding strategies on multiple examples.

Abstract

When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding errors may be beneficial since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performances of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.
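The stagnation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce the biased rounding strategy the paper proposes; it only contrasts deterministic round-to-nearest with plain unbiased stochastic rounding. The objective f(x) = x² (which satisfies the Polyak-Łojasiewicz inequality), the grid spacing `DELTA`, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random

DELTA = 2.0 ** -4  # spacing of a hypothetical fixed-point grid (assumption)


def round_nearest(x, delta=DELTA):
    """Deterministic round-to-nearest onto multiples of delta (ties round up)."""
    return delta * math.floor(x / delta + 0.5)


def round_stochastic(x, rng, delta=DELTA):
    """Unbiased stochastic rounding: round up with probability equal to the
    fractional distance from the lower grid point, so the expected result is x."""
    scaled = x / delta
    lower = math.floor(scaled)
    frac = scaled - lower
    return delta * (lower + (1 if rng.random() < frac else 0))


def gradient_descent(x0, step, rounder, iters):
    """Gradient descent on f(x) = x**2, rounding each iterate onto the grid."""
    x = x0
    for _ in range(iters):
        grad = 2.0 * x  # derivative of x**2
        x = rounder(x - step * grad)
    return x


rng = random.Random(0)  # fixed seed so the run is repeatable
x_rn = gradient_descent(1.0, 0.01, round_nearest, 5000)
x_sr = gradient_descent(1.0, 0.01, lambda v: round_stochastic(v, rng), 5000)
print("round-to-nearest:   ", x_rn)
print("stochastic rounding:", x_sr)
```

With these values the first update moves x from 1.0 to 0.98, a change of 0.02, which is less than half the grid spacing (0.03125). Round-to-nearest therefore snaps every iterate back to 1.0 and never makes progress, while stochastic rounding occasionally rounds down and so keeps descending.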

Taxonomy

Topics: Advanced Numerical Analysis Techniques · Stochastic Gradient Optimization Techniques · 3D Shape Modeling and Analysis