On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats

Matteo Cacciola; Antonio Frangioni; Masoud Asgharian; Alireza Ghaffari; Vahid Partovi Nia

arXiv:2301.01651 · cs.LG · January 10, 2023


Open Access

TL;DR

This paper analyzes how low-precision number formats impact the convergence of stochastic gradient descent (SGD), providing bounds that inform the feasibility of training deep learning models under limited numerical precision.

Contribution

It derives deterministic and stochastic convergence bounds for SGD performed in low-precision number formats, extending the theoretical understanding of SGD to practical low-precision training scenarios.

Findings

01. Bounds quantify the impact of low-precision formats on SGD convergence.
02. Numerical errors increase with lower precision, affecting convergence rates.
03. Guidelines for using low-precision formats when training deep learning models.
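The second finding can be illustrated with a small experiment. This sketch is not taken from the paper: it assumes a low-precision format can be emulated by rounding a Python `float` to a fixed number of significand bits (24 bits for a single-precision-like format, 11 for IEEE half precision, 8 for a bfloat16-like format), and it ignores exponent range, overflow, and subnormals. The helper `round_to_precision` is a hypothetical name introduced only for illustration.

```python
import math


def round_to_precision(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` significand bits.

    Hypothetical helper: emulates significand precision only (ties go to even
    via Python's round()); exponent range, overflow and subnormals are ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits    # m * scale is exact (power-of-two scaling)
    return math.ldexp(round(m * scale) / scale, e)


if __name__ == "__main__":
    for bits in (24, 11, 8):
        # The unit roundoff 2**-bits bounds the relative rounding error.
        x = 0.1
        r = round_to_precision(x, bits)
        print(f"{bits:2d} bits: 0.1 -> {r!r}, rel. error {abs(r - x) / x:.2e}, "
              f"u = {2.0 ** -bits:.2e}")
        # A small update added to a weight of 1.0 may be lost entirely.
        print(f"         1.0 + 1e-3 -> {round_to_precision(1.0 + 1e-3, bits)!r}")
```

Under these assumptions, the update of 1e-3 survives at 24 and 11 bits but is rounded away at 8 bits, because representable numbers with an 8-bit significand are spaced 2^-7 ≈ 0.0078 apart near 1.0. Such lost updates are one mechanism by which lower precision can slow or stall SGD.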

Abstract

Deep learning models dominate almost all artificial intelligence tasks, such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, and its computations are usually performed in the single-precision floating-point number format. The convergence of single-precision SGD normally agrees with theoretical results derived for real numbers, since single-precision arithmetic introduces negligible error. However, the numerical error grows when computations are performed in low-precision number formats. This provides compelling reasons to study SGD convergence adapted to low-precision computations. We present both deterministic and stochastic analyses of the SGD algorithm, obtaining bounds that show the effect of the number format. Such bounds can provide guidelines as to how SGD convergence is affected when constraints render the possibility of…
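A toy experiment can make the abstract's point concrete. It is an illustrative sketch, not the paper's analysis or experimental setup, and it assumes: a one-dimensional quadratic loss f(w) = ½(w − w*)², stochastic gradients formed by adding seeded Gaussian noise to the exact gradient, a constant step size, and low-precision arithmetic emulated by rounding every intermediate result to a fixed number of significand bits (exponent range and subnormals ignored). The names `round_to_precision` and `sgd_quadratic` are hypothetical.

```python
import math
import random


def round_to_precision(x: float, mantissa_bits: int) -> float:
    """Round x to `mantissa_bits` significand bits (exponent range ignored)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


def sgd_quadratic(target: float, mantissa_bits: int, lr: float = 0.1,
                  noise_std: float = 1e-3, steps: int = 300, seed: int = 0) -> float:
    """Run SGD on f(w) = 0.5 * (w - target)**2 in emulated low precision.

    The stochastic gradient is the exact gradient (w - target) plus Gaussian
    noise; every intermediate result is rounded to `mantissa_bits` bits.
    """
    def q(v: float) -> float:
        return round_to_precision(v, mantissa_bits)

    rng = random.Random(seed)   # seeded so runs are reproducible
    t, step = q(target), q(lr)  # constants stored in the emulated format
    w = 0.0
    for _ in range(steps):
        noise = q(rng.gauss(0.0, noise_std))
        grad = q(q(w - t) + noise)  # noisy gradient, rounded
        w = q(w - q(step * grad))   # update; tiny steps can be rounded away
    return w


if __name__ == "__main__":
    w_star = 3.3
    for bits in (24, 11, 8):
        w = sgd_quadratic(w_star, bits)
        print(f"{bits:2d} bits: w = {w!r}, |w - w*| = {abs(w - w_star):.2e}")
```

With these assumptions, the 24-bit run settles close to w* = 3.3, limited mainly by gradient noise. The 8-bit run can never get closer than about 0.0031 (the nearest 8-bit value is 3.296875) and typically stalls farther away, once `step * grad` drops below half the spacing of representable numbers near its current iterate. This mirrors the abstract's qualitative claim that the number format affects convergence; it does not reproduce the paper's bounds.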


Taxonomy

Topics: Numerical Methods and Algorithms · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques

Methods: Stochastic Gradient Descent