On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats
Matteo Cacciola, Antonio Frangioni, Masoud Asgharian, Alireza Ghaffari, Vahid Partovi Nia

TL;DR
This paper analyzes how low-precision number formats impact the convergence of stochastic gradient descent (SGD), providing bounds that inform the feasibility of training deep learning models under limited numerical precision.
Contribution
It offers the first deterministic and stochastic convergence bounds for SGD in low-precision formats, extending theoretical understanding to practical low-precision training scenarios.
Findings
Bounds quantify the impact of low-precision formats on SGD convergence
Numerical errors increase with lower precision, affecting convergence rates
Guidelines for using low-precision formats when training deep learning models
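As background for these findings, the textbook floating-point rounding model that analyses of low-precision arithmetic commonly start from is shown below. This is an assumption about the general setting, not necessarily the exact error model used in the paper. The symbols (significand precision p, unit roundoff u, step size \alpha_k, computed gradient \hat{g}_k) are introduced here for illustration. Under round-to-nearest, an update smaller than half the spacing between representable numbers near \hat{w}_k is rounded away entirely. This is one mechanism by which lower precision can slow or stall convergence.

```latex
% Standard model of rounding to a binary format with a p-bit significand
% (round-to-nearest, ignoring overflow and underflow):
\mathrm{fl}(x) = x\,(1+\delta), \qquad |\delta| \le u = 2^{-p}

% Unit roundoff for common formats:
%   binary32 (p = 24):  u = 2^{-24} \approx 5.96 \times 10^{-8}
%   binary16 (p = 11):  u = 2^{-11} \approx 4.88 \times 10^{-4}
%   bfloat16 (p = 8):   u = 2^{-8}  \approx 3.91 \times 10^{-3}

% One computed SGD step with step size \alpha_k and computed stochastic gradient \hat{g}_k:
\hat{w}_{k+1} = \mathrm{fl}\big(\hat{w}_k - \mathrm{fl}(\alpha_k \hat{g}_k)\big)
              = \big(\hat{w}_k - \alpha_k \hat{g}_k (1+\delta_1)\big)(1+\delta_2),
\qquad |\delta_1|, |\delta_2| \le u
```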
Abstract
Deep learning models dominate almost all artificial intelligence tasks, such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, and its computations are usually performed in the single-precision floating-point number format. The convergence of single-precision SGD normally matches the theoretical results for real numbers, since single-precision computations exhibit negligible error. However, the numerical error increases when the computations are performed in low-precision number formats. This provides compelling reasons to study SGD convergence adapted to low-precision computations. We present both deterministic and stochastic analyses of the SGD algorithm, obtaining bounds that show the effect of the number format. Such bounds can provide guidelines as to how SGD convergence is affected when constraints render the possibility of…
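To make the abstract's point about numerical error concrete, here is a small, hypothetical simulation. It is not the authors' experiment or analysis. Low precision is emulated in software by rounding every stored value and every arithmetic result to p significand bits. The exponent range is ignored, so there is no overflow, underflow, or subnormal handling. The toy objective is f(w) = E[(w - x)^2]/2 with x uniform around w* = 1/3, and the step size is 1/(k+1). The names `round_to_precision` and `sgd_low_precision` are illustrative only.

```python
import math
import random


def round_to_precision(x: float, p: int) -> float:
    """Round x to the nearest value with a p-bit significand (ties to even).

    Simplifying assumption: the exponent range is unbounded, so overflow,
    underflow and subnormals are ignored. p = 24, 11 and 8 match the
    significand precision of binary32, binary16 and bfloat16.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    # Scale the significand to p integer bits; Python's round() is half-to-even.
    return math.ldexp(round(m * 2.0**p), e - p)


def sgd_low_precision(p: int, steps: int = 20000, w_star: float = 1 / 3,
                      noise: float = 0.01, seed: int = 0) -> float:
    """Hypothetical toy problem: SGD on f(w) = E[(w - x)^2] / 2 with
    x ~ U(w_star - noise, w_star + noise), rounding every stored value and
    arithmetic result to p significand bits. Returns |w - w_star| at the end.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility

    def q(v: float) -> float:
        return round_to_precision(v, p)

    w = 0.0
    for k in range(steps):
        x = q(w_star + rng.uniform(-noise, noise))  # sample stored in low precision
        g = q(w - x)                                # stochastic gradient of (w - x)^2 / 2
        lr = q(1.0 / (k + 1))                       # decaying step size 1/(k+1)
        w = q(w - q(lr * g))                        # steps below half an ulp of w are rounded away
    return abs(w - w_star)


if __name__ == "__main__":
    for p, name in [(24, "binary32-like"), (11, "binary16-like"), (8, "bfloat16-like")]:
        print(f"{name:14s} p={p:2d}  |w - w*| = {sgd_low_precision(p):.2e}")
```

With p = 8, the step lr * g soon becomes smaller than half the spacing of representable numbers near w, so the iterate stalls within the first few dozen iterations. With p = 24, it keeps averaging new samples. This sketch only illustrates one rounding effect and says nothing about the specific bounds derived in the paper.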
Taxonomy
Topics: Numerical Methods and Algorithms · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
Methods: Stochastic Gradient Descent
