On SGD's Failure in Practice: Characterizing and Overcoming Stalling
Vivak Patel

TL;DR
This paper investigates why stochastic gradient descent (SGD) stalls in practice, even in simple settings, and generalizes an existing framework that deters stalling while preserving convergence guarantees, making SGD more reliable for empirical risk minimization.
Contribution
It characterizes the pervasive issue of SGD stalling, demonstrates its occurrence beyond conditioning issues, and introduces a generalized framework to prevent stalling with convergence guarantees.
Findings
SGD stalls even on a simple linear regression problem with unity condition number when run with standard learning rates.
Stalling is a fundamental limitation of SGD and its variants in practice.
A generalization of an existing framework deters stalling while retaining convergence guarantees.
Abstract
Stochastic Gradient Descent (SGD) is widely used in machine learning problems to efficiently perform empirical risk minimization, yet, in practice, SGD is known to stall before reaching the actual minimizer of the empirical risk. SGD stalling has often been attributed to its sensitivity to the conditioning of the problem; however, as we demonstrate, SGD will stall even when applied to a simple linear regression problem with unity condition number for standard learning rates. Thus, in this work, we numerically demonstrate and mathematically argue that stalling is a crippling and generic limitation of SGD and its variants in practice. Once we have established the problem of stalling, we generalize an existing framework for hedging against its effects, which (1) deters SGD and its variants from stalling, (2) still provides convergence guarantees, and (3) makes SGD and its variants more…
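The setting described in the abstract can be explored with a small experiment. The following is a minimal sketch, not the paper's own code: it assumes a synthetic design matrix whose Gram matrix is exactly the identity (so the condition number is 1), Gaussian noise, and a learning rate of the form 1/(k + k0). The function name `sgd_linear_regression`, the offset `k0`, and all default values are illustrative choices, not taken from the paper. It records the distance from the SGD iterate to the exact empirical risk minimizer so that the progress of the iterates can be inspected.

```python
import numpy as np


def sgd_linear_regression(n=1000, d=5, steps=5000, k0=20, noise=0.1, seed=0):
    """Run plain SGD on a least-squares problem with unity condition number.

    All parameter names and defaults are hypothetical, chosen for illustration.
    Returns the design matrix and the distance to the minimizer at every step.
    """
    rng = np.random.default_rng(seed)

    # Orthonormal columns scaled by sqrt(n) give X.T @ X / n = I,
    # so the empirical risk has condition number exactly 1.
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    X = np.sqrt(n) * Q

    w_true = rng.standard_normal(d)
    y = X @ w_true + noise * rng.standard_normal(n)

    # Because X.T @ X = n * I, the least-squares minimizer has a closed form.
    w_hat = X.T @ y / n

    w = np.zeros(d)
    errors = [np.linalg.norm(w - w_hat)]
    for k in range(steps):
        i = rng.integers(n)                # sample one observation uniformly
        grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5 * (x_i . w - y_i)^2
        w -= grad / (k + k0)               # assumed 1/(k + k0) learning rate
        errors.append(np.linalg.norm(w - w_hat))
    return X, np.array(errors)


if __name__ == "__main__":
    X, errors = sgd_linear_regression()
    print("condition number:", np.linalg.cond(X.T @ X / X.shape[0]))
    for k in (0, 10, 100, 1000, 5000):
        print(f"step {k:5d}  distance to minimizer {errors[k]:.3e}")
```

Printing the distance on a logarithmic grid of steps shows how quickly the gains per step shrink as the learning rate decays; whether this matches the stalling behavior the paper analyzes depends on the learning rate and problem sizes used there.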
Taxonomy
Topics: Fluid Dynamics and Turbulent Flows · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
Methods: Linear Regression · Stochastic Gradient Descent
