Perturbed Iterate SGD for Lipschitz Continuous Loss Functions with Numerical Error and Adaptive Step Sizes
Michael R. Metel

TL;DR
This paper analyzes the convergence of perturbed iterate SGD with adaptive step sizes in finite-precision environments, providing theoretical guarantees for stochastic Lipschitz continuous loss functions despite numerical errors.
Contribution
It introduces convergence results for perturbed iterate SGD with adaptive step sizes under numerical errors for general stochastic Lipschitz continuous loss functions.
Findings
Proves asymptotic convergence to Clarke stationary points.
Establishes non-asymptotic convergence to approximate stationary points.
Handles approximation errors in stochastic gradients and SGD steps.
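For readers unfamiliar with the stationarity notions named in these findings, the standard textbook definitions are sketched below. The summary does not state which exact notion of approximate stationarity the paper uses, so the Goldstein-type $(\epsilon,\delta)$ condition shown last is an assumption for illustration, and the symbols $f$, $x^*$, $\epsilon$, $\delta$ and $B_\delta(x)$ are introduced here rather than taken from the source.

```latex
% Clarke subdifferential of a locally Lipschitz f : R^n -> R
% (f is differentiable almost everywhere by Rademacher's theorem)
\partial f(x) = \operatorname{conv}\Big\{ \lim_{i \to \infty} \nabla f(x_i) \;:\; x_i \to x,\ f \text{ differentiable at } x_i \Big\}

% Clarke stationary point
0 \in \partial f(x^*)

% Goldstein \delta-subdifferential and an (\epsilon, \delta)-stationary point
% (assumed form of "approximate stationary point")
\partial_\delta f(x) = \operatorname{conv}\Big( \bigcup_{y \in B_\delta(x)} \partial f(y) \Big),
\qquad
\operatorname{dist}\big(0, \partial_\delta f(x)\big) \le \epsilon
```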
Abstract
Motivated by neural network training in finite-precision arithmetic environments, this work studies the convergence of perturbed iterate SGD using adaptive step sizes in an environment with numerical error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point in expectation. It is assumed that only an approximation of the loss function's stochastic gradient can be computed, in addition to error in computing the SGD step itself.
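The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the uniform-ball perturbation, the AdaGrad-norm style step size, the float16 rounding used to imitate numerical error, and every name below (`sample_ball`, `round_to_half`, `perturbed_sgd`, `scaled_abs_grad`) are assumptions chosen for illustration.

```python
import numpy as np

def sample_ball(rng, dim, radius):
    """Draw a point uniformly from the Euclidean ball of the given radius."""
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    return radius * rng.random() ** (1.0 / dim) * direction

def round_to_half(x):
    """Imitate finite-precision error by rounding through float16 (assumption)."""
    return np.asarray(x, dtype=np.float16).astype(np.float64)

def perturbed_sgd(stoch_grad, x0, steps=2000, radius=0.05, eta0=0.5, seed=0):
    """Perturbed iterate SGD sketch with an adaptive step size and rounding error."""
    rng = np.random.default_rng(seed)
    x = round_to_half(x0)
    grad_sq_sum = 0.0
    for _ in range(steps):
        # Evaluate the stochastic gradient at a randomly perturbed iterate.
        u = sample_ball(rng, x.size, radius)
        g = round_to_half(stoch_grad(x + u, rng))  # inexact stochastic gradient
        # AdaGrad-norm style step size (assumed; the paper's rule may differ).
        grad_sq_sum += float(g @ g)
        eta = eta0 / np.sqrt(1.0 + grad_sq_sum)
        x = round_to_half(x - eta * g)  # inexact SGD step
    return x

def scaled_abs_grad(x, rng):
    """Stochastic subgradient of a * ||x - z||_1, a Lipschitz toy loss."""
    z = 0.1 * rng.standard_normal(x.shape)  # random data point
    a = rng.uniform(0.5, 1.5)               # random positive scale
    return a * np.sign(x - z)

x_start = np.array([2.0, -1.5])
x_final = perturbed_sgd(scaled_abs_grad, x_start)
print(x_final)
```

On this toy loss the expected objective is minimized at the origin, so the final iterate should end up much closer to zero than the starting point; changing the dtype in `round_to_half` is one way to experiment with coarser or finer arithmetic.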
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: Test · Stochastic Gradient Descent
