Large deviations rates for stochastic gradient descent with strongly convex functions
Dragana Bajovic, Dusan Jakovetic, Soummya Kar

TL;DR
This paper develops a large deviations framework for analyzing stochastic gradient descent (SGD) with strongly convex functions, showing how high-probability bounds depend on the noise distribution beyond its variance.
Contribution
It introduces a formal large deviations approach for SGD that captures the impact of higher-order noise moments and geometry, deriving upper bounds for strongly convex functions and exact rates for quadratic objectives.
Findings
Derived upper large deviations bounds for SGD with strongly convex functions.
Established exact large deviation rates for quadratic objectives.
Numerical results support theoretical bounds and insights.
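To make the idea of a large deviations rate concrete, here is a minimal Monte Carlo sketch. It is not taken from the paper: the one-dimensional objective f(x) = x^2 / 2, the step size 1/k, the two noise distributions, and all function names are assumptions chosen for illustration. In this toy setting the iterate after k steps equals minus the sample mean of the noise, so by Cramér's theorem the Gaussian case has the known rate delta^2 / 2. The Laplace noise has the same variance, so any difference in the estimated rates comes from the rest of the distribution.

```python
import numpy as np


def gaussian_noise(rng, size):
    """Zero-mean, unit-variance Gaussian gradient noise (illustrative choice)."""
    return rng.normal(0.0, 1.0, size=size)


def laplace_noise(rng, size):
    """Zero-mean Laplace noise scaled to unit variance (variance = 2 * scale**2)."""
    return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=size)


def sgd_tail_probability(sample_noise, num_steps, delta, num_trials, rng):
    """Estimate P(|x_k - x*| >= delta) for SGD on f(x) = x**2 / 2, where x* = 0.

    Runs num_trials independent SGD chains in parallel with the
    (assumed) step size 1/k and returns the fraction of chains whose
    final iterate is at least delta away from the minimizer.
    """
    x = np.ones(num_trials)  # arbitrary starting point, shared by all chains
    for k in range(1, num_steps + 1):
        noisy_grad = x + sample_noise(rng, num_trials)  # true gradient is x
        x = x - noisy_grad / k
    return float(np.mean(np.abs(x) >= delta))


def empirical_rate(tail_probability, num_steps):
    """Finite-k estimate of the decay rate: -(1/k) * log P."""
    return -np.log(tail_probability) / num_steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_steps, delta, num_trials = 50, 0.5, 200_000
    for name, sampler in [("gaussian", gaussian_noise), ("laplace", laplace_noise)]:
        p = sgd_tail_probability(sampler, num_steps, delta, num_trials, rng)
        print(name, "tail probability:", p, "empirical rate:", empirical_rate(p, num_steps))
```

The finite-k estimate only approaches the asymptotic rate slowly, and rare events need many trials to be observed at all, so the printed values should be read as rough indications rather than precise rates.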
Abstract
Recent works have shown that high probability metrics with stochastic gradient descent (SGD) are informative and in some cases advantageous over the commonly adopted mean-square error-based ones. In this work we provide a formal framework for the study of general high probability bounds with SGD, based on the theory of large deviations. The framework allows for a generic (not necessarily bounded) gradient noise satisfying mild technical assumptions, and for the noise distribution to depend on the current iterate. Under the preceding assumptions, we find an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This is in contrast with conventional mean-square error analysis that captures the noise dependence only through the variance and…
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Stochastic processes and financial applications
Methods: Stochastic Gradient Descent
