Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry
Aleksandar Armacki, Shuhua Yu, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

TL;DR
This paper develops a unified theoretical framework for nonlinear stochastic gradient methods under heavy-tailed noise, providing new large deviation bounds and improved mean-squared error rates for both convex and non-convex optimization.
Contribution
The paper introduces a black-box approach to analyzing nonlinearities in SGD and uses it to derive explicit large deviation bounds and near-optimal MSE rates under heavy-tailed noise.
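The black-box idea can be pictured with the standard nonlinear SGD update $x_{t+1} = x_t - \alpha_t \Psi(\nabla f(x_t) + z_t)$, where the method only calls the nonlinearity $\Psi$ and never uses its closed form. The sketch below is illustrative and is not the authors' code. The function names, the step-size schedule $\alpha_t = a/(t+1)^{\delta}$, and the choice of standard Cauchy noise (symmetric, with a density positive around zero and no finite moments of order one or higher) are all assumptions made for this example.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def cauchy_noise(rng: random.Random, dim: int) -> Vector:
    # Assumed heavy-tailed noise model: i.i.d. standard Cauchy entries.
    # The density is symmetric and positive around zero; E|z|^p is infinite for p >= 1.
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]


def sign(g: Vector) -> Vector:
    # Component-wise sign: one example of a bounded nonlinearity.
    return [float((v > 0) - (v < 0)) for v in g]


def nonlinear_sgd(
    grad: Callable[[Vector], Vector],
    psi: Callable[[Vector], Vector],
    x0: Vector,
    steps: int,
    a: float = 1.0,
    delta: float = 0.75,
    seed: int = 0,
) -> Vector:
    """Iterate x <- x - alpha_t * psi(grad(x) + z_t).

    psi is only called, never inspected, so any bounded map can be plugged in.
    """
    rng = random.Random(seed)
    x = list(x0)
    for t in range(steps):
        alpha = a / (t + 1) ** delta  # assumed schedule alpha_t = a / (t + 1)^delta
        noisy_grad = [g + z for g, z in zip(grad(x), cauchy_noise(rng, len(x)))]
        x = [xi - alpha * p for xi, p in zip(x, psi(noisy_grad))]
    return x
```

For example, with $f(x) = \frac{1}{2}\|x - c\|^2$ one would pass `grad = lambda v: [vi - ci for vi, ci in zip(v, c)]` and `psi = sign`. One intuition, consistent with the paper's emphasis on symmetric noise, is that for symmetric $z$ with CDF $F$ the expected sign $\mathbb{E}[\mathrm{sign}(g + z)] = 2F(g) - 1$ has the same sign as $g$, so the iterates still drift toward the minimizer even though the noise has no finite mean.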
Findings
Large deviation upper bounds with exponential tail decay.
Optimal MSE rate of $\frac{1}{2}$ for non-convex costs.
Near-optimal $\frac{1}{t}$ MSE rate for strongly convex costs.
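To make the quantities in these findings concrete, one can estimate by Monte Carlo both the MSE $\mathbb{E}[(x_t - x^\star)^2]$ and a tail probability $\mathbb{P}((x_t - x^\star)^2 > \varepsilon)$ for a toy problem. The sketch below assumes a one-dimensional strongly convex cost $f(x) = \frac{1}{2}(x - x^\star)^2$, clipping as the nonlinearity, standard Cauchy noise, and the step size $\alpha_t = a/(t+1)$. These are illustrative choices, not the paper's experimental setup, and the numbers the code produces are not the paper's results.

```python
import math
import random
from typing import Dict, Iterable, Tuple


def clip(v: float, m: float = 1.0) -> float:
    # Bounded nonlinearity: clip a scalar to [-m, m].
    return max(-m, min(m, v))


def estimate_mse_and_tail(
    checkpoints: Iterable[int],
    runs: int = 200,
    x_star: float = 0.0,
    x0: float = 5.0,
    a: float = 4.0,
    eps: float = 0.05,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """Monte Carlo estimates of E[(x_t - x*)^2] and P((x_t - x*)^2 > eps)
    for clipped SGD on f(x) = (x - x*)^2 / 2 under standard Cauchy noise.

    Checkpoints are positive iteration counts t; x_t is the iterate after t updates.
    """
    cps = sorted(set(checkpoints))
    sq_errors = {t: [] for t in cps}
    rng = random.Random(seed)
    for _ in range(runs):
        x = x0
        for t in range(cps[-1]):
            z = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy sample
            alpha = a / (t + 1)  # assumed step-size schedule
            x -= alpha * clip((x - x_star) + z)  # gradient of f is (x - x_star)
            if t + 1 in sq_errors:
                sq_errors[t + 1].append((x - x_star) ** 2)
    # For each checkpoint: (empirical MSE, empirical tail probability).
    return {
        t: (sum(errs) / runs, sum(e > eps for e in errs) / runs)
        for t, errs in sq_errors.items()
    }
```

Comparing `estimate_mse_and_tail([100, 2000])` at the two checkpoints shows both estimates shrinking as $t$ grows. The clipping keeps every step bounded even when a Cauchy sample is enormous.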
Abstract
We study large deviation upper bounds and mean-squared error (MSE) guarantees of a general framework of nonlinear stochastic gradient methods in the online setting, in the presence of heavy-tailed noise. Unlike existing works that rely on the closed form of a nonlinearity (typically clipping), our framework treats the nonlinearity in a black-box manner, allowing us to provide unified guarantees for a broad class of bounded nonlinearities, including many popular ones, like sign, quantization, normalization, as well as component-wise and joint clipping. We provide several strong results for a broad range of step-sizes in the presence of heavy-tailed noise with symmetric probability density function, positive in a neighbourhood of zero and potentially unbounded moments. In particular, for non-convex costs we provide a large deviation upper bound for the minimum norm-squared of gradients,…
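The nonlinearities named in the abstract can each be written as a map that sends any input, however large, into a bounded set, which is the property the black-box framework relies on. The sketch below gives one plausible form of each. The parameter names (`m`, `levels`, `eps`) are hypothetical, and the quantizer in particular is an assumed bounded uniform variant, not necessarily the one analyzed in the paper.

```python
import math
from typing import List

Vector = List[float]


def sign(g: Vector) -> Vector:
    # Component-wise sign: each output entry lies in {-1, 0, 1}.
    return [float((v > 0) - (v < 0)) for v in g]


def clip_componentwise(g: Vector, m: float = 1.0) -> Vector:
    # Clip each coordinate independently to [-m, m].
    return [max(-m, min(m, v)) for v in g]


def clip_joint(g: Vector, m: float = 1.0) -> Vector:
    # Rescale the whole vector so its Euclidean norm is at most m.
    norm = math.sqrt(sum(v * v for v in g))
    return list(g) if norm <= m else [m * v / norm for v in g]


def normalize(g: Vector, eps: float = 1e-12) -> Vector:
    # Map g to g / ||g||; a (numerically) zero vector is mapped to zero.
    norm = math.sqrt(sum(v * v for v in g))
    return [0.0] * len(g) if norm < eps else [v / norm for v in g]


def quantize(g: Vector, m: float = 1.0, levels: int = 4) -> Vector:
    # Hypothetical bounded uniform quantizer: clip to [-m, m], then round each
    # coordinate to the nearest of 2 * levels + 1 evenly spaced points.
    step = m / levels
    return [step * round(max(-m, min(m, v)) / step) for v in g]
```

Under these definitions, the component-wise maps bound every coordinate by $1$ (or `m`), while joint clipping and normalization bound the Euclidean norm of the output instead.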
