Non-Euclidean SGD for Structured Optimization: Unified Analysis and Improved Rates
Dmitry Kovalev, Ekaterina Borodich

TL;DR
This paper provides a unified theoretical analysis of non-Euclidean SGD methods, demonstrating their ability to exploit problem structures and match the convergence rates of advanced adaptive algorithms.
Contribution
The paper develops a new convergence analysis framework for non-Euclidean SGD methods, explaining their practical success and showing that they can outperform Euclidean SGD under certain conditions.
Findings
Non-Euclidean SGD can exploit sparsity or low-rank structure in the upper bounds on the Hessian and gradient noise.
It can provably benefit from algorithmic tools such as extrapolation and momentum variance reduction.
It matches the convergence rates of adaptive algorithms like AdaGrad and Shampoo.
Abstract
Recently, several instances of non-Euclidean SGD, including SignSGD, Lion, and Muon, have attracted significant interest from the optimization community due to their practical success in training deep neural networks. Consequently, a number of works have attempted to explain this success by developing theoretical convergence analyses. Unfortunately, these results cannot properly justify the superior performance of these methods, as they could not beat the convergence rate of vanilla Euclidean SGD. We resolve this important open problem by developing a new unified convergence analysis under the structured smoothness and gradient noise assumption. In particular, our results indicate that non-Euclidean SGD (i) can exploit the sparsity or low-rank structure of the upper bounds on the Hessian and gradient noise, (ii) can provably benefit from popular algorithmic tools such as extrapolation…
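To make the term "non-Euclidean SGD" concrete, the sketch below shows one common way such updates are written: the step direction is the steepest-descent direction over the unit ball of a chosen norm rather than the raw gradient. This is an illustrative NumPy sketch under stated assumptions, not the exact algorithm analyzed in the paper. The function names (`lmo_linf`, `lmo_spectral`, `non_euclidean_sgd_step`) are hypothetical, momentum and extrapolation are omitted, and the exact SVD stands in for the cheaper approximations that practical Muon-style implementations typically use.

```python
import numpy as np

def lmo_linf(g):
    """Steepest direction over the l_inf unit ball: the elementwise sign of g.
    This is the direction used by SignSGD-style updates."""
    return np.sign(g)

def lmo_spectral(G):
    """Steepest direction over the spectral-norm unit ball for a matrix G:
    U @ V^T from the thin SVD G = U diag(s) V^T (a Muon-style orthogonalized
    direction; the exact SVD is used here only for clarity)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def non_euclidean_sgd_step(x, grad, lr, lmo):
    """One step x <- x - lr * lmo(grad), where `grad` is a stochastic gradient
    and `lmo` returns the norm-dependent steepest direction."""
    return x - lr * lmo(grad)

# Vector parameters with the l_inf geometry (sign update).
x = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, -4.0, 0.0])
x_next = non_euclidean_sgd_step(x, g, lr=0.1, lmo=lmo_linf)

# Matrix parameters with the spectral geometry (orthogonalized update).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
G = rng.standard_normal((4, 3))
D = lmo_spectral(G)
W_next = non_euclidean_sgd_step(W, G, lr=0.1, lmo=lmo_spectral)
```

In both cases the step size no longer scales with the gradient magnitude: the sign update moves every nonzero coordinate by the same amount, and the spectral update sets every nonzero singular value of the direction to one. The choice of norm is what lets the method adapt to the problem geometry.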
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications
