Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives
Konstantinos Oikonomidis, Jan Quan, Kimon Antonakopoulos, Antonio Silveti-Falls, Volkan Cevher, Panagiotis Patrinos

TL;DR
This paper introduces proximal preconditioned gradient methods for nonconvex optimization, with a focus on spectral gradient methods. It provides convergence guarantees under heavy-tailed noise and proposes a variance-reduced variant that converges faster under standard noise assumptions.
Contribution
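It develops a family of stochastic spectral preconditioning algorithms, formulated as a proximal extension of the Muon and Scion optimizers, together with a convergence analysis tailored to the geometry of these methods that covers convex and nonconvex constraints and heavy-tailed noise.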
Findings
The proposed algorithms handle a wide variety of convex and nonconvex constraints.
Variance reduction accelerates convergence under standard noise assumptions.
The polynomial iterations used in Muon are more accurately modeled by a nonlinear preconditioner than by the ideal matrix sign.
Abstract
In this work, we develop proximal preconditioned gradient methods, with a focus on spectral gradient methods, providing a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints and study its convergence under heavy-tailed noise, through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, leading to a convergence analysis that more faithfully reflects practical implementations.
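The last claim of the abstract can be illustrated with a small numerical sketch. An odd matrix polynomial of the form aX + b(XXᵀ)X + c(XXᵀ)²X leaves the singular vectors of X unchanged and maps each singular value s to a·s + b·s³ + c·s⁵, so the iteration can be studied on scalars. The coefficients and the step count below are assumptions taken from the publicly available Muon reference implementation, not from this paper, and the helper names (`poly_step`, `iterate`, `ideal_sign`) are hypothetical. The ideal matrix sign would send every nonzero singular value to exactly 1, whereas the polynomial iteration only brings it into a band around 1.

```python
# Sketch: how a quintic polynomial iteration acts on singular values,
# compared with the ideal matrix sign.
# ASSUMPTION: coefficients and step count follow the public Muon reference
# implementation; this summary does not state the paper's exact polynomial.
A, B, C = 3.4445, -4.7750, 2.0315
STEPS = 5


def poly_step(s: float) -> float:
    """One polynomial step applied to a single singular value s."""
    return A * s + B * s**3 + C * s**5


def iterate(s: float, steps: int = STEPS) -> float:
    """Apply the polynomial step `steps` times to a singular value."""
    for _ in range(steps):
        s = poly_step(s)
    return s


def ideal_sign(s: float) -> float:
    """Scalar analogue of the ideal matrix sign: nonzero values map to +/-1."""
    if s == 0:
        return 0.0
    return 1.0 if s > 0 else -1.0


if __name__ == "__main__":
    # Singular values of a matrix normalized to have norm at most 1.
    for s in (0.01, 0.1, 0.5, 1.0):
        print(f"s={s:<5} polynomial={iterate(s):.4f} ideal_sign={ideal_sign(s):.1f}")
```

Running the script prints, for each input singular value, the output of the polynomial iteration next to the ideal sign value. The gap between the two columns is the sense in which the implemented update behaves like a nonlinear map of the singular values rather than an exact sign function.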
