Accelerating LMO-Based Optimization via Implicit Gradient Transport
Won-Jun Jang, Si-Hyeon Lee

TL;DR
This paper introduces LMO-IGT, a stochastic LMO-based optimization method using implicit gradient transport, which accelerates convergence with minimal additional computational cost.
Contribution
The paper proposes a unified framework for stochastic LMO-based optimization with implicit gradient transport, achieving improved convergence rates and practical acceleration.
Findings
LMO-IGT achieves an iteration complexity of O(ε^{-3.5}) with a single stochastic gradient per iteration.
Empirical results show LMO-IGT outperforms stochastic LMO with negligible overhead.
Muon-IGT demonstrates strong overall performance across various settings.
Abstract
Recent optimizers such as Lion and Muon have demonstrated strong empirical performance by normalizing gradient momentum via linear minimization oracles (LMOs). While variance reduction has been explored to accelerate LMO-based methods, it typically incurs substantial computational overhead due to additional gradient evaluations. At the same time, the theoretical understanding of LMO-based methods remains fragmented across unconstrained and constrained formulations. Motivated by these limitations, we propose LMO-IGT, a new class of stochastic LMO-based methods leveraging implicit gradient transport (IGT). We further introduce a unified framework for stochastic LMO-based optimization together with a new stationarity measure, the regularized support function (RSF), which bridges gradient-norm and Frank-Wolfe-gap notions within a common framework. By evaluating stochastic…
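To make the ingredients named in the abstract concrete, here is a minimal NumPy sketch of an LMO-based momentum step with implicit gradient transport. It is an illustration under stated assumptions, not the authors' algorithm: the abstract is truncated on this page, so the update rule below assumes the standard IGT form from earlier work (one stochastic gradient per step, evaluated at an extrapolated point), and the function names, step size, momentum value, and toy objective are all hypothetical. The three oracles are the usual closed-form LMOs over unit norm balls: the sign map (the geometry associated with Lion), the normalized vector, and the orthogonal polar factor (the geometry associated with Muon, computed here with an exact SVD rather than an iterative approximation).

```python
import numpy as np


def lmo_linf(m):
    """argmin over ||s||_inf <= 1 of <m, s>: the negative sign vector."""
    return -np.sign(m)


def lmo_l2(m, eps=1e-12):
    """argmin over ||s||_2 <= 1 of <m, s>: the negative normalized vector."""
    return -m / (np.linalg.norm(m) + eps)


def lmo_spectral(M):
    """argmin over spectral-norm <= 1 of <M, S>: -U V^T from the thin SVD."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return -U @ Vt


def lmo_igt_step(x, x_prev, m, stoch_grad, lmo, lr, beta):
    """One step of the ASSUMED IGT-style update (not taken from the paper).

    The single stochastic gradient is queried at an extrapolated point y,
    so no extra gradient evaluation is needed compared with plain momentum.
    """
    y = x + (beta / (1.0 - beta)) * (x - x_prev)  # extrapolated query point
    g = stoch_grad(y)                             # only gradient call this step
    m_new = beta * m + (1.0 - beta) * g           # momentum average
    x_new = x + lr * lmo(m_new)                   # move along the LMO direction
    return x_new, x, m_new


def run_demo(steps=500, lr=0.01, beta=0.9, seed=0):
    """Toy problem (hypothetical): f(x) = 0.5 * ||x||^2 with noisy gradients."""
    rng = np.random.default_rng(seed)

    def stoch_grad(y):
        return y + 0.1 * rng.standard_normal(y.shape)

    x = np.ones(5)
    x_prev = x.copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        x, x_prev, m = lmo_igt_step(x, x_prev, m, stoch_grad, lmo_l2, lr, beta)
    return np.linalg.norm(x)
```

Calling `run_demo()` returns the distance to the minimizer after 500 steps of the toy run. Swapping `lmo_l2` for `lmo_linf` gives a sign-based variant, while `lmo_spectral` applies to matrix-shaped parameters.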
