Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
Abdurakhmon Sadiev, Artavazd Maranjyan, Ivan Ilin, Peter Richtárik

TL;DR
This paper introduces Ringmaster LMO, an asynchronous momentum method for LMO-based optimization that improves efficiency in heterogeneous distributed systems, with proven convergence and superior experimental performance.
Contribution
The paper extends the delay-thresholding idea of Ringmaster ASGD, which discards overly stale gradients, to LMO-based methods, providing convergence guarantees and practical advantages in heterogeneous environments.
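To make the delay-thresholding idea concrete, here is a minimal sketch of a server that drops gradients computed on a model that is too many versions old. It is an illustration, not the paper's algorithm: the function name, the `(worker_id, version_used)` event format, and the rule "accept only if delay < threshold" are assumptions made for this example.

```python
def apply_with_delay_threshold(arrivals, threshold):
    """Split arriving gradients into accepted and discarded ones.

    arrivals: list of (worker_id, version_used) in arrival order, where
        version_used is the server model version the worker started from.
    threshold: hypothetical delay threshold; a gradient is applied only
        if its delay is strictly below this value (an assumed convention).
    """
    server_version = 0
    accepted, discarded = [], []
    for worker_id, version_used in arrivals:
        # Delay = number of server updates since the worker read the model.
        delay = server_version - version_used
        if delay < threshold:
            accepted.append((worker_id, delay))
            server_version += 1  # the model advances only on accepted updates
        else:
            discarded.append((worker_id, delay))  # too stale: ignored
    return accepted, discarded


# One fast worker sends three updates before a slow worker reports back
# with a gradient computed at version 0.
events = [("fast", 0), ("fast", 1), ("fast", 2), ("slow", 0), ("fast", 3)]
accepted, discarded = apply_with_delay_threshold(events, threshold=3)
```

In this toy trace the slow worker's gradient has delay 3 and is dropped, while every update from the fast worker is applied. The server never waits for the slow worker, which is the point of running asynchronously.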
Findings
Achieves optimal time complexity in heterogeneous settings.
Outperforms synchronous and asynchronous baselines in experiments.
Benefits increase with system heterogeneity.
Abstract
Muon has recently emerged as a strong alternative to AdamW for training neural networks, with encouraging large-scale pretraining results and growing evidence that matrix-structured updates can be faster in practice. Yet Muon, and more generally Linear Minimization Oracle (LMO) based methods, are typically used synchronously. This is problematic in heterogeneous distributed systems, where workers complete gradient computations at different speeds and synchronous training must repeatedly wait for slower workers. In this work, we introduce Ringmaster LMO, an asynchronous LMO-based momentum method for unconstrained stochastic nonconvex optimization. Our method builds on the delay-thresholding idea of Ringmaster ASGD. For SGD-type methods, Ringmaster ASGD achieves optimal time complexity by discarding overly stale gradients. Ringmaster LMO extends this mechanism to general LMO-based…
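As a rough illustration of what an LMO-based momentum step looks like, the sketch below uses the l-infinity ball, where the linear minimization oracle has the closed form -radius * sign(g). This choice is an assumption made to keep the example dependency-free; Muon is usually described in terms of a spectral-norm ball over weight matrices, which would need an SVD or a Newton-Schulz iteration instead. The function names, step size, and momentum convention are hypothetical and not taken from the paper.

```python
def lmo_linf(g, radius=1.0):
    """LMO over the l-infinity ball: argmin of <g, x> subject to ||x||_inf <= radius.

    The minimizer is -radius * sign(g_i) in each coordinate.
    """
    return [-radius * ((v > 0) - (v < 0)) for v in g]


def lmo_momentum_step(x, m, grad, lr=0.1, beta=0.9):
    """One assumed-form LMO momentum step.

    m_new = beta * m + (1 - beta) * grad   (exponential moving average)
    x_new = x + lr * LMO(m_new)            (move along the oracle's direction)
    """
    m_new = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, grad)]
    direction = lmo_linf(m_new)
    x_new = [xi + lr * di for xi, di in zip(x, direction)]
    return x_new, m_new


# Toy objective f(x) = sum(x_i^2), whose gradient is 2 * x.
x = [1.0, -2.0]
m = [0.0, 0.0]
grad = [2.0 * xi for xi in x]
x, m = lmo_momentum_step(x, m, grad)
```

With this oracle the update reduces to a sign-of-momentum step: each coordinate moves by exactly `lr` toward lower loss, regardless of the gradient's magnitude. In an asynchronous variant, `grad` would be whatever gradient survives the staleness check rather than one computed at the current `x`.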
