Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

Abdurakhmon Sadiev; Artavazd Maranjyan; Ivan Ilin; Peter Richtárik

arXiv:2605.18174·cs.LG·May 19, 2026

TL;DR

This paper introduces Ringmaster LMO, an asynchronous momentum method for LMO-based optimization that avoids waiting on slow workers in heterogeneous distributed systems. It comes with convergence guarantees and outperforms synchronous and asynchronous baselines in experiments.

Contribution

The paper extends the delay-thresholding idea of Ringmaster ASGD to LMO-based methods, providing convergence guarantees and practical advantages in heterogeneous environments.

Findings

01. Achieves optimal time complexity in heterogeneous settings.

02. Outperforms synchronous and asynchronous baselines in experiments.

03. Benefits increase with system heterogeneity.

Abstract

Muon has recently emerged as a strong alternative to AdamW for training neural networks, with encouraging large-scale pretraining results and growing evidence that matrix-structured updates can be faster in practice. Yet Muon, and more generally Linear Minimization Oracle (LMO) based methods, are typically used synchronously. This is problematic in heterogeneous distributed systems, where workers complete gradient computations at different speeds and synchronous training must repeatedly wait for slower workers. In this work, we introduce Ringmaster LMO, an asynchronous LMO-based momentum method for unconstrained stochastic nonconvex optimization. Our method builds on the delay-thresholding idea of Ringmaster ASGD. For SGD-type methods, Ringmaster ASGD achieves optimal time complexity by discarding overly stale gradients. Ringmaster LMO extends this mechanism to general LMO-based…
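The delay-thresholding mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract is truncated, so the exact update rule, the threshold convention (here a gradient is discarded when its delay is at least `max_delay`), the momentum form, and the choice of LMO (here the l-infinity ball, which yields a sign step, rather than the spectral-norm ball associated with Muon) are all assumptions made for illustration. The names `lmo_linf` and `apply_if_fresh` are hypothetical.

```python
def lmo_linf(m, radius=1.0):
    """LMO over the l-infinity ball (assumed choice of norm).

    Returns argmin over ||s||_inf <= radius of <m, s>, i.e. -radius * sign(m).
    """
    return [-radius * ((x > 0) - (x < 0)) for x in m]


def apply_if_fresh(params, momentum, grad, grad_iter, current_iter,
                   max_delay, lr=0.1, beta=0.9):
    """Apply one server-side update unless the gradient is too stale.

    grad_iter    : iteration at which the worker read the parameters
    current_iter : current server iteration
    max_delay    : delay threshold (assumed convention: discard if delay >= max_delay)

    Returns (params, momentum, current_iter, accepted).
    """
    delay = current_iter - grad_iter
    if delay >= max_delay:
        # Overly stale gradient: discard it and leave the state untouched.
        return params, momentum, current_iter, False

    # Exponential moving average momentum (assumed form).
    momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    # Move along the LMO direction computed from the momentum buffer.
    step = lmo_linf(momentum)
    params = [p + lr * s for p, s in zip(params, step)]
    return params, momentum, current_iter + 1, True


if __name__ == "__main__":
    params, momentum, it = [1.0, -2.0], [0.0, 0.0], 0
    # A fresh gradient (delay 0) is applied.
    params, momentum, it, ok = apply_if_fresh(
        params, momentum, [2.0, -4.0], grad_iter=0, current_iter=it, max_delay=2)
    print(ok, params)
    # A gradient computed at iteration 0 arriving at iteration 5 is discarded.
    _, _, _, ok = apply_if_fresh(
        params, momentum, [2.0, -4.0], grad_iter=0, current_iter=5, max_delay=2)
    print(ok)
```

In this sketch the server iteration counter advances only when a gradient is accepted, so a slow worker whose gradient is discarded does not hold up the others; it simply restarts from the current parameters.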
