Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays

Chang-Wei Shi; Shi-Shang Wang; Wu-Jun Li

arXiv:2601.12322·cs.LG·January 21, 2026

TL;DR

This paper introduces OrLoMo, an asynchronous distributed optimization method in which each worker runs momentum SGD locally and the server aggregates the workers' local momentum in order of global iteration index, with the aim of improving convergence in large-scale deep learning.

Contribution

According to the authors, OrLoMo is the first method to implement asynchronous distributed MSGD with local updates, and it comes with convergence guarantees under arbitrary delays.

Findings

01 OrLoMo outperforms synchronous and other asynchronous methods in experiments.

02 The method converges for non-convex problems despite arbitrary delays.

03 Ordered aggregation of local momentum enhances training efficiency.

Abstract

Momentum SGD (MSGD) serves as a foundational optimizer in training deep models due to momentum's key role in accelerating convergence and enhancing generalization. Meanwhile, asynchronous distributed learning is crucial for training large-scale deep models, especially when the computing capabilities of the workers in the cluster are heterogeneous. To reduce communication frequency, local updates are widely adopted in distributed learning. However, how to implement asynchronous distributed MSGD with local updates remains unexplored. To solve this problem, we propose a novel method, called ordered local momentum (OrLoMo), for asynchronous distributed learning. In OrLoMo, each worker runs MSGD locally. Then the local momentum from each worker will be aggregated by the server in order based on its global iteration index. To the best of our knowledge,…
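The abstract describes two ingredients only at a high level: workers that run MSGD locally, and a server that aggregates their local momentum in order of global iteration index. The toy sketch below illustrates those two ideas and is not the paper's algorithm. Everything in it is an assumption made for illustration: the names `local_msgd` and `ReorderBuffer`, the heavy-ball update form, the choice to send the sum of local momentum vectors, and the rule that the server waits for the next index before applying later ones. The excerpt does not specify any of these details.

```python
import numpy as np


def local_msgd(w, grad_fn, steps, lr=0.1, beta=0.9):
    """Run `steps` of heavy-ball momentum SGD starting from `w`.

    Returns the sum of the local momentum vectors (hypothetical choice of
    what a worker sends to the server).
    """
    w = np.asarray(w, dtype=float).copy()  # work on a private copy
    m = np.zeros_like(w)                   # local momentum buffer
    total = np.zeros_like(w)               # accumulated momentum to report
    for _ in range(steps):
        m = beta * m + grad_fn(w)
        w -= lr * m
        total += m
    return total


class ReorderBuffer:
    """Toy server: buffers updates and applies them in index order.

    Assumption: every update carries a unique integer index, and the server
    applies index k only after indices 0..k-1 have been applied.
    """

    def __init__(self, w0, lr=0.1):
        self.w = np.asarray(w0, dtype=float).copy()
        self.lr = lr
        self.next_index = 0   # next index the server is allowed to apply
        self.pending = {}     # index -> update that arrived early
        self.applied = []     # order in which indices were applied

    def submit(self, index, update):
        self.pending[index] = update
        # Drain every update that is now contiguous with what was applied.
        while self.next_index in self.pending:
            self.w -= self.lr * self.pending.pop(self.next_index)
            self.applied.append(self.next_index)
            self.next_index += 1


if __name__ == "__main__":
    grad = lambda w: w  # gradient of 0.5 * ||w||^2, a stand-in objective
    server = ReorderBuffer([1.0])
    u0 = local_msgd(server.w, grad, steps=2)
    u1 = local_msgd(server.w, grad, steps=2)
    server.submit(1, u1)  # arrives early, so it is held back
    server.submit(0, u0)  # unblocks both updates
    print(server.applied, server.w)
```

In this sketch an update that arrives out of order is simply held until its predecessors arrive. A scheme designed for arbitrary delays would need a rule that does not stall on a slow worker, and the abstract excerpt does not say how OrLoMo handles that case.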

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks