Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays
Chang-Wei Shi, Shi-Shang Wang, Wu-Jun Li

TL;DR
This paper introduces OrLoMo, a novel asynchronous distributed optimization method that combines local momentum updates with ordered aggregation, improving convergence and performance in large-scale deep learning.
Contribution
To the authors' knowledge, OrLoMo is the first method to implement asynchronous distributed MSGD with local updates, and it comes with convergence guarantees under arbitrary delays.
Findings
OrLoMo outperforms synchronous and other asynchronous methods in experiments.
The method converges for non-convex problems despite arbitrary delays.
Ordered aggregation of local momentum enhances training efficiency.
Abstract
Momentum SGD (MSGD) serves as a foundational optimizer in training deep models due to momentum's key role in accelerating convergence and enhancing generalization. Meanwhile, asynchronous distributed learning is crucial for training large-scale deep models, especially when the computing capabilities of the workers in the cluster are heterogeneous. To reduce communication frequency, local updates are widely adopted in distributed learning. However, how to implement asynchronous distributed MSGD with local updates remains unexplored. To solve this problem, we propose a novel method, called ordered local momentum (OrLoMo), for asynchronous distributed learning. In OrLoMo, each worker runs MSGD locally. Then the local momentum from each worker will be aggregated by the server in order based on its global iteration index. To the best of our knowledge,…
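The two steps the abstract names (local MSGD on each worker, then server-side aggregation ordered by global iteration index) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the names `local_msgd`, `OrderedServer`, and `simulate`, the scalar quadratic objective, the choice to tag each update with the index at which the worker pulled the model, and the plain additive server update are all hypothetical choices made for illustration. The abstract does not specify these details.

```python
import heapq
import random


def local_msgd(w, grad_fn, lr, beta, num_steps):
    """Run num_steps of momentum SGD starting from w.

    Returns the total displacement (w - w_local) and the final local momentum.
    Starting the local momentum at zero is an assumption of this sketch.
    """
    u = 0.0
    w_local = w
    for _ in range(num_steps):
        u = beta * u + grad_fn(w_local)  # momentum accumulation
        w_local -= lr * u                # local parameter step
    return w - w_local, u


class OrderedServer:
    """Hypothetical server that applies buffered updates in index order."""

    def __init__(self, w0):
        self.w = w0
        self.t = 0        # global iteration index
        self.buffer = []  # min-heap keyed by the index each worker pulled at

    def pull(self):
        return self.w, self.t

    def push(self, pulled_at, worker_id, delta):
        heapq.heappush(self.buffer, (pulled_at, worker_id, delta))

    def aggregate(self):
        """Apply all buffered updates, oldest pull index first."""
        applied = []
        while self.buffer:
            pulled_at, _, delta = heapq.heappop(self.buffer)
            self.w -= delta
            self.t += 1
            applied.append(pulled_at)
        return applied


def simulate(num_rounds=200, num_workers=3, lr=0.01, beta=0.9,
             local_steps=4, seed=0):
    """Toy run on f(w) = 0.5 * (w - 3)^2 with noisy gradients."""
    rng = random.Random(seed)
    target = 3.0

    def grad_fn(w):
        return (w - target) + rng.gauss(0.0, 0.01)

    server = OrderedServer(w0=0.0)
    state = [server.pull() for _ in range(num_workers)]
    orders = []
    for _ in range(num_rounds):
        # A random subset of workers finishes, in random arrival order.
        # Workers that are not picked keep a stale model (a delay).
        finished = rng.sample(range(num_workers), k=rng.randint(1, num_workers))
        for wid in finished:
            w_pulled, pulled_at = state[wid]
            delta, _ = local_msgd(w_pulled, grad_fn, lr, beta, local_steps)
            server.push(pulled_at, wid, delta)
        orders.append(server.aggregate())
        for wid in finished:
            state[wid] = server.pull()
    return server.w, orders


if __name__ == "__main__":
    final_w, _ = simulate()
    print("final w:", final_w)
```

In this scalar toy the server update is a plain sum, so the ordering changes only the sequence of intermediate iterates, not the total applied per round. The sketch is meant to show the bookkeeping (tagging, buffering, sorted application), not to reproduce the paper's update rule or its convergence behavior.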
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks
