First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data
Artavazd Maranjyan

TL;DR
This dissertation introduces a rigorous framework for asynchronous stochastic gradient descent (SGD) that achieves optimal time complexity in both homogeneous and heterogeneous data settings, addressing the challenges of stale updates and system variability.
Contribution
It presents novel algorithms, Ringmaster ASGD and Ringleader ASGD, that attain optimal time complexity by managing stale updates and data heterogeneity, advancing asynchronous optimization theory.
Findings
Achieves optimal time complexity in homogeneous data setting.
Extends optimality to heterogeneous data in federated learning.
Improves resource efficiency with adaptive worker task allocation.
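To see why heterogeneous data is harder than the homogeneous case, consider a minimal sketch built on simplifying assumptions of our own, not the dissertation's. Worker i holds a local objective f_i(x) = 0.5 (x - a_i)^2 and returns a gradient every τ_i seconds. If we ignore staleness and noise, plain asynchronous SGD applies updates from worker i at rate 1/τ_i. It therefore drifts toward a rate-weighted average of the a_i rather than the minimizer of the uniform average (1/n) Σ f_i. The helper names below are hypothetical.

```python
from typing import List, Sequence


def update_shares(compute_times: Sequence[float]) -> List[float]:
    """Fraction of updates each worker contributes in steady state.

    Worker i finishes one gradient every compute_times[i] seconds, so its
    rate is 1 / compute_times[i]; the shares are the normalized rates.
    """
    rates = [1.0 / t for t in compute_times]
    total = sum(rates)
    return [r / total for r in rates]


def rate_weighted_point(compute_times: Sequence[float],
                        local_optima: Sequence[float]) -> float:
    """Stationary point of plain async SGD, ignoring staleness and noise.

    Assumes hypothetical local objectives f_i(x) = 0.5 * (x - a_i)**2, so the
    expected update vanishes where sum_i share_i * (x - a_i) = 0.
    """
    shares = update_shares(compute_times)
    return sum(w * a for w, a in zip(shares, local_optima))


def uniform_optimum(local_optima: Sequence[float]) -> float:
    """Minimizer of the uniform average (1/n) * sum_i f_i for the same quadratics."""
    return sum(local_optima) / len(local_optima)


if __name__ == "__main__":
    times, optima = [1.0, 1.0, 10.0], [0.0, 0.0, 21.0]
    print(uniform_optimum(optima), rate_weighted_point(times, optima))
```

With compute times [1, 1, 10] and local optima [0, 0, 21], the uniform-average minimizer is 7.0. The rate-weighted point is 1.0, so the slow worker's data is almost ignored. This bias is one reason heterogeneous data needs more than simply removing synchronization.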
Abstract
Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind these runs have not kept pace. Most large-scale training still relies on synchronous methods, where workers must wait for the slowest device, wasting compute and amplifying the effects of hardware and network variability. Removing synchronization seems like a simple fix, but asynchrony introduces staleness, meaning updates computed on outdated models. This makes analysis difficult, especially when delays arise from system-level randomness rather than algorithmic choices. As a result, the time complexity of asynchronous methods remains poorly understood. This dissertation develops a rigorous framework for…
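The trade-off between waiting and staleness can be illustrated with a small event-driven simulation. This is a sketch under assumptions of our own. It uses a one-dimensional quadratic f(x) = 0.5 x^2 and fixed per-worker compute times. It also adds a hypothetical `delay_threshold` that discards any gradient whose delay (the number of updates applied since the worker read the model) reaches the threshold. That threshold is a simple staleness-control rule chosen for illustration and is not presented as the exact rule used by Ringmaster ASGD. Synchronous minibatch SGD is simulated for comparison, where each round waits for the slowest worker.

```python
import heapq
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class AsyncResult:
    x: float          # final iterate
    wall_time: float  # simulated time at which the last update was applied
    applied: int      # gradients applied to the model
    discarded: int    # gradients dropped for being too stale
    max_delay: int    # largest delay among the applied gradients


def grad(x: float) -> float:
    # Gradient of the toy objective f(x) = 0.5 * x**2 (illustrative assumption).
    return x


def async_sgd(compute_times: Sequence[float], lr: float, num_updates: int,
              delay_threshold: Optional[int] = None, x0: float = 1.0) -> AsyncResult:
    """Event-driven asynchronous SGD: the server updates as soon as any gradient arrives."""
    x, version = x0, 0  # version = number of updates applied so far
    # Each event is (finish_time, worker_id, version_read, x_read): the worker
    # computes its gradient at the model it read, which may be outdated on arrival.
    events = [(t, i, 0, x0) for i, t in enumerate(compute_times)]
    heapq.heapify(events)
    applied = discarded = max_delay = 0
    now = 0.0
    while applied < num_updates:
        now, i, version_read, x_read = heapq.heappop(events)
        delay = version - version_read  # staleness of this gradient
        if delay_threshold is None or delay < delay_threshold:
            x -= lr * grad(x_read)
            version += 1
            applied += 1
            max_delay = max(max_delay, delay)
        else:
            discarded += 1  # hypothetical rule: drop overly stale gradients
        # The worker immediately starts a new gradient at the current model.
        heapq.heappush(events, (now + compute_times[i], i, version, x))
    return AsyncResult(x, now, applied, discarded, max_delay)


def sync_sgd(compute_times: Sequence[float], lr: float,
             num_rounds: int, x0: float = 1.0) -> Tuple[float, float]:
    """Synchronous minibatch SGD: every round waits for the slowest worker."""
    x = x0
    for _ in range(num_rounds):
        # All workers evaluate the gradient at the same fresh model; average them.
        g = sum(grad(x) for _ in compute_times) / len(compute_times)
        x -= lr * g
    return x, num_rounds * max(compute_times)


if __name__ == "__main__":
    times = [1.0, 1.0, 10.0]
    print(async_sgd(times, lr=0.1, num_updates=21))
    print(async_sgd(times, lr=0.1, num_updates=21, delay_threshold=5))
    print(sync_sgd(times, lr=0.1, num_rounds=7))
```

With compute times [1, 1, 10], the asynchronous run applies 21 gradients by t = 10, but the slow worker's single gradient arrives with delay 20. With `delay_threshold=5` that gradient is dropped instead, and the run finishes at t = 11 with every applied delay at most 1. The synchronous run needs 7 rounds of 10 seconds each (70 seconds) to consume the same 21 gradients.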
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Cloud Computing and Resource Management · Advanced Memory and Neural Computing
