Delay-adaptive step-sizes for asynchronous learning
Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

TL;DR
This paper introduces delay-adaptive step-size strategies for asynchronous machine learning algorithms. The step sizes adapt to the delays actually observed at run time rather than to a fixed worst-case bound, which allows faster convergence.
Contribution
It develops convergence theory for delay-adaptive step sizes and demonstrates their practical benefits over traditional fixed-delay methods.
Findings
Delay-adaptive step sizes converge faster than step sizes tuned to a worst-case delay bound.
On-line delay measurement enables real-time adjustment.
Delay-adaptive methods outperform fixed-delay algorithms in experiments.
Abstract
In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the…
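To make the contrast between delay-adaptive and bound-based step sizes concrete, here is a minimal toy sketch in Python. It is not the policy from the paper. The rule `BASE_STEP / (1 + tau)`, the constants `BASE_STEP` and `MAX_DELAY`, the delay pattern, and the test function f(x) = 0.5 * x**2 are all assumptions chosen for illustration. The sketch runs a delayed gradient iteration twice on the same delay sequence: once with a step size fixed by the worst-case delay, and once with a step size that shrinks only when a large delay is actually observed.

```python
def delayed_gradient_descent(step_size_rule, delays, x0=1.0):
    """Run x_{k+1} = x_k - gamma_k * grad f(x_{k - tau_k}) for f(x) = 0.5 * x**2."""
    history = [x0]  # history[k] holds the iterate x_k
    for k, tau in enumerate(delays):
        tau = min(tau, k)            # cannot look back before x_0
        stale_x = history[k - tau]   # iterate at which the gradient was computed
        grad = stale_x               # the gradient of 0.5 * x**2 is x
        gamma = step_size_rule(tau)  # step size may depend on the observed delay
        history.append(history[k] - gamma * grad)
    return history


BASE_STEP = 0.5   # hypothetical delay-free step size (f is 1-smooth)
MAX_DELAY = 10    # hypothetical worst-case delay bound

# Made-up delay pattern: mostly fresh gradients, an occasional very stale one.
delays = [MAX_DELAY if k % 20 == 0 else 0 for k in range(200)]


def fixed_rule(tau):
    # Ignores the observed delay and always plans for the worst case.
    return BASE_STEP / (1 + MAX_DELAY)


def adaptive_rule(tau):
    # Illustrative heuristic (an assumption, not the paper's policy):
    # shrink the step only when the observed delay is large.
    return BASE_STEP / (1 + tau)


fixed_final = abs(delayed_gradient_descent(fixed_rule, delays)[-1])
adaptive_final = abs(delayed_gradient_descent(adaptive_rule, delays)[-1])

print("final |x| with worst-case step size:    ", fixed_final)
print("final |x| with delay-adaptive step size:", adaptive_final)
```

On this toy problem the adaptive rule ends much closer to the minimizer because most iterations see no delay and can take the full step. This says nothing about the guarantees of any particular policy; the convergence conditions in the paper should be consulted for that.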
Taxonomy
Topics: Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques · Analog and Mixed-Signal Circuit Design
