Anytime Minibatch with Delayed Gradients
Haider Al-Lawati, Stark C. Draper

TL;DR
This paper introduces AMB-DG, a distributed optimization method that makes effective use of stale gradients through a variable minibatch scheme, achieving optimal convergence rates and faster wall-clock convergence in distributed settings.
Contribution
The paper proposes AMB-DG, a novel asynchronous distributed optimization algorithm that leverages delayed gradients with a variable minibatch approach, providing theoretical guarantees and empirical improvements.
Findings
AMB-DG achieves optimal regret bounds for convex smooth functions.
AMB-DG converges faster than AMB and fixed-minibatch methods in experiments.
AMB-DG reduces worker idle time and shortens wall-clock time to convergence in distributed systems.
Abstract
Distributed optimization is widely deployed in practice to solve a broad range of problems. In a typical asynchronous scheme, workers calculate gradients with respect to out-of-date optimization parameters while the master uses stale (i.e., delayed) gradients to update the parameters. While using stale gradients can slow convergence, asynchronous methods speed up the overall optimization with respect to wall-clock time by allowing more frequent updates and reducing idle time. In this paper, we present a variable per-epoch minibatch scheme called Anytime Minibatch with Delayed Gradients (AMB-DG). In AMB-DG, workers compute gradients in epochs of a fixed time while the master uses stale gradients to update the optimization parameters. We analyze AMB-DG in terms of its regret bound and convergence rate. We prove that for convex smooth objective functions, AMB-DG achieves the optimal…
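The mechanism in the abstract (fixed-time epochs, a minibatch whose size varies with how much each worker finishes, and a master that applies gradients computed several epochs earlier) can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the authors' algorithm: the objective (a 1-D quadratic with Gaussian samples), the worker speed model, the fixed delay `delay`, and the plain stochastic gradient step are all hypothetical choices made for illustration. The paper may use a different update rule and delay model.

```python
import random
from collections import deque


def amb_dg_sketch(num_workers=4, epoch_time=1.0, delay=2, num_epochs=200,
                  step_size=0.05, seed=0):
    """Toy AMB-DG-style loop (hypothetical sketch, not the paper's method).

    Objective: f(w) = E[(w - x)^2] / 2 with x ~ N(3, 1), so the minimizer is w = 3.
    Returns the final parameter and the per-epoch global minibatch sizes.
    """
    rng = random.Random(seed)
    # Assumed worker speeds in gradients per unit time (heterogeneous workers).
    speeds = [rng.uniform(5.0, 20.0) for _ in range(num_workers)]
    w = 0.0
    in_flight = deque()  # (gradient sum, minibatch size) not yet applied by master
    batch_sizes = []

    for _ in range(num_epochs):
        grad_sum, batch = 0.0, 0
        for s in speeds:
            # Fixed-time epoch: each worker finishes a variable number of samples
            # (random slowdown models straggling); nobody waits for anybody.
            b = max(1, int(s * epoch_time * rng.uniform(0.5, 1.0)))
            for _ in range(b):
                x = rng.gauss(3.0, 1.0)
                grad_sum += w - x  # gradient of (w - x)^2 / 2 at the current w
            batch += b
        batch_sizes.append(batch)
        in_flight.append((grad_sum, batch))

        # The master applies the gradient computed `delay` epochs ago, so the
        # update direction was evaluated at a stale parameter value.
        if len(in_flight) > delay:
            g_sum, b_tot = in_flight.popleft()
            w -= step_size * g_sum / b_tot  # average over that epoch's minibatch

    return w, batch_sizes


w, batches = amb_dg_sketch()
print(f"final w = {w:.3f}, minibatch sizes range {min(batches)}..{max(batches)}")
```

Dividing each delayed gradient sum by that epoch's own minibatch size keeps the step scale consistent even though the amount of work per epoch varies; with a small step size the delayed updates still settle near the minimizer.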
