DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization
Adel Nabli (MLIA, ISIR, MILA), Edouard Oyallon (MLIA, ISIR)

TL;DR
This paper introduces DADAO, a decentralized asynchronous optimization algorithm that accelerates both computation and communication by decoupling the two steps and modeling local gradient updates and gossip communications as separate, independent Poisson point processes.
Contribution
The work presents the first asynchronous decentralized primal first-order method with acceleration, avoiding multi-consensus loops and ad-hoc mechanisms, and provides theoretical complexity bounds.
Findings
Achieves accelerated convergence rates for both computation and communication.
Requires fewer local gradients and communications to reach a given accuracy.
Validated through simulations demonstrating improved performance.
Abstract
This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of L-smooth and μ-strongly convex functions distributed over a given network of size n. Our key insight is based on modeling the local gradient updates and gossip communication procedures with separate independent Poisson Point Processes. This allows us to decouple the computation and communication steps, which can be run in parallel, while making the whole approach completely asynchronous. This leads to communication acceleration compared to synchronous approaches. Our new method employs primal gradients and does not use a multi-consensus inner loop nor other ad-hoc mechanisms such as Error Feedback, Gradient Tracking, or a Proximal operator. By relating the inverse of the smallest positive eigenvalue of the Laplacian matrix and the maximal…
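The decoupling idea in the abstract can be illustrated with a small simulation. The sketch below is not the DADAO algorithm: it has no acceleration, and it uses plain gradient steps with pairwise gossip averaging. It only shows how independent Poisson clocks can drive local gradient steps and edge communications on a single asynchronous timeline. The ring topology, the scalar quadratic local functions f_i(x) = 0.5 (x - b_i)^2, and the names poisson_event_times, simulate, grad_rate, comm_rate, and step are all assumptions made for illustration.

```python
import random


def poisson_event_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on (0, horizon].

    Inter-arrival times are i.i.d. exponential with the given rate.
    """
    times = []
    if rate <= 0:
        return times
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def simulate(targets, horizon=50.0, grad_rate=1.0, comm_rate=1.0,
             step=0.1, seed=0):
    """Toy asynchronous run on a ring of len(targets) >= 3 nodes.

    Assumption: node i holds f_i(x) = 0.5 * (x - targets[i]) ** 2.
    Each node has its own gradient clock and each edge has its own
    communication clock; all clocks are independent Poisson processes.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    edges = [(i, (i + 1) % n) for i in range(n)]  # ring topology

    # Draw every clock independently, then merge into one timeline.
    events = []
    for i in range(n):
        for t in poisson_event_times(grad_rate, horizon, rng):
            events.append((t, "grad", i))
    for edge in edges:
        for t in poisson_event_times(comm_rate, horizon, rng):
            events.append((t, "comm", edge))
    events.sort(key=lambda ev: ev[0])

    counts = {"grad": 0, "comm": 0}
    for _, kind, who in events:
        counts[kind] += 1
        if kind == "grad":
            # Local gradient step; no other node is involved.
            x[who] -= step * (x[who] - targets[who])
        else:
            # Pairwise gossip: the two endpoints average their values.
            i, j = who
            x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x, counts


if __name__ == "__main__":
    values, counts = simulate([1.0, 2.0, 3.0, 4.0], horizon=200.0)
    print(values, counts)
```

Because the gradient and communication clocks are sampled independently, changing comm_rate relative to grad_rate changes how often nodes talk without touching the local computation schedule, which is the sense in which the two steps are decoupled here.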
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent
