Decentralized Learning with Lazy and Approximate Dual Gradients
Yanli Liu, Yuejiao Sun, Wotao Yin

TL;DR
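This paper introduces decentralized learning algorithms that reduce both computation and communication. They save computation by replacing exact subproblem solves with approximate dual gradients computed from a few (stochastic) gradient steps, and they save communication through a local rule that lets a worker skip sending an update when previous information is still sufficiently useful.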
Contribution
The paper proposes simple, effective algorithms that improve upon SSDA and MSDA by reducing communication and computation through lazy updates and approximate dual gradients.
Findings
Significant reduction in communication costs.
Notable decrease in computational complexity.
Algorithms outperform state-of-the-art in experiments.
Abstract
This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted to neighbors. A line of recent research in this area focuses on improving both computation and communication complexities. The methods SSDA and MSDA \cite{scaman2017optimal} have optimal communication complexity when the objective is smooth and strongly convex, and are simple to derive. However, they require solving a subproblem at each step. We propose new algorithms that save computation by using (stochastic) gradients and save communication when previous information is sufficiently useful. Our methods remain relatively simple: rather than solving a subproblem, they run Katyusha for a small, fixed number of steps from the latest point. An easy-to-compute, local rule is used to decide if a worker can skip a…
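The idea of skipping communication when previous information is still useful can be illustrated with a toy example. The sketch below is not the paper's algorithm: it uses plain decentralized gradient steps on scalar quadratics rather than dual gradients or Katyusha, and because the abstract's description of the skip rule is truncated, the rule here (re-send only if the local iterate has moved more than `tol` since the last broadcast) is a hypothetical stand-in. The function name and all parameters are assumptions made for illustration.

```python
import numpy as np

def lazy_decentralized_gd(targets, neighbors, steps=200, lr=0.1, mix=0.5, tol=1e-3):
    """Toy lazy-communication scheme (illustrative; not the paper's method).

    Worker i holds the local objective 0.5 * (x - targets[i])**2 and mixes its
    iterate with the last values its neighbors broadcast. A worker re-broadcasts
    only when its iterate has moved more than `tol` since its last broadcast
    (hypothetical skip rule). Returns the final iterates and the message count.
    """
    n = len(targets)
    x = np.zeros(n)        # local iterates
    sent = x.copy()        # last value each worker broadcast to its neighbors
    messages = 0
    for _ in range(steps):
        new_x = np.empty(n)
        for i in range(n):
            # Average of the (possibly stale) values received from neighbors.
            avg = np.mean([sent[j] for j in neighbors[i]])
            grad = x[i] - targets[i]           # gradient of the local quadratic
            new_x[i] = (1 - mix) * x[i] + mix * avg - lr * grad
        x = new_x
        for i in range(n):
            # Local skip rule: communicate only if the change is large enough.
            if abs(x[i] - sent[i]) > tol:
                sent[i] = x[i]
                messages += len(neighbors[i])  # one message per neighbor
    return x, messages

# Ring of four workers; a negative tol forces a broadcast at every step.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
targets = np.array([0.0, 1.0, 2.0, 3.0])
x_lazy, lazy_msgs = lazy_decentralized_gd(targets, ring, tol=1e-3)
x_eager, eager_msgs = lazy_decentralized_gd(targets, ring, tol=-1.0)
```

Comparing `lazy_msgs` with `eager_msgs` shows how many broadcasts the threshold saves on this toy problem, and comparing `x_lazy` with `x_eager` shows the accuracy given up in exchange. A larger `tol` skips more messages but leaves neighbors working with staler values.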
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
