On the Convergence of Nested Decentralized Gradient Methods with Multiple Consensus and Gradient Steps
Albert S. Berahas, Raghu Bollapragada, Ermin Wei

TL;DR
This paper analyzes a distributed optimization algorithm that performs multiple gradient and consensus steps per iteration, and characterizes how those steps affect the convergence rate and the size of the neighborhood of convergence when minimizing a sum of local convex functions.
Contribution
The paper generalizes the analysis of the NEAR-DGD algorithm to allow multiple gradient steps per iteration, and examines the resulting trade-off between communication and computation.
Findings
Performing multiple gradient steps per iteration affects both the rate of convergence and the size of the neighborhood of convergence.
With a fixed number of gradient steps and an increasing number of consensus steps, the method converges R-linearly to the exact solution.
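Experiments on quadratic functions show how multiple consensus and gradient steps affect the number of iterations, gradient evaluations, communications, and overall cost.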
Abstract
In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where the cost of communication and/or computation can be expensive. We extend and generalize the analysis for a class of nested gradient-based distributed algorithms (NEAR-DGD; Berahas, Bollapragada, Keskar and Wei, 2018) to account for multiple gradient steps at every iteration. We show the effect of performing multiple gradient steps on the rate of convergence and on the size of the neighborhood of convergence, and prove R-Linear convergence to the exact solution with a fixed number of gradient steps and increasing number of consensus steps. We test the performance of the generalized method on quadratic functions and show the effect of multiple consensus and gradient steps in terms of iterations, number of gradient evaluations, number of communications and cost.
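The nested structure described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes the single-gradient-step form of the nested iteration (a local gradient step followed by a scheduled number of consensus rounds), scalar quadratics f_i(x) = 0.5 * a_i * (x - b_i)^2, and a ring network with hypothetical mixing weights. The names `ring_mixing_matrix` and `near_dgd` and all data are made up for illustration. How the paper composes several gradient steps within one iteration is not stated in this summary, so that generalization is deliberately left out.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 nodes (illustrative choice)."""
    W = 0.5 * np.eye(n)
    for i in range(n):
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def near_dgd(a, b, W, alpha, consensus_steps, iters):
    """Single-gradient-step nested iteration on f_i(x) = 0.5 * a_i * (x - b_i)**2.

    consensus_steps(k) returns how many communication rounds to run at iteration k.
    Returns the vector of local iterates, one scalar per node.
    """
    x = np.zeros(len(a))
    for k in range(iters):
        y = x - alpha * a * (x - b)          # local gradient step at every node
        for _ in range(consensus_steps(k)):
            y = W @ y                        # one round of neighbor averaging
        x = y
    return x


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # local curvatures (made-up data)
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # local minimizers (made-up data)
W = ring_mixing_matrix(len(a))
x_star = np.sum(a * b) / np.sum(a)           # minimizer of the sum of the f_i

# Fixed schedule: one consensus round per iteration.
x_fixed = near_dgd(a, b, W, alpha=0.1, consensus_steps=lambda k: 1, iters=100)
# Increasing schedule: k + 1 consensus rounds at iteration k.
x_incr = near_dgd(a, b, W, alpha=0.1, consensus_steps=lambda k: k + 1, iters=100)

print("max error, fixed schedule:     ", np.max(np.abs(x_fixed - x_star)))
print("max error, increasing schedule:", np.max(np.abs(x_incr - x_star)))
```

In this toy setting the fixed schedule settles at a point that differs from the minimizer, while the increasing schedule drives every local iterate toward it. The price is communication: the increasing schedule uses 5050 consensus rounds over 100 iterations, compared with 100 for the fixed schedule.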
