Improving the Transient Times for Distributed Stochastic Gradient Methods
Kun Huang, Shi Pu

TL;DR
This paper studies EDAS, a distributed stochastic gradient algorithm for optimization over a connected network that shortens the transient time needed before its convergence rate matches that of centralized SGD.
Contribution
The paper studies EDAS (exact diffusion with adaptive stepsizes), a distributed stochastic gradient algorithm adapted from Exact Diffusion and NIDS, and gives a non-asymptotic convergence analysis that characterizes its transient time.
Findings
EDAS attains the same asymptotic convergence rate as centralized SGD.
The transient time for EDAS behaves as n/(1-λ₂), where n is the number of agents and 1-λ₂ is the spectral gap.
Numerical results confirm theoretical transient time and convergence rate improvements.
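To give a feel for how n/(1-λ₂) grows with network size, here is a minimal sketch for one specific topology. It assumes a ring of n agents in which each agent averages itself and its two neighbors with weight 1/3; this mixing matrix, and the function names ring_lambda2 and transient_scale, are illustrative assumptions and are not taken from the paper. Because the assumed matrix is circulant, its eigenvalues have a closed form, so no linear algebra library is needed.

```python
import math


def ring_lambda2(n: int) -> float:
    """Second-largest eigenvalue of an ASSUMED ring mixing matrix.

    Assumption: each agent weights itself and its two ring neighbors by 1/3.
    The matrix is circulant, so its eigenvalues are
    1/3 + (2/3) * cos(2*pi*k/n) for k = 0, ..., n-1.
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 agents")
    eigenvalues = sorted(
        (1.0 / 3.0 + (2.0 / 3.0) * math.cos(2.0 * math.pi * k / n) for k in range(n)),
        reverse=True,
    )
    # eigenvalues[0] is 1 (the averaging direction); the next one is lambda_2.
    return eigenvalues[1]


def transient_scale(n: int) -> float:
    """The quantity n / (1 - lambda_2), ignoring any constants in the bound."""
    return n / (1.0 - ring_lambda2(n))


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, round(ring_lambda2(n), 4), round(transient_scale(n), 1))
```

For this assumed ring, the spectral gap shrinks roughly like 1/n², so n/(1-λ₂) grows roughly like n³: doubling the number of agents multiplies the quantity by about 8. Better-connected graphs have a larger spectral gap and therefore a smaller value.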
Abstract
We consider the distributed optimization problem where n agents, each possessing a local cost function, collaboratively minimize the average of the cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS), adapted from the Exact Diffusion method and NIDS, and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD) for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach the asymptotic convergence rate, which behaves as O(n/(1-λ₂)), where 1-λ₂ stands for the spectral gap of…
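The problem described in the abstract can be written compactly as below. This is a sketch of the standard formulation: the decision variable x, its dimension p, and the local cost names f_i are notation assumed here for illustration, with n denoting the number of agents.

```latex
\min_{x \in \mathbb{R}^{p}} \; f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)
```

Here f_i is the local cost function held by agent i, and each agent is assumed to have access only to stochastic gradients of its own f_i while exchanging information with its neighbors in the network.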
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Stochastic Gradient Optimization Techniques · Neural Networks Stability and Synchronization
Methods: Diffusion
