A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

Shi Pu; Alex Olshevsky; Ioannis Ch. Paschalidis

arXiv:1906.02702 · math.OC · February 2, 2021 · 33 cites

TL;DR

This paper analyzes the transient time that distributed stochastic gradient descent (DSGD) needs to reach its optimal asymptotic convergence rate when agents in a network have access only to noisy gradients, and gives sharp bounds in terms of the spectral gap of the mixing matrix and the number of agents.

Contribution

The paper gives a sharp characterization of the transient time DSGD needs to reach its asymptotic convergence rate, showing how that time depends on the spectral gap of the network and on the number of agents.

Findings

01

Transient time scales as $O\left(\frac{n}{(1-\rho_w)^2}\right)$

02

Asymptotic convergence rate matches centralized SGD

03

Numerical experiments confirm theoretical bounds
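To get a feel for the $n/(1-\rho_w)^2$ scaling, the sketch below evaluates it for one concrete network: a ring of $n$ agents in which each agent puts weight 1/3 on itself and on each of its two neighbours. This topology and these weights are an illustrative assumption, not a setup taken from the paper. The sketch takes $\rho_w$ to be the spectral norm of $W - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, the usual definition for a symmetric doubly stochastic mixing matrix $W$, and it reports only the quantity inside the $O(\cdot)$, so hidden constants are ignored. The helper names `ring_rho` and `transient_scale` are hypothetical.

```python
import math


def ring_rho(n):
    """rho_w for a ring of n >= 3 agents with weight 1/3 on self and on each neighbour.

    This W is symmetric and circulant, so its eigenvalues are
    1/3 + (2/3) * cos(2*pi*k/n) for k = 0..n-1. Subtracting 11^T/n removes the
    k = 0 eigenvalue (1, all-ones eigenvector), so rho_w = max_{k>=1} |lambda_k|.
    """
    if n < 3:
        raise ValueError("ring needs at least 3 agents")
    return max(abs(1 / 3 + (2 / 3) * math.cos(2 * math.pi * k / n)) for k in range(1, n))


def transient_scale(n):
    """n / (1 - rho_w)^2, the quantity the O(.) bound on K_T is proportional to.

    Constants hidden by O(.) are not included, so this is a scaling, not K_T itself.
    """
    gap = 1 - ring_rho(n)  # spectral gap 1 - rho_w
    return n / gap ** 2


for n in (10, 20, 40):
    print(n, round(1 - ring_rho(n), 5), round(transient_scale(n)))
```

For this ring, $1-\rho_w = \frac{4}{3}\sin^2(\pi/n) \approx \frac{4\pi^2}{3n^2}$ for $n \ge 3$, so $n/(1-\rho_w)^2$ grows roughly like $n^5$. Better-connected topologies have larger spectral gaps and, under the bound, correspondingly shorter transient phases.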

Abstract

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the same optimal, network-independent convergence rate as centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T = O\left(\frac{n}{(1-\rho_w)^2}\right)$, where $1-\rho_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we…
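The DSGD iteration combines a local stochastic gradient step with averaging over neighbours. The pure-Python sketch below uses one common form of the update, $x_i(k+1) = \sum_j w_{ij}\,(x_j(k) - \alpha_k g_j(k))$, where $g_j(k)$ is agent $j$'s noisy gradient. Everything concrete here is an assumption for illustration: the scalar quadratic costs $f_i(x) = \frac{1}{2}(x - b_i)^2$, the Gaussian gradient noise, the ring mixing weights, and the diminishing step size $\alpha_k = \theta/(\mu(k+K))$ with $\mu = 1$ and hypothetical constants $\theta$ and $K$. The function names `ring_weights` and `dsgd` are hypothetical as well.

```python
import random


def ring_weights(n):
    """Rows of a symmetric, doubly stochastic mixing matrix W for a ring:
    agent i puts weight 1/3 on itself and on each of its two neighbours."""
    if n < 3:
        raise ValueError("ring needs at least 3 agents")
    return [{(i - 1) % n: 1 / 3, i: 1 / 3, (i + 1) % n: 1 / 3} for i in range(n)]


def dsgd(b, num_iters=2000, sigma=1.0, theta=2.0, K=10, seed=0):
    """DSGD sketch for minimizing (1/n) * sum_i f_i(x), f_i(x) = 0.5 * (x - b[i])**2.

    Agent i sees only noisy gradients of its own f_i and mixes iterates with its
    ring neighbours. Each f_i is 1-strongly convex, so mu = 1 in the step size.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(b)
    W = ring_weights(n)
    x = [0.0] * n  # all agents start at 0
    for k in range(num_iters):
        alpha = theta / (k + K)  # alpha_k = theta / (mu * (k + K)) with mu = 1
        # Local step: x_j - alpha * g_j, with g_j = (x_j - b_j) + N(0, sigma^2) noise.
        local = [x[j] - alpha * ((x[j] - b[j]) + rng.gauss(0.0, sigma)) for j in range(n)]
        # Mixing step: x_i <- sum_j w_ij * local_j over i's neighbourhood.
        x = [sum(w * local[j] for j, w in W[i].items()) for i in range(n)]
    return x


x = dsgd([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print([round(v, 3) for v in x])
```

With $b = (1, 2, \dots, 6)$ the minimizer of the average cost is $\bar b = 3.5$, and all six returned iterates should be close to it. In this quadratic example, because $W$ is doubly stochastic, the network average of the iterates evolves exactly like centralized SGD on the average cost with noise averaged over $n$ agents. In general, gradients are evaluated at each agent's own iterate, and the resulting disagreement between agents is, roughly, what the transient phase has to wash out.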

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques