Tighter Analysis for Decentralized Stochastic Gradient Method: Impact of Data Homogeneity
Qiang Li, Hoi-To Wai

TL;DR
This paper provides a refined convergence analysis of decentralized stochastic gradient methods, highlighting how data homogeneity influences the transient time and overall efficiency of the algorithm.
Contribution
It introduces a new analysis framework based on Hessian similarity, offering explicit bounds on convergence related to data homogeneity and network properties.
Findings
Transient time can be as small as ${\rm O}(n^{2/3}/\rho^{8/3})$ for smooth objectives.
Transient time can be as small as ${\rm O}(\sqrt{n}/\rho)$ for strongly convex objectives.
Analysis relies on higher-order Taylor approximation for gradient maps.
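To get a feel for how the two transient-time bounds listed above scale with the number of agents $n$ and the spectral gap $\rho$, the following minimal sketch evaluates them with all hidden constants dropped. The function names are hypothetical, and the returned values are only orders of magnitude, not iteration counts claimed by the paper.

```python
import math


def transient_time_smooth(n: int, rho: float) -> float:
    """Order-of-magnitude scaling n^(2/3) / rho^(8/3) (constants omitted)."""
    if n < 1 or not (0.0 < rho <= 1.0):
        raise ValueError("expected n >= 1 and 0 < rho <= 1")
    return n ** (2.0 / 3.0) / rho ** (8.0 / 3.0)


def transient_time_strongly_convex(n: int, rho: float) -> float:
    """Order-of-magnitude scaling sqrt(n) / rho (constants omitted)."""
    if n < 1 or not (0.0 < rho <= 1.0):
        raise ValueError("expected n >= 1 and 0 < rho <= 1")
    return math.sqrt(n) / rho


if __name__ == "__main__":
    # Illustrative values only: a poorly connected graph (small rho)
    # inflates the smooth-case bound much faster than the strongly convex one.
    for rho in (0.5, 0.1, 0.01):
        print(rho, transient_time_smooth(64, rho), transient_time_strongly_convex(64, rho))
```

The exponent $8/3$ on $1/\rho$ in the smooth case means that halving the spectral gap multiplies that bound by roughly $2^{8/3} \approx 6.3$, versus a factor of 2 in the strongly convex case.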
Abstract
This paper studies the effect of data homogeneity on multi-agent stochastic optimization. We consider the decentralized stochastic gradient (DSGD) algorithm and perform a refined convergence analysis. Our analysis is explicit on the similarity between Hessian matrices of local objective functions which captures the degree of data homogeneity. We illustrate the impact of our analysis through studying the transient time, defined as the minimum number of iterations required for a distributed algorithm to achieve comparable performance as its centralized counterpart. When the local objective functions have similar Hessian, the transient time of DSGD can be as small as ${\rm O}(n^{2/3}/\rho^{8/3})$ for smooth (possibly non-convex) objective functions, ${\rm O}(\sqrt{n}/\rho)$ for strongly convex objective functions, where $n$ is the number of agents and $\rho$ is the spectral gap of the graph.…
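For readers unfamiliar with DSGD, the sketch below shows the standard update in which each agent averages its neighbors' iterates through a mixing matrix and then takes a local stochastic gradient step. Everything specific here is an assumption made for illustration and is not taken from the paper: the ring topology, the quadratic local objectives $f_i(x) = \tfrac{1}{2} x^\top A_i x - b_i^\top x$, the Gaussian gradient noise, and all function and parameter names.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        # Each agent weights itself and its two ring neighbors equally.
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def dsgd(W, A, b, steps=2000, step_size=0.05, noise_std=0.1, seed=0):
    """Run DSGD on quadratics f_i(x) = 0.5 * x^T A_i x - b_i^T x.

    W: (n, n) mixing matrix, A: (n, d, d) local Hessians, b: (n, d) linear terms.
    Returns the (n, d) array of final local iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape
    X = np.zeros((n, d))  # row i holds agent i's iterate
    for _ in range(steps):
        # Local gradients A_i x_i - b_i, perturbed by synthetic Gaussian noise.
        grads = np.einsum("nij,nj->ni", A, X) - b
        grads += noise_std * rng.standard_normal((n, d))
        # Gossip averaging with neighbors, then a local gradient step.
        X = W @ X - step_size * grads
    return X


if __name__ == "__main__":
    n, d = 8, 3
    rng = np.random.default_rng(1)
    A = np.stack([np.eye(d)] * n)   # identical Hessians: the homogeneous case
    b = rng.standard_normal((n, d))  # agents still differ in their linear terms
    X = dsgd(ring_mixing_matrix(n), A, b)
    x_star = b.mean(axis=0)          # minimizer of the average objective when A_i = I
    print("error of averaged iterate:", np.linalg.norm(X.mean(axis=0) - x_star))
```

In this toy setup all local Hessians are identical, so the averaged iterate follows exactly the same recursion as centralized SGD on the average objective. This is the kind of homogeneity the abstract refers to, though the paper's actual similarity condition is more general than this example.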
Taxonomy
Topics: 3D Shape Modeling and Analysis · Stochastic Gradient Optimization Techniques · Traffic Prediction and Management Techniques
