Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

TL;DR
This paper provides a theoretical analysis of hierarchical SGD, revealing how local aggregation improves convergence through a novel divergence framework, applicable to multi-level communication networks with non-IID data and non-convex objectives.
Contribution
It introduces the concepts of upward and downward divergences and analyzes the convergence of multi-level H-SGD, explaining the benefits of local aggregation in complex distributed settings.
Findings
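A worst-case convergence upper bound for two-level H-SGD with non-IID data, non-convex objectives, and stochastic gradients.
Identification of a 'sandwich behavior' under random grouping: the H-SGD upper bound lies between the upper bounds of two single-level local SGD settings whose numbers of local iterations equal H-SGD's local and global update periods.
Extension of the analysis to multi-level H-SGD, confirming that the 'sandwich behavior' persists.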
Abstract
Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregations. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understanding. In this work, we first introduce a new notion of "upward" and "downward" divergences. We then use it to conduct a novel analysis to obtain a worst-case convergence upper bound for two-level H-SGD with non-IID data, non-convex objective function, and stochastic gradient. By extending this result to the case with random grouping, we observe that this convergence upper bound of H-SGD is between the upper bounds of two single-level local SGD settings, with the number of local iterations equal to the local and global update periods in H-SGD,…
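The two-level procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function name `h_sgd`, its parameters, the scalar quadratic objectives, and the equal-weight averaging are all assumptions chosen to keep the example small.

```python
import random


def h_sgd(groups, local_period, global_period, rounds, lr=0.1, noise=0.0, seed=0):
    """Two-level H-SGD sketch on scalar quadratics f_i(x) = 0.5 * (x - c_i)^2.

    groups: list of groups; each group lists the per-worker optima c_i
            (different c_i values stand in for non-IID data).
    local_period: number of steps between local (within-group) aggregations.
    global_period: number of steps between global aggregations; assumed to be
                   a multiple of local_period.
    """
    assert global_period % local_period == 0
    rng = random.Random(seed)
    # Every worker starts from the same model.
    models = [[0.0 for _ in group] for group in groups]
    for step in range(1, rounds * global_period + 1):
        # One SGD step per worker; Gaussian noise mimics a stochastic gradient.
        for g, group in enumerate(groups):
            for w, c in enumerate(group):
                grad = (models[g][w] - c) + noise * rng.gauss(0.0, 1.0)
                models[g][w] -= lr * grad
        if step % global_period == 0:
            # Global aggregation: average over all workers in all groups.
            flat = [x for group_models in models for x in group_models]
            avg = sum(flat) / len(flat)
            models = [[avg for _ in group] for group in groups]
        elif step % local_period == 0:
            # Local aggregation: each local server averages its own group only.
            for g in range(len(groups)):
                avg = sum(models[g]) / len(models[g])
                models[g] = [avg for _ in models[g]]
    return models


if __name__ == "__main__":
    # Two groups of two workers each, with clearly different local optima.
    result = h_sgd([[-1.0, 1.0], [3.0, 5.0]], local_period=2, global_period=4, rounds=50)
    print(result)
```

Setting `local_period` equal to `global_period` removes the intermediate local aggregations, and the sketch reduces to single-level local SGD with that period. That can be a convenient baseline when comparing the two settings empirically.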
Taxonomy
Topics: Energy Efficient Wireless Sensor Networks · Distributed Control Multi-Agent Systems · Image and Video Quality Assessment
