Optimizing Stochastic Gradient Push under Broadcast Communications
Tuan Nguyen, Ting He

TL;DR
This paper develops an optimized mixing matrix design for stochastic gradient push in decentralized federated learning over wireless networks, reducing convergence time by leveraging directed communication graphs.
Contribution
It introduces a novel mixing matrix design algorithm for stochastic gradient push (SGP) that allows asymmetric matrices and directed graphs, reducing convergence time in DFL.
Findings
The proposed design significantly reduces convergence time.
Allows asymmetric mixing matrices for directed graphs.
Achieves better performance without sacrificing model quality.
Abstract
We consider the problem of minimizing the convergence time for decentralized federated learning (DFL) in wireless networks under broadcast communications, with focus on mixing matrix design. The mixing matrix is a critical hyperparameter for DFL that simultaneously controls the convergence rate across iterations and the communication demand per iteration, both strongly influencing the convergence time. Although the problem has been studied previously, existing solutions are mostly designed for decentralized parallel stochastic gradient descent (D-PSGD), which requires the mixing matrix to be symmetric and doubly stochastic. These constraints confine the activated communication graph to undirected (i.e., bidirected) graphs, which limits design flexibility. In contrast, we consider mixing matrix design for stochastic gradient push (SGP), which allows asymmetric mixing matrices and hence…
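The abstract contrasts the symmetric, doubly stochastic mixing matrices required by D-PSGD with the asymmetric matrices SGP can use. The sketch below illustrates only the mixing (push-sum) step of SGP on a small directed graph, with the local gradient step omitted. It assumes a simple uniform-split rule (each node divides its mass equally among itself and its out-neighbors), which is NOT the optimized design proposed in the paper. The helper names (`column_stochastic_from_digraph`, `push_sum`, etc.) are hypothetical and used for illustration only.

```python
# Illustrative sketch: the push-sum mixing step used by SGP, gradient step omitted.
# Assumption: uniform-split weights, not the paper's optimized mixing matrix.

def column_stochastic_from_digraph(n, out_neighbors):
    """Hypothetical helper. Node j splits its mass uniformly over itself and its
    out-neighbors; P[i][j] is the weight node j pushes to node i, so every column
    sums to 1 while rows and symmetry are unconstrained."""
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = [j] + list(out_neighbors[j])
        share = 1.0 / len(targets)
        for i in targets:
            P[i][j] += share
    return P

def is_column_stochastic(P, tol=1e-12):
    n = len(P)
    return all(abs(sum(P[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))

def is_symmetric(P, tol=1e-12):
    n = len(P)
    return all(abs(P[i][j] - P[j][i]) <= tol for i in range(n) for j in range(n))

def matvec(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def push_sum(P, x0, iters):
    """Push numerators x and weights w through P, then de-bias with z = x / w."""
    x = list(x0)
    w = [1.0] * len(x0)
    for _ in range(iters):
        x = matvec(P, x)
        w = matvec(P, w)
    return [xi / wi for xi, wi in zip(x, w)]

# Directed graph 0->1, 0->2, 1->2, 2->0 (strongly connected, so no bidirected links needed).
out_neighbors = {0: [1, 2], 1: [2], 2: [0]}
P = column_stochastic_from_digraph(3, out_neighbors)
x0 = [3.0, 6.0, 9.0]  # true average is 6.0

print(is_column_stochastic(P), is_symmetric(P))  # True False

z = push_sum(P, x0, 100)
print([round(v, 6) for v in z])  # [6.0, 6.0, 6.0]

# Mixing without the weight correction converges to a biased, node-dependent value.
x_naive = list(x0)
for _ in range(100):
    x_naive = matvec(P, x_naive)
print([round(v, 6) for v in x_naive])  # [6.0, 4.0, 8.0]
```

The weights `w` are what let SGP tolerate a matrix that is column stochastic but neither row stochastic nor symmetric: both `x` and `w` converge to the same Perron vector scaled by their totals, so their ratio recovers the network-wide average even though plain mixing does not.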
