TL;DR
This paper establishes fundamental limits on the scalability of centralized distributed optimization in federated learning: once server-to-worker communication is accounted for, communication and variance constraints rule out any speedup better than poly-logarithmic in the number of workers.
Contribution
The authors construct a new worst-case function and a new lower-bound framework, and use them to prove inherent scalability limits for distributed optimization methods that rely on unbiased sparsification.
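Unbiased sparsification means the compressor $C$ satisfies $\mathbb{E}[C(x)] = x$ while transmitting only a few coordinates. A common example, assumed here for illustration (the paper's compressor class may be broader), is Rand-K: keep $K$ of the $d$ coordinates uniformly at random and rescale them by $d/K$. The minimal Python sketch below uses hypothetical function names (`rand_k`, `exact_moments`). It checks unbiasedness exactly for small $d$ and shows the price paid in variance, $\mathbb{E}\|C(x) - x\|^2 = (d/K - 1)\|x\|^2$.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def _apply_mask(x: Sequence[float], kept: Sequence[int], k: int) -> List[float]:
    """Zero every coordinate not in `kept`; rescale kept ones by d/k."""
    d = len(x)
    out = [0.0] * d
    for i in kept:
        out[i] = x[i] * d / k  # the d/k factor makes E[C(x)] = x
    return out


def rand_k(x: Sequence[float], k: int, rng: random.Random) -> List[float]:
    """Rand-K: keep k of the d coordinates, chosen uniformly without replacement."""
    d = len(x)
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    return _apply_mask(x, rng.sample(range(d), k), k)


def exact_moments(x: Sequence[float], k: int) -> Tuple[List[float], float]:
    """Average over all C(d, k) equally likely masks and return
    E[C(x)] and E||C(x) - x||^2 exactly (only feasible for small d)."""
    d = len(x)
    masks = list(itertools.combinations(range(d), k))
    mean = [0.0] * d
    sq_err = 0.0
    for kept in masks:
        c = _apply_mask(x, kept, k)
        for i in range(d):
            mean[i] += c[i]
        sq_err += sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return [m / len(masks) for m in mean], sq_err / len(masks)


x = [1.0, 2.0, 3.0, 4.0]
mean, var = exact_moments(x, k=2)
print(mean)  # [1.0, 2.0, 3.0, 4.0]: unbiased
print(var)   # 30.0 = (d/k - 1) * ||x||^2 = 1 * 30
print(rand_k(x, 2, random.Random(0)))  # one random compressed vector
```

Averaging over all $\binom{d}{K}$ masks, rather than sampling, keeps the check exact; this is only practical for tiny $d$.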
Findings
Communication from server to workers limits scalability improvements.
The variance-dependent runtime term cannot scale better than poly-logarithmically in the number of workers.
A new lower-bound framework and new concentration bounds underpin the theoretical results.
Abstract
We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $\sigma^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_s$ and $\tau_w$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2}$, which improves with the number of workers $n$, where $\Delta = f(x^0) - f^*$ and $x^0 \in \mathbb{R}^d$ is the starting…
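To make the scaling claim concrete, the sketch below evaluates the variance-dependent runtime term of distributed SGD, taken here as $h\sigma^2 L\Delta/(n\varepsilon^2)$, and shows its linear speedup in $n$. For contrast it prints a hypothetical poly-logarithmic speedup $(1 + \ln n)^2$; the exponent, the function names, and the problem constants are illustrative assumptions, not values from the paper.

```python
import math


def sgd_variance_term(h: float, sigma2: float, L: float, delta: float,
                      n: int, eps: float) -> float:
    """Variance-dependent runtime term of distributed SGD in seconds:
    h * sigma^2 * L * Delta / (n * eps^2)."""
    if n < 1:
        raise ValueError("need at least one worker")
    return h * sigma2 * L * delta / (n * eps ** 2)


def polylog_speedup(n: int, power: int = 2) -> float:
    """Hypothetical poly-logarithmic speedup (1 + ln n)^power.
    The exponent is an illustrative choice, not a constant from the paper."""
    return (1.0 + math.log(n)) ** power


# Illustrative, made-up problem constants (not from the paper).
params = dict(h=0.01, sigma2=1.0, L=1.0, delta=1.0, eps=0.1)
base = sgd_variance_term(n=1, **params)

for n in (1, 1024, 1_048_576):
    linear = base / sgd_variance_term(n=n, **params)  # equals n up to rounding
    print(f"n={n:>9}  linear speedup={linear:>11.1f}  "
          f"poly-log speedup={polylog_speedup(n):>7.1f}")
```

For any fixed exponent, a poly-logarithmic factor eventually grows far more slowly than $n$, so a poly-logarithmic ceiling forfeits most of the benefit of adding workers.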