On the Convergence of FedAvg on Non-IID Data
Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

TL;DR
This paper provides a theoretical analysis of FedAvg's convergence on non-iid data, establishing rates, trade-offs, and conditions for effective federated learning in realistic, heterogeneous environments.
Contribution
It offers the first convergence rate analysis of FedAvg on non-iid data, including partial participation, heterogeneity effects, and necessary learning rate decay conditions.
Findings
Convergence rate of O(1/T) for strongly convex, smooth problems.
Trade-off between communication efficiency and convergence speed.
Heterogeneity slows down convergence, matching empirical observations.
Abstract
Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of O(1/T) for strongly convex and smooth problems, where T is the number of SGD steps. Importantly, our bound demonstrates a trade-off between communication efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging…
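The procedure the abstract describes (local SGD on a sampled subset of devices, with occasional averaging) can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the authors' implementation: each device k holds a hypothetical one-dimensional objective F_k(w) = 0.5 * (w - c_k)^2 with a different center c_k to mimic non-iid data, devices are weighted equally, and the function name `fedavg`, its parameters, and the decaying step size 2 / (gamma + t) are illustrative choices.

```python
import random


def fedavg(centers, rounds=2000, local_steps=5, sample_size=3, noise=0.1, seed=0):
    """Toy FedAvg on F_k(w) = 0.5 * (w - c_k)^2 with equally weighted devices.

    All names and defaults here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    w_global = 0.0
    t = 0  # number of local SGD steps each participating device has run so far
    gamma = max(8, local_steps)  # offset that keeps the first step sizes below 1
    for _ in range(rounds):
        # Partial participation: sample a subset of devices without replacement.
        chosen = rng.sample(range(len(centers)), sample_size)
        local_models = []
        for k in chosen:
            w = w_global  # each device starts from the current global model
            for s in range(local_steps):
                lr = 2.0 / (gamma + t + s)  # decaying learning rate
                # Noisy gradient of the local quadratic objective.
                grad = (w - centers[k]) + rng.gauss(0.0, noise)
                w -= lr * grad
            local_models.append(w)
        t += local_steps
        # Communication step: average the local models of the sampled devices.
        w_global = sum(local_models) / len(local_models)
    return w_global


# Heterogeneous devices: the local minimizers differ widely.
centers = [-2.0, -1.0, 0.0, 1.0, 7.0]
w_star = sum(centers) / len(centers)  # minimizer of the averaged objective
w_hat = fedavg(centers)
print("distance to optimum:", abs(w_hat - w_star))
```

Increasing `local_steps` reduces how often devices communicate per SGD step but lets each local model drift further toward its own minimizer between averages, which is the kind of trade-off the abstract refers to.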
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
