On the Convergence of FedAvg on Non-IID Data

Xiang Li; Kaixuan Huang; Wenhao Yang; Shusen Wang; Zhihua Zhang

arXiv:1907.02189·stat.ML·June 26, 2020·1.0k cites

TL;DR

This paper provides a theoretical analysis of FedAvg's convergence on non-iid data, establishing rates, trade-offs, and conditions for effective federated learning in realistic, heterogeneous environments.

Contribution

It offers the first convergence rate analysis of FedAvg on non-iid data, including partial participation, heterogeneity effects, and necessary learning rate decay conditions.

Findings

01. Convergence rate of $O(\frac{1}{T})$ for strongly convex, smooth problems.

02. Trade-off between communication efficiency and convergence speed.

03. Heterogeneity slows down convergence, matching empirical observations.

Abstract

Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $O(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging…
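The loop the abstract describes (local SGD on a sampled subset of devices, with occasional averaging) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The synthetic least-squares data, the function names (`make_noniid_clients`, `global_loss`, `fedavg`), uniform sampling without replacement, unweighted averaging, and the step-size schedule `1 / (lr_offset + t)` are all assumptions made for the example.

```python
import numpy as np


def make_noniid_clients(num_clients=10, dim=5, samples=50, heterogeneity=0.5, seed=0):
    """Hypothetical synthetic data: each client has its own optimum, so data is non-iid."""
    rng = np.random.default_rng(seed)
    base = 2.0 * np.ones(dim)
    clients = []
    for _ in range(num_clients):
        # Client-specific true weights make the local objectives disagree.
        w_true = base + heterogeneity * rng.normal(size=dim)
        X = rng.normal(size=(samples, dim))
        y = X @ w_true + 0.1 * rng.normal(size=samples)
        clients.append((X, y))
    return clients


def global_loss(w, clients):
    """Average of the local least-squares losses (all clients hold equally many samples)."""
    return float(np.mean([0.5 * np.mean((X @ w - y) ** 2) for X, y in clients]))


def fedavg(clients, rounds=200, local_steps=5, clients_per_round=3,
           batch_size=8, lr_offset=10.0, seed=1):
    """FedAvg-style loop: partial participation, local SGD, periodic averaging."""
    rng = np.random.default_rng(seed)
    w = np.zeros(clients[0][0].shape[1])
    t = 0  # total local SGD steps taken so far per participating client
    for _ in range(rounds):
        # Partial participation: only a small subset of devices joins each round.
        chosen = rng.choice(len(clients), size=clients_per_round, replace=False)
        local_models = []
        for k in chosen:
            X, y = clients[k]
            w_k = w.copy()
            for s in range(local_steps):
                lr = 1.0 / (lr_offset + t + s)  # assumed decaying step size
                idx = rng.choice(len(y), size=batch_size, replace=False)
                grad = X[idx].T @ (X[idx] @ w_k - y[idx]) / batch_size
                w_k -= lr * grad
            local_models.append(w_k)
        t += local_steps
        # Communication step: the server averages the returned local models.
        w = np.mean(local_models, axis=0)
    return w


if __name__ == "__main__":
    clients = make_noniid_clients()
    w = fedavg(clients)
    print("initial loss:", global_loss(np.zeros(5), clients))
    print("final loss:  ", global_loss(w, clients))
```

Raising `local_steps` reduces how often the devices communicate for a fixed number of SGD steps, which is the knob behind the communication-efficiency trade-off the abstract mentions. The sketch does not reproduce the specific averaging schemes studied in the paper.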

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization