SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

TL;DR
This paper analyzes federated averaging's limitations under data heterogeneity, introduces SCAFFOLD with control variates to improve convergence, and demonstrates its efficiency and robustness through theoretical and empirical results.
Contribution
The paper derives tight convergence rates for FedAvg, identifies client drift under heterogeneous (non-iid) data as the cause of its unstable and slow convergence, and proposes SCAFFOLD, an algorithm that uses control variates to correct this drift and requires fewer communication rounds.
Findings
SCAFFOLD requires fewer communication rounds than FedAvg.
SCAFFOLD's convergence is not affected by data heterogeneity or client sampling.
For quadratic problems, SCAFFOLD leverages data similarity for faster convergence.
Abstract
Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence. As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the 'client-drift' in its local updates. We prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. Further, we show that (for quadratics) SCAFFOLD can take advantage of similarity in the client's data yielding even faster convergence. The latter is the first result to quantify the usefulness of…
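Illustrative sketch
The abstract's two central claims, that FedAvg drifts under heterogeneous data and that control variates correct the drift, can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes two hypothetical clients with scalar quadratic objectives f_i(x) = 0.5 * a_i * (x - b_i)^2, exact (noise-free) gradients, full client participation, and a control-variate update of the form c_i <- c_i - c + (x - y_i) / (K * lr). All names (A, B, fedavg, scaffold, lr, local_steps) and hyperparameters are illustrative choices that do not come from this page.

```python
# Toy heterogeneous setup (hypothetical): client i holds
# f_i(x) = 0.5 * A[i] * (x - B[i])**2, so clients disagree on both
# curvature and minimizer.
A = [1.0, 4.0]
B = [0.0, 1.0]

# Minimizer of the average objective (1/N) * sum_i f_i(x).
X_STAR = sum(a * b for a, b in zip(A, B)) / sum(A)


def grad(i, x):
    """Exact gradient of client i's objective at x."""
    return A[i] * (x - B[i])


def fedavg(rounds=100, local_steps=10, lr=0.1):
    """Plain FedAvg: each client runs local gradient steps, server averages."""
    x = 0.0
    for _round in range(rounds):
        local_models = []
        for i in range(len(A)):
            y = x
            for _step in range(local_steps):
                y -= lr * grad(i, y)
            local_models.append(y)
        x = sum(local_models) / len(local_models)
    return x


def scaffold(rounds=100, local_steps=10, lr=0.1, global_lr=1.0):
    """Control-variate sketch: local steps use grad - c_i + c."""
    n = len(A)
    x, c = 0.0, 0.0          # server model and server control variate
    c_i = [0.0] * n          # per-client control variates
    for _round in range(rounds):
        delta_y, delta_c = [], []
        for i in range(n):
            y = x
            for _step in range(local_steps):
                # Drift correction: subtract the client's own estimate,
                # add the server's estimate of the average gradient.
                y -= lr * (grad(i, y) - c_i[i] + c)
            # Assumed update: reuse the local progress as the new
            # estimate of this client's gradient.
            new_c = c_i[i] - c + (x - y) / (local_steps * lr)
            delta_y.append(y - x)
            delta_c.append(new_c - c_i[i])
            c_i[i] = new_c
        x += global_lr * sum(delta_y) / n
        c += sum(delta_c) / n
    return x


fedavg_x = fedavg()
scaffold_x = scaffold()
print("optimum :", X_STAR)
print("FedAvg  :", fedavg_x, "error", abs(fedavg_x - X_STAR))
print("SCAFFOLD:", scaffold_x, "error", abs(scaffold_x - X_STAR))
```

Under these settings the FedAvg iterate settles at a biased point (roughly 0.6 against an optimum of 0.8), because each client moves toward its own minimizer during the local steps. The control-variate version converges to the optimum of the average objective. With a single local step the two methods behave alike on this problem, so the gap is a consequence of multiple local updates on heterogeneous clients.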
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security
