Adaptive Federated Optimization
Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

TL;DR
This paper introduces federated versions of adaptive optimizers like Adagrad, Adam, and Yogi, analyzing their convergence and demonstrating their effectiveness in improving federated learning performance with heterogeneous data.
Contribution
It presents the first adaptation and analysis of adaptive optimizers for federated learning, addressing convergence issues under data heterogeneity.
Findings
Adaptive federated optimizers improve convergence.
Adaptive methods outperform standard FedAvg.
Heterogeneity impacts communication efficiency.
Abstract
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyze their convergence in the presence of heterogeneous data for general non-convex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of…
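The abstract names the optimizers but does not state their update rules, so the following is only a minimal sketch of how a server-side adaptive step might look. It assumes that the server treats the average of the clients' model changes as a pseudo-gradient and applies an Adagrad-, Adam-, or Yogi-style second-moment update to it. The function names (server_update, client_update, run_demo), the toy quadratic client objectives, and all hyperparameter values are hypothetical choices made for illustration, not taken from the text above.

```python
import numpy as np


def server_update(x, m, v, delta, opt="adam", lr=0.03,
                  beta1=0.9, beta2=0.99, tau=1e-3):
    """One adaptive server step (illustrative sketch).

    x     : current global model
    m, v  : first- and second-moment accumulators kept on the server
    delta : average of client model changes, used as a pseudo-gradient
    tau   : small constant that bounds the denominator away from zero
    """
    m = beta1 * m + (1.0 - beta1) * delta
    d2 = delta ** 2
    if opt == "adagrad":
        v = v + d2                                    # running sum of squares
    elif opt == "adam":
        v = beta2 * v + (1.0 - beta2) * d2            # exponential average
    elif opt == "yogi":
        v = v - (1.0 - beta2) * d2 * np.sign(v - d2)  # sign-controlled change
    else:
        raise ValueError(f"unknown optimizer: {opt}")
    x = x + lr * m / (np.sqrt(v) + tau)
    return x, m, v


def client_update(x, target, steps=5, lr=0.1):
    """Local SGD on a toy objective 0.5 * ||y - target||^2; returns y - x."""
    y = x.copy()
    for _ in range(steps):
        y -= lr * (y - target)
    return y - x


def run_demo(opt, rounds=300, tau=1e-3):
    """Run a few rounds with three clients whose optima differ."""
    targets = [np.array([1.0, -2.0]),
               np.array([3.0, 0.0]),
               np.array([-1.0, 4.0])]
    x = np.zeros(2)
    m = np.zeros(2)
    v = np.full(2, tau ** 2)  # assumed initialization of the accumulator
    for _ in range(rounds):
        deltas = [client_update(x, t) for t in targets]
        x, m, v = server_update(x, m, v, np.mean(deltas, axis=0),
                                opt=opt, tau=tau)
    return x


if __name__ == "__main__":
    for name in ("adagrad", "adam", "yogi"):
        print(name, run_demo(name))
```

In this sketch the clients still run plain local SGD; only the server step is adaptive. The three toy clients have different optima, which is a crude stand-in for heterogeneous data.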
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Adam
