Federated Learning of a Mixture of Global and Local Models
Filip Hanzely, Peter Richtárik

TL;DR
This paper introduces a new federated learning formulation balancing global and local models, demonstrating that local steps and personalization can improve communication efficiency, especially with heterogeneous data.
Contribution
It proposes a novel optimization framework for federated learning that explicitly trades off between global and local models, with new algorithms and theoretical guarantees.
Findings
Local steps can improve communication efficiency with heterogeneous data.
Personalization reduces communication complexity.
The paper develops new SGD-based algorithms with proven communication complexity guarantees.
Abstract
We propose a new optimization formulation for training federated learning models. The standard formulation has the form of an empirical risk minimization problem constructed to find a single global model trained from the private data stored across all participating devices. In contrast, our formulation seeks an explicit trade-off between this traditional global model and the local models, which can be learned by each device from its own private data without any communication. Further, we develop several efficient variants of SGD (with and without partial participation and with and without variance reduction) for solving the new formulation and prove communication complexity guarantees. Notably, our methods are similar but not identical to federated averaging / local SGD, thus shedding some light on the role of local steps in federated learning. In particular, we are the first to i) show…
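To make the trade-off described in the abstract concrete, here is a minimal sketch under stated assumptions. The summary above does not spell out the exact objective, so the penalized form used here (the average of the local losses plus `lam` times the average squared distance of each local model to the mean model) is an assumption, as are the toy quadratic losses, the scalar models, and the names `mixture_objective`, `minimize_mixture`, `centers`, and `lam`. It uses plain gradient descent and is not one of the paper's algorithms.

```python
def mixture_objective(models, centers, lam):
    """Assumed objective: mean local loss + lam * mean squared distance to the average model.

    Device i holds a scalar model x_i and a toy loss f_i(x) = 0.5 * (x - c_i)^2.
    """
    n = len(models)
    avg = sum(models) / n
    local_loss = sum(0.5 * (x - c) ** 2 for x, c in zip(models, centers)) / n
    penalty = sum((x - avg) ** 2 for x in models) / (2 * n)
    return local_loss + lam * penalty


def minimize_mixture(centers, lam, steps=2000):
    """Plain gradient descent on the assumed objective (illustrative only)."""
    n = len(centers)
    models = [0.0] * n
    # Step size 1/L, where L = (1 + lam) / n bounds the curvature of this toy objective.
    step_size = n / (1.0 + lam)
    for _ in range(steps):
        avg = sum(models) / n
        # Gradient w.r.t. x_i: local pull toward c_i plus lam-weighted pull toward the average.
        grads = [((x - c) + lam * (x - avg)) / n for x, c in zip(models, centers)]
        models = [x - step_size * g for x, g in zip(models, grads)]
    return models


if __name__ == "__main__":
    centers = [0.0, 2.0, 10.0]  # hypothetical per-device optima; their mean is 4.0
    for lam in (0.0, 1.0, 9.0):
        print(lam, minimize_mixture(centers, lam))
```

In this toy setting, `lam = 0` leaves each device at its own local optimum (purely local models), while increasing `lam` pulls every model toward the shared average, approaching a single global model.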
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks
Methods: Stochastic Gradient Descent
