FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

TL;DR
This paper investigates why FedAvg, a popular federated learning algorithm, generalizes well to new tasks after fine-tuning. It attributes this to FedAvg learning a data representation shared across clients, and supports the claim with theoretical analysis and empirical evidence.
Contribution
It provides the first theoretical analysis showing FedAvg's ability to learn common data representations in a multi-task linear setting, explaining its generalization performance.
Findings
FedAvg learns shared data representations among clients.
The paper gives theoretical bounds on the iteration complexity of this representation learning.
Experiments on federated image classification tasks provide supporting empirical evidence.
Abstract
The Federated Averaging (FedAvg) algorithm, which alternates between a few local stochastic gradient updates at client nodes and a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the output model of FedAvg, after a few fine-tuning steps, generalizes well to new unseen tasks. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear representation setting. We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via…
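To make the abstract's description concrete, here is a minimal NumPy sketch of one FedAvg round in a multi-task linear representation setting. It is an illustration under stated assumptions, not the authors' implementation: each client's labels are assumed to be generated as y = X B* w_i* with a shared matrix B* and a client-specific head w_i*, the local steps use full-batch gradients rather than stochastic ones, and the fine-tuning stage is omitted. All function names (`make_clients`, `local_updates`, `fedavg_round`, `subspace_distance`) and all hyperparameters are hypothetical.

```python
import numpy as np


def make_clients(rng, num_clients=20, n=50, d=10, k=2):
    """Synthetic clients sharing a ground-truth representation B_star (assumed setup)."""
    B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
    clients = []
    for _ in range(num_clients):
        w_star = rng.standard_normal(k)      # client-specific head
        X = rng.standard_normal((n, d))      # client features
        clients.append((X, X @ B_star @ w_star))
    return B_star, clients


def local_updates(B, w, X, y, lr, steps):
    """Run `steps` full-batch gradient steps on one client's squared loss."""
    B, w = B.copy(), w.copy()
    n = len(y)
    for _ in range(steps):
        residual = X @ B @ w - y                     # shape (n,)
        grad_B = X.T @ np.outer(residual, w) / n     # shape (d, k)
        grad_w = B.T @ (X.T @ residual) / n          # shape (k,)
        B -= lr * grad_B
        w -= lr * grad_w
    return B, w


def fedavg_round(B, w, clients, lr, steps):
    """Each client starts from the global model; the server averages the results."""
    results = [local_updates(B, w, X, y, lr, steps) for X, y in clients]
    B_new = np.mean([b for b, _ in results], axis=0)
    w_new = np.mean([h for _, h in results], axis=0)
    return B_new, w_new


def subspace_distance(B, B_star):
    """Sine of the largest principal angle between col(B) and col(B_star)."""
    Q, _ = np.linalg.qr(B)
    Q_star, _ = np.linalg.qr(B_star)
    # Component of col(B) that lies outside col(B_star), measured in spectral norm.
    return np.linalg.norm(Q - Q_star @ (Q_star.T @ Q), ord=2)
```

One way to use the sketch is to call `fedavg_round` repeatedly from a small random `B` and track `subspace_distance(B, B_star)` across rounds. Setting `steps=1` reduces each round to a single averaged gradient step, which gives a baseline for comparing the effect of multiple local updates.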
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
