Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning

Divyansh Jhunjhunwala; Pranay Sharma; Zheng Xu; Gauri Joshi

arXiv:2502.08024 · cs.LG · February 13, 2025

Open Access

TL;DR

Starting federated learning from a pre-trained model reduces the adverse effect of data heterogeneity: the pre-trained initialization has fewer misaligned filters, which leads to better test performance. The claim is supported by theoretical analysis and experiments.

Contribution

This work provides a theoretical analysis, for two-layer CNNs trained with FedAvg, of how pre-training influences filter alignment and improves federated learning performance under data heterogeneity.

Findings

01

Pre-trained models lead to fewer misaligned filters at initialization.

02

Data heterogeneity mainly affects misaligned filters during training.

03

Experiments confirm theoretical predictions in synthetic and real FL settings.

Abstract

Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a federated learning (FL) setting, where the downstream training is performed at the edge clients with heterogeneous data distribution. These works show that starting from a pre-trained model can substantially reduce the adverse impact of data heterogeneity on the test performance of a model trained in a federated setting, with no changes to the standard FedAvg training algorithm. In this work, we provide a deeper theoretical understanding of this phenomenon. To do so, we study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg. We introduce the notion of aligned and misaligned…
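For readers unfamiliar with FedAvg, the training loop the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's setup: it uses a toy 1-D linear model instead of a two-layer CNN, and the client data, the function names `local_sgd` and `fedavg`, and the two starting values are all hypothetical. It shows only the mechanics (broadcast, local training, size-weighted averaging) and that the algorithm is unchanged whichever initialization is used.

```python
# Minimal FedAvg sketch on a toy 1-D linear model y = w * x.
# Assumption for illustration only: the paper analyzes two-layer CNNs,
# which are not modeled here.

def local_sgd(w, data, lr=0.1, steps=5):
    """Run a few full-batch gradient steps on one client's squared loss."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg(w_init, clients, rounds=50, lr=0.1, steps=5):
    """Each round: broadcast w, train locally, average weighted by client size."""
    w = w_init
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local = [local_sgd(w, d, lr, steps) for d in clients]
        w = sum(len(d) * wk for d, wk in zip(clients, local)) / total
    return w

# Hypothetical heterogeneous clients: same underlying slope (2.0),
# but very different input ranges on each client.
clients = [
    [(0.1, 0.2), (0.2, 0.4)],
    [(1.0, 2.0), (1.5, 3.0), (2.0, 4.0)],
]

w_far = fedavg(w_init=-5.0, clients=clients)   # stand-in for a poor initialization
w_near = fedavg(w_init=1.8, clients=clients)   # stand-in for a pre-trained initialization
```

In this toy problem both runs reach the same solution, and the nearer starting point is simply closer after any fixed number of rounds. The paper's argument about aligned and misaligned filters concerns CNNs and is not captured by this sketch.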

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Statistical Modeling Techniques