When More Parameters Hurt: Foundation Model Priors Amplify Worst-Client Disparity Under Extreme Federated Heterogeneity
Kiran Naseer, Umar Shoaib

TL;DR
This paper reveals that large foundation models can worsen disparities among clients in federated learning under extreme heterogeneity, challenging assumptions of their universal benefit.
Contribution
The paper demonstrates that powerful priors in foundation models can harm the most disadvantaged clients in federated settings under extreme heterogeneity, and it evaluates whether inverse-weighted LoRA aggregation reduces the resulting disparity.
Findings
Under extreme label skew (alpha = 0.1), DistilBERT+LoRA shows a worst-client accuracy gap of 50.1%, which is 56% larger than TextCNN's 32.2% gap.
Under moderate heterogeneity (alpha >= 0.5), the pattern reverses and the gap nearly disappears.
Inverse-weighted LoRA aggregation does not resolve client disparity.
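The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the abstract (gaps of 50.1% and 32.2%, parameter counts of 66M and 2.7M); the function and variable names are illustrative and do not come from the paper.

```python
def relative_increase(new, baseline):
    """Fractional increase of `new` over `baseline`."""
    return (new - baseline) / baseline

# Worst-client accuracy gaps at alpha = 0.1, as reported in the abstract.
gap_distilbert_lora = 50.1
gap_textcnn = 32.2

# Parameter counts as reported in the abstract.
params_distilbert = 66e6
params_textcnn = 2.7e6

gap_increase = relative_increase(gap_distilbert_lora, gap_textcnn)
param_ratio = params_distilbert / params_textcnn

print(f"{gap_increase:.1%}")   # 55.6%
print(f"{param_ratio:.1f}x")   # 24.4x
```

The relative increase rounds to the 56% quoted in the findings, and the parameter ratio of about 24.4 is what the abstract rounds to 25x.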
Abstract
Federated learning (FL) is increasingly used to fine-tune foundation models (FMs) on distributed private data. The community largely assumes that large-scale pretraining serves as a 'rising tide that lifts all boats' in federated settings. However, our experiments reveal that these powerful priors can hinder rather than help the most disadvantaged clients under extreme heterogeneity. Through controlled experiments on federated text classification, we compare worst-client accuracy between TextCNN (2.7M parameters) and DistilBERT with Low-Rank Adaptation (LoRA, 66M parameters) across four Non-IID heterogeneity levels. Under extreme label skew (alpha = 0.1), DistilBERT+LoRA produces a worst-client accuracy gap of 50.1% -- 56% larger than TextCNN's 32.2% gap, despite having 25x more parameters and extensive pretraining. Under moderate heterogeneity (alpha >= 0.5), the pattern reverses: the…
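To make the setup concrete, here is a minimal sketch of how label skew controlled by an alpha parameter is commonly simulated, and how a worst-client gap could be computed. Two assumptions are not confirmed by the text above: that alpha is the concentration of a symmetric Dirichlet distribution over clients (the usual convention in the Non-IID literature), and that the gap is the mean client accuracy minus the lowest client accuracy. The names `dirichlet_label_skew` and `worst_client_gap` are hypothetical.

```python
import random

def dirichlet_label_skew(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    ASSUMPTION: alpha is a symmetric Dirichlet concentration. Smaller alpha
    concentrates each class on fewer clients.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a set of normalized Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws) or 1.0
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += draws[c] / total
            # The last client takes whatever remains of this class.
            if c == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), round(cum * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

def worst_client_gap(accuracies):
    """ASSUMPTION: gap = mean client accuracy minus worst client accuracy."""
    return sum(accuracies) / len(accuracies) - min(accuracies)

# Toy usage: 400 samples, 4 balanced classes, 5 clients, extreme skew.
labels = [i % 4 for i in range(400)]
parts = dirichlet_label_skew(labels, num_clients=5, alpha=0.1, seed=1)
print([len(p) for p in parts])
```

With alpha = 0.1, most clients end up holding only one or two classes, which is the regime in which the abstract reports the largest gap. Raising alpha toward 0.5 and above spreads each class more evenly across clients.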
