Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size?

Xuanyu Chen; Nan Yang; Shuai Wang; Dong Yuan

arXiv:2511.12188 · cs.LG · November 18, 2025

TL;DR

This paper analyzes how to select the optimal model size in federated learning by deriving a theoretical bound on generalization error and validating it through extensive experiments, addressing the challenges of scaling large models in decentralized data settings.

Contribution

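It derives a PAC-Bayes bound on the generalization error of models trained with stochastic algorithms in federated settings. The bound gives a theoretical framework for choosing the optimal model size from the number of clients and the available training compute, a question that had remained under-explored.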

Findings

01. Optimal model size decreases as the number of clients grows, if total compute is fixed.
02. Switching to federated learning reduces the maximum achievable generalization performance.
03. Estimating the optimal model size should take into account the average training compute across clients.
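To make findings 01 and 03 concrete, here is a minimal Python sketch. It does not reproduce the paper's PAC-Bayes bound. Instead it assumes a Chinchilla-style compute-optimal rule, N_opt = G · (C/6)^a (Hoffmann et al., 2022, where training cost is approximated as C ≈ 6ND FLOPs), and applies it to the average per-client compute C/K, as finding 03 suggests. The function names and the constants `a = 0.5` and `g = 1.0` are hypothetical placeholders, not values fitted in the paper.

```python
def optimal_model_size(compute_flops: float, a: float = 0.5, g: float = 1.0) -> float:
    """Hypothetical Chinchilla-style rule: N_opt = g * (C / 6) ** a.

    Assumes training cost C ~ 6 * N * D FLOPs. The exponent `a` and the
    coefficient `g` are illustrative placeholders, not the paper's values.
    """
    if compute_flops <= 0:
        raise ValueError("compute must be positive")
    return g * (compute_flops / 6.0) ** a


def federated_optimal_model_size(total_compute: float, num_clients: int,
                                 a: float = 0.5, g: float = 1.0) -> float:
    """Apply the rule to the average compute per client (cf. finding 03)."""
    if num_clients < 1:
        raise ValueError("need at least one client")
    avg_compute = total_compute / num_clients  # total budget split evenly
    return optimal_model_size(avg_compute, a=a, g=g)


if __name__ == "__main__":
    total = 6e18  # fixed total training budget in FLOPs (illustrative)
    for k in (1, 4, 16, 64):
        n = federated_optimal_model_size(total, k)
        print(f"clients={k:3d}  N_opt ~ {n:.3e} params")
```

With the total budget held fixed, quadrupling the number of clients quarters the per-client compute, so under the assumed exponent a = 0.5 the suggested model size halves at each step (about 1e9, 5e8, 2.5e8, 1.25e8 parameters in the loop above). This mirrors the direction of finding 01; the actual rate in federated learning depends on the paper's bound, not on this placeholder rule.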

Abstract

The recent success of large language models (LLMs) has sparked a growing interest in training large-scale models. As the model size continues to scale, concerns are growing about the depletion of high-quality, well-curated training data. This has led practitioners to explore training approaches like Federated Learning (FL), which can leverage the abundant data on edge devices while maintaining privacy. However, the decentralization of training datasets in FL introduces challenges to scaling large models, a topic that remains under-explored. This paper fills this gap and provides qualitative insights on generalizing the previous model scaling experience to federated learning scenarios. Specifically, we derive a PAC-Bayes (Probably Approximately Correct Bayesian) upper bound for the generalization error of models trained with stochastic algorithms in federated settings and quantify the…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Big Data and Digital Economy · Mobile Crowdsensing and Crowdsourcing