FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models

Junkang Liu; Fanhua Shang; Hongying Liu; Yuxuan Tian; Yuanyuan Liu; Jin Liu; Kewen Zhu; Zhouchen Lin

arXiv:2510.27486·cs.LG·April 21, 2026

TL;DR

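FedAdamW is a federated variant of AdamW for training and fine-tuning large models. It targets three problems that arise when AdamW is applied directly in federated learning: high variance in the second-moment estimate under data heterogeneity, client drift caused by local overfitting, and slow convergence caused by reinitializing the moment estimates at each round. The method comes with convergence and generalization guarantees.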

Contribution

The paper introduces FedAdamW, the first federated AdamW variant with variance reduction, local correction, and convergence guarantees, tailored for large models.

Findings

01 Reduces communication rounds significantly.

02 Improves test accuracy over baselines.

03 Validates effectiveness on language and vision models.

Abstract

AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly applying AdamW in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate $v$; (2) the local overfitting of AdamW may cause client drift; and (3) reinitializing the moment estimates ($v$, $m$) at each round slows down convergence. To address these challenges, we propose the first Federated AdamW algorithm, called FedAdamW, for training and fine-tuning various large models. FedAdamW aligns local updates with the global update using both a local correction mechanism and decoupled weight decay to mitigate local overfitting.…
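
The abstract is cut off before the update rule is given, so the following is only a minimal sketch of the two ingredients it names: a standard AdamW step with decoupled weight decay, plus a correction term that pulls the local step toward the global update. The function name `local_adamw_step`, the `global_dir` argument, and the mixing coefficient `alpha` are assumptions made for illustration and are not taken from the paper; the exact form of the FedAdamW correction may differ.

```python
import numpy as np

def local_adamw_step(x, grad, m, v, t, global_dir,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                     weight_decay=0.01, alpha=0.5):
    """One illustrative client-side step (hypothetical, not the paper's rule).

    x          : local parameters
    grad       : stochastic gradient at x
    m, v       : first- and second-moment estimates
    t          : 1-based step counter, used for bias correction
    global_dir : estimate of the global update direction (assumed input)
    alpha      : strength of the correction term (assumed hyperparameter)
    """
    # Standard Adam moment updates.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias-corrected estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    adaptive = m_hat / (np.sqrt(v_hat) + eps)

    # Decoupled weight decay: the decay acts on x directly and is not
    # folded into the gradient. The alpha * global_dir term is the
    # assumed correction toward the global update.
    x = x - lr * (adaptive + alpha * global_dir + weight_decay * x)
    return x, m, v
```

With `alpha=0` the step reduces to plain AdamW, which makes it easy to compare the corrected and uncorrected behavior of a client on heterogeneous data.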

Code & Models

Repositories

junkangLiu0/FedAdamW (GitHub)
