DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models

Jin Liu; Yinbin Miao; Ning Xi; Junkang Liu

arXiv:2602.19945·cs.LG·April 21, 2026

TL;DR

This paper introduces DP-FedAdamW, an AdamW-based optimizer for differentially private federated learning that addresses second-moment variance, DP-induced bias, and client drift in order to improve convergence under DP.

Contribution

It proposes the first AdamW-based optimizer for DP federated learning, with theoretical convergence guarantees and empirical improvements on vision and language models.

Findings

01. Outperforms the state of the art by 5.83% on Tiny-ImageNet at ε = 1.

02. Establishes an unbiased second-moment estimator under DP.

03. Proves a linearly accelerated convergence rate without heterogeneity assumptions.
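Finding 02 concerns the bias that DP noise introduces into AdamW's second moment. The paper's own estimator is not reproduced on this page; the derivation below is only the standard one-line argument for why a naive estimate is biased under Gaussian noise, together with the simplest correction, stated under assumptions. It assumes the privatized gradient is $\tilde g_t = g_t + z_t$ with $z_t \sim \mathcal{N}(0, s^2 I)$ independent of $g_t$, where $s$ is the per-coordinate noise standard deviation (for example $s = \sigma C / B$ for noise multiplier $\sigma$, clipping norm $C$, and batch size $B$).

```latex
% Coordinate-wise second moment of the noisy gradient (z independent of g, zero mean):
\mathbb{E}\big[\tilde g_{t,i}^2\big]
  = \mathbb{E}\big[g_{t,i}^2\big] + 2\,\mathbb{E}[g_{t,i}]\,\mathbb{E}[z_{t,i}] + \mathbb{E}\big[z_{t,i}^2\big]
  = \mathbb{E}\big[g_{t,i}^2\big] + s^2 .

% Hence the naive estimate overshoots by s^2, while subtracting the known noise variance
% gives an unbiased estimate of E[g_{t,i}^2]:
\hat v_{t,i} = \tilde g_{t,i}^2 - s^2,
\qquad
\mathbb{E}\big[\hat v_{t,i}\big] = \mathbb{E}\big[g_{t,i}^2\big].
```

Because expectation is linear, an exponential moving average of $\hat v_{t,i}$ inherits the same unbiasedness. Individual $\hat v_{t,i}$ can be negative, so a practical variant would clamp at a small positive floor, which reintroduces some bias.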

Abstract

Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates the training and fine-tuning of large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any…
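The naive baseline the abstract criticizes, applying AdamW directly in DPFL, can be sketched as one client's privatized local training: per-sample clipping plus Gaussian noise (the standard DP-SGD Gaussian mechanism), followed by an ordinary AdamW update with decoupled weight decay. This is a minimal pure-Python sketch under those assumptions, not DP-FedAdamW itself; the helper names `clip`, `dp_gradient`, `AdamW`, and `local_update` and all hyperparameters are hypothetical, and no privacy accounting is performed. The buffer `v` accumulates squared noisy gradients, which is where issues (i) and (ii) enter.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale vec so its L2 norm is at most max_norm (per-sample clipping)."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def dp_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    Assumption: noise std is noise_multiplier * max_norm on the summed gradient,
    as in DP-SGD. No privacy accounting is done here.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        for i, v in enumerate(clip(g, max_norm)):
            total[i] += v
    std = noise_multiplier * max_norm
    batch = len(per_sample_grads)
    return [(t + rng.gauss(0.0, std)) / batch for t in total]


class AdamW:
    """Plain AdamW (decoupled weight decay) applied to privatized gradients."""

    def __init__(self, dim, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        self.lr, self.b1, self.b2 = lr, betas[0], betas[1]
        self.eps, self.wd = eps, weight_decay
        self.m = [0.0] * dim  # first moment
        self.v = [0.0] * dim  # second moment: absorbs DP noise variance (naive)
        self.t = 0

    def step(self, params, grad):
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            p = p * (1 - self.lr * self.wd)  # decoupled weight decay
            p = p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            out.append(p)
        return out


def local_update(params, per_sample_grad_fn, steps, opt, max_norm, noise_multiplier, rng):
    """Run several privatized AdamW steps on one client and return the model delta."""
    start = list(params)
    for _ in range(steps):
        grads = per_sample_grad_fn(params)
        params = opt.step(params, dp_gradient(grads, max_norm, noise_multiplier, rng))
    return [p - s for p, s in zip(params, start)]
```

In a FedAvg-style round, the server would average the clients' deltas. Because each client runs several adaptive steps on its own noisy data, those deltas can drift apart, which is the client-drift issue (iii) that the abstract says DP-FedAdamW curbs by aligning local updates with the global descent direction.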
