A Fast and Flat Federated Learning Method via Weighted Momentum and Sharpness-Aware Minimization

Tianle Li; Yongzhi Huang; Linshan Jiang; Chang Liu; Qipeng Xie; Wenfeng Du; Lu Wang; Kaishun Wu

arXiv:2511.22080 · cs.LG · December 1, 2025

TL;DR

This paper introduces FedWMSAM, a federated learning method that combines weighted momentum and sharpness-aware minimization to improve convergence speed and generalization in non-IID settings, addressing two structural failure modes: local-global curvature misalignment and momentum-echo oscillation.

Contribution

The paper proposes FedWMSAM, an algorithm that jointly addresses local-global curvature misalignment and momentum-echo oscillation in federated learning, and provides theoretical convergence guarantees for it.

Findings

1. FedWMSAM outperforms existing methods in convergence speed.
2. The method achieves better generalization on non-IID data.
3. Experimental results validate its robustness and effectiveness.

Abstract

In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local-global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a…
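The abstract argues that simply combining momentum and SAM is not enough in non-IID FL. To make that baseline concrete, here is a minimal sketch of the naive combination only. It is not FedWMSAM, whose construction is cut off in this abstract. Each client takes SAM steps with heavy-ball momentum, and the server averages models FedAvg-style. Everything here is a hypothetical illustration and does not come from the paper: the toy quadratic client losses, the hyperparameters (`lr`, `rho`, `beta`, `local_steps`), and the function names.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical non-IID setup: a client is (center, curvature) and its loss is
# f_k(w) = 0.5 * sum_i curvature_i * (w_i - center_i)^2
Client = Tuple[Sequence[float], Sequence[float]]


def client_grad(w: Sequence[float], client: Client) -> Vector:
    """Gradient of the toy quadratic client loss at w."""
    center, curv = client
    return [c * (wi - ci) for wi, ci, c in zip(w, center, curv)]


def total_loss(w: Sequence[float], clients: Sequence[Client]) -> float:
    """Sum of all client losses: the 'global' objective in this toy setup."""
    return sum(
        0.5 * sum(c * (wi - ci) ** 2 for wi, ci, c in zip(w, center, curv))
        for center, curv in clients
    )


def sam_momentum_step(w, m, client, lr=0.1, rho=0.05, beta=0.9):
    """One local step: SAM gradient followed by a heavy-ball momentum update."""
    g = client_grad(w, client)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid dividing by zero
    # Move to the first-order worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # SAM applies the gradient taken at the perturbed point to the original weights.
    g_sam = client_grad(w_adv, client)
    m_new = [beta * mi + gi for mi, gi in zip(m, g_sam)]
    w_new = [wi - lr * mi for wi, mi in zip(w, m_new)]
    return w_new, m_new


def fed_round(w_global, clients, local_steps=5, lr=0.1, rho=0.05, beta=0.9):
    """One communication round: local SAM+momentum on every client, then FedAvg."""
    local_models = []
    for client in clients:
        w = list(w_global)
        m = [0.0] * len(w)  # naive choice: local momentum restarts each round
        for _ in range(local_steps):
            w, m = sam_momentum_step(w, m, client, lr, rho, beta)
        local_models.append(w)
    # Unweighted model averaging on the server.
    return [sum(coords) / len(local_models) for coords in zip(*local_models)]


if __name__ == "__main__":
    clients = [([1.0, 0.0], [1.0, 4.0]), ([-1.0, 2.0], [4.0, 1.0])]
    w = [0.0, 0.0]
    for _ in range(50):
        w = fed_round(w, clients)
    print(w, total_loss(w, clients))
```

With these two clients, `total_loss` is minimized at `[-0.6, 0.4]`. The averaged model still lowers the global loss, but it settles noticeably away from that point because each client's local steps pull it toward its own minimum. The sketch shows only the baseline that the abstract calls insufficient. It does not reproduce the curvature-misalignment or momentum-echo analysis that the paper formalizes.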


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security