A Fast and Flat Federated Learning Method via Weighted Momentum and Sharpness-Aware Minimization
Tianle Li, Yongzhi Huang, Linshan Jiang, Chang Liu, Qipeng Xie, Wenfeng Du, Lu Wang, Kaishun Wu

TL;DR
This paper introduces FedWMSAM, a federated learning method that combines weighted momentum and sharpness-aware minimization (SAM) to improve convergence speed and generalization in non-IID settings. It targets two structural issues: local-global curvature misalignment and momentum-echo oscillation.
Contribution
The paper proposes FedWMSAM, an algorithm that jointly addresses local-global curvature misalignment and momentum-echo oscillation in federated learning, and provides theoretical convergence guarantees for it.
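Local-global curvature misalignment means that a SAM perturbation computed from one client's loss need not point where the global loss would send it. The toy sketch below illustrates this under our own assumptions; it is not the paper's formalization or code. Two hypothetical clients have quadratic losses with different curvature and centers, standing in for non-IID data. The SAM ascent direction is the standard normalized gradient g/||g|| from the original SAM formulation. The sketch compares each client's local direction with the direction computed from the averaged (global) loss. All names and values (CLIENTS, grad, global_grad, sam_direction, cosine) are illustrative.

```python
import math

# Hypothetical non-IID toy setup (illustrative only): client k has the loss
# f_k(w) = 0.5 * sum_i a[i] * (w[i] - c[i])**2 with its own curvature a and center c.
CLIENTS = [
    {"a": [10.0, 1.0], "c": [1.0, 0.0]},
    {"a": [1.0, 10.0], "c": [0.0, 1.0]},
]


def grad(client, w):
    """Exact gradient of one client's quadratic loss at w."""
    return [a * (wi - c) for a, wi, c in zip(client["a"], w, client["c"])]


def global_grad(w):
    """Gradient of the average (global) loss, i.e. the mean of the client gradients."""
    gs = [grad(cl, w) for cl in CLIENTS]
    return [sum(g[i] for g in gs) / len(gs) for i in range(len(w))]


def sam_direction(g):
    """Unit SAM ascent direction g / ||g||; SAM scales it by rho when perturbing w."""
    n = math.sqrt(sum(x * x for x in g)) + 1e-12  # epsilon avoids division by zero
    return [x / n for x in g]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


w = [0.5, 0.5]
global_dir = sam_direction(global_grad(w))
for k, cl in enumerate(CLIENTS):
    local_dir = sam_direction(grad(cl, w))
    # prints "cos(local, global) = 0.633" for both clients
    print(f"client {k}: cos(local, global) = {cosine(local_dir, global_dir):.3f}")
```

With these hypothetical values, each client's locally chosen perturbation is roughly 51 degrees away from the direction the global loss would pick, even though both clients share the same model w.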
Findings
FedWMSAM outperforms existing methods in convergence speed.
The method achieves better generalization on non-IID data.
Experimental results validate robustness and effectiveness.
Abstract
In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local-global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a…
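The "simple combination" of momentum and SAM that the abstract says falls short can be sketched on a toy problem. This is an assumption-laden baseline, not FedWMSAM. Each hypothetical client runs heavy-ball momentum driven by standard SAM gradients for a few local steps, with momentum reset every round. The server then takes the unweighted mean of the returned models, as in FedAvg. The function names (grad, sam_grad, local_update, fedavg_round), the quadratic client losses and all hyperparameters are illustrative choices.

```python
import math

# Hypothetical baseline, NOT FedWMSAM: client-side momentum + SAM, server-side FedAvg.
# Toy non-IID clients: f_k(w) = 0.5 * sum_i a[i] * (w[i] - c[i])**2.
CLIENTS = [
    {"a": [10.0, 1.0], "c": [1.0, 0.0]},
    {"a": [1.0, 10.0], "c": [0.0, 1.0]},
]


def grad(client, w):
    """Exact gradient of one client's quadratic loss at w."""
    return [a * (wi - c) for a, wi, c in zip(client["a"], w, client["c"])]


def sam_grad(client, w, rho):
    """Standard SAM: evaluate the gradient at the perturbed point w + rho * g / ||g||."""
    g = grad(client, w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # epsilon avoids division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return grad(client, w_adv)


def local_update(client, w, lr, beta, rho, steps):
    """Client side: heavy-ball momentum fed by SAM gradients (momentum reset each round)."""
    w = list(w)
    m = [0.0] * len(w)
    for _ in range(steps):
        g = sam_grad(client, w, rho)
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w


def fedavg_round(w, lr=0.02, beta=0.9, rho=0.05, steps=5):
    """Server side: plain FedAvg, an unweighted mean of the clients' local models."""
    local_models = [local_update(cl, w, lr, beta, rho, steps) for cl in CLIENTS]
    return [sum(lm[i] for lm in local_models) / len(local_models) for i in range(len(w))]


w = [0.0, 0.0]
for _ in range(50):
    w = fedavg_round(w)
print([round(x, 3) for x in w])
```

The abstract excerpt is truncated before it describes FedWMSAM's own construction, so the sketch stops at this baseline rather than guessing at the weighted-momentum or alignment mechanism.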
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security
