Federated Stochastic Minimax Optimization under Heavy-Tailed Noises

Xinwen Zhang; Hongchang Gao

arXiv:2511.04456·cs.LG·November 7, 2025

TL;DR

This paper introduces two federated minimax optimization algorithms, Fed-NSGDA-M and FedMuon-DA, designed to handle heavy-tailed gradient noise. It gives what the authors describe as the first theoretical convergence guarantees in this setting and reports supporting experiments.

Contribution

Proposes two novel algorithms, Fed-NSGDA-M and FedMuon-DA, with theoretical convergence guarantees for federated minimax optimization under heavy-tailed noise.

Findings

01

Both algorithms achieve a convergence rate of $O(1 / (TNp)^{\frac{s-1}{2s}})$.

02

To the authors' knowledge, these are the first federated minimax optimization algorithms with rigorous guarantees under heavy-tailed noise.

03

Experimental results confirm effectiveness of the proposed methods.
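
As a worked illustration of the stated rate, the exponent $\frac{s-1}{2s}$ can be evaluated at a few values of $s$. This summary does not define $s$, $T$, $N$, or $p$. The lines below assume, as is common in the heavy-tailed noise literature, that $s \in (1, 2]$ is the order of the bounded moment of the gradient noise; that reading is an assumption, not something stated here.

$$
\left.\frac{s-1}{2s}\right|_{s=2} = \frac{1}{4}, \qquad
\left.\frac{s-1}{2s}\right|_{s=3/2} = \frac{1/2}{3} = \frac{1}{6}, \qquad
\lim_{s \to 1^{+}} \frac{s-1}{2s} = 0 .
$$

Under that assumption, $s = 2$ gives $O(1/(TNp)^{1/4})$, and the exponent shrinks toward zero as $s$ approaches 1, so the bound weakens as the noise tails get heavier.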

Abstract

Heavy-tailed noise has attracted growing attention in nonconvex stochastic optimization, as numerous empirical studies suggest it is a more realistic assumption than the standard bounded-variance assumption. In this work, we investigate nonconvex-PL minimax optimization under heavy-tailed gradient noise in federated learning. We propose two novel algorithms: Fed-NSGDA-M, which integrates normalized gradients, and FedMuon-DA, which leverages the Muon optimizer for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization under a milder condition. We theoretically establish that both algorithms achieve a convergence rate of $O(1 / (TNp)^{\frac{s-1}{2s}})$. To the best of our knowledge, these are the first federated minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise. Extensive…
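
The abstract names two ingredients, normalized gradients and Muon-style local updates, without giving the update rules. The sketch below is only an illustration of those two ideas, not the paper's algorithms: it shows one local descent-ascent step where the momentum is either scaled to unit norm or replaced by its orthogonal polar factor (the quantity Muon is generally understood to approximate, computed here exactly with an SVD for simplicity). All function names, step sizes, the momentum coefficient, and the Student-t noise model are hypothetical choices made for this example.

```python
import numpy as np

def momentum(m_prev, grad, beta=0.9):
    # Exponential moving average of stochastic gradients (beta is an assumed value).
    return beta * m_prev + (1.0 - beta) * grad

def normalized_direction(m, eps=1e-12):
    # Unit-norm direction: the step length no longer scales with the noise magnitude.
    return m / (np.linalg.norm(m) + eps)

def orthogonalized_direction(m):
    # Polar factor u @ vt of a 2-D matrix m = u @ diag(s) @ vt,
    # i.e. m with every singular value replaced by 1.
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def local_step(x, y, m_x, m_y, g_x, g_y, direction, eta_x=0.01, eta_y=0.05):
    # One descent step on x and one ascent step on y along the chosen direction.
    # Step sizes eta_x and eta_y are illustrative, not taken from the paper.
    m_x = momentum(m_x, g_x)
    m_y = momentum(m_y, g_y)
    return x - eta_x * direction(m_x), y + eta_y * direction(m_y), m_x, m_y

rng = np.random.default_rng(0)
shape = (4, 3)
x, y = np.zeros(shape), np.zeros(shape)
m_x, m_y = np.zeros(shape), np.zeros(shape)

# Stand-in heavy-tailed gradients: Student-t with 1.5 degrees of freedom
# has a finite mean but infinite variance.
g_x = rng.standard_t(df=1.5, size=shape)
g_y = rng.standard_t(df=1.5, size=shape)

x_n, y_n, _, _ = local_step(x, y, m_x, m_y, g_x, g_y, normalized_direction)
x_o, y_o, _, _ = local_step(x, y, m_x, m_y, g_x, g_y, orthogonalized_direction)
```

In both variants the size of the update is fixed by the step size rather than by the gradient magnitude, which is the usual motivation for such updates when individual gradient samples can be arbitrarily large.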

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques