Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence

Shuhua Yu; Dusan Jakovetic; Soummya Kar

arXiv:2505.03736 · math.OC · April 17, 2026

TL;DR

This paper introduces GT-NSGDm, a normalization-based decentralized algorithm for nonconvex optimization under heavy-tailed noise, achieving optimal convergence rates matching centralized lower bounds.

Contribution

It combines normalization with gradient tracking and momentum in a single decentralized method, and is the first in the literature to guarantee optimal convergence rates under heavy-tailed noise in the decentralized setting.

Findings

01

GT-NSGDm achieves the optimal non-asymptotic convergence rate of $O(1/T^{(p-1)/(3p-2)})$.

02

When the tail index $p$ is unknown, the method attains a topology-independent convergence rate of $O(1/T^{(p-1)/(2p)})$.

03

Experiments show GT-NSGDm outperforms baselines in robustness and efficiency on real-world tasks.
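To make the two rate exponents in the findings concrete, the short Python sketch below evaluates $(p-1)/(3p-2)$ and $(p-1)/(2p)$ for a few tail indices. The function names are hypothetical and chosen only for this illustration; the formulas are the ones stated above.

```python
def known_p_exponent(p: float) -> float:
    """Exponent (p-1)/(3p-2) of the rate stated for a known tail index p."""
    if not 1.0 < p <= 2.0:
        raise ValueError("tail index p must lie in (1, 2]")
    return (p - 1.0) / (3.0 * p - 2.0)


def unknown_p_exponent(p: float) -> float:
    """Exponent (p-1)/(2p) of the rate stated for an unknown tail index p."""
    if not 1.0 < p <= 2.0:
        raise ValueError("tail index p must lie in (1, 2]")
    return (p - 1.0) / (2.0 * p)


if __name__ == "__main__":
    # A larger exponent means a faster decay of the bound in T.
    for p in (1.25, 1.5, 1.75, 2.0):
        print(f"p={p}: known={known_p_exponent(p):.4f}, "
              f"unknown={unknown_p_exponent(p):.4f}")
```

Since $2p \ge 3p-2$ exactly when $p \le 2$, the known-$p$ exponent is never smaller than the unknown-$p$ exponent on $(1, 2]$, and the two coincide at $p = 2$, where both equal $1/4$.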

Abstract

Heavy-tailed noise in nonconvex stochastic optimization has garnered increasing research interest, as empirical studies, including those on training attention models, suggest it is a more realistic gradient noise condition. This paper studies first-order nonconvex stochastic optimization under heavy-tailed gradient noise in a decentralized setup, where each node can only communicate with its direct neighbors in a predefined graph. Specifically, we consider a class of heavy-tailed gradient noise that is zero-mean and has only $p$-th moment for $p \in (1, 2]$. We propose GT-NSGDm, Gradient Tracking based Normalized Stochastic Gradient Descent with momentum, that utilizes normalization, in conjunction with gradient tracking and momentum, to cope with heavy-tailed noise on distributed nodes. We show that, when the communication graph admits primitive and doubly stochastic weights, GT-NSGDm…
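The abstract names three ingredients: normalization, gradient tracking, and momentum. The Python sketch below shows one plausible way they can fit together on a toy problem. It is not taken from the paper: the order of the updates, the step size `alpha`, the momentum weight `beta`, the ring weights, the quadratic local losses, and the symmetric Pareto noise are all assumptions made for illustration.

```python
import math
import random


def mix(W, vecs):
    """Weighted neighbor average: row i is sum_j W[i][j] * vecs[j]."""
    n, d = len(vecs), len(vecs[0])
    return [[sum(W[i][j] * vecs[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def normalize(vec, eps=1e-12):
    """Scale vec to at most unit Euclidean norm; eps guards a zero vector."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / (norm + eps) for c in vec]


def heavy_tailed_noise(d, rng, tail=1.5):
    """Symmetric Pareto noise (assumed): zero mean, infinite variance for tail <= 2."""
    return [rng.choice((-1.0, 1.0)) * rng.paretovariate(tail) for _ in range(d)]


def run_sketch(T=2000, n=4, d=2, alpha=0.01, beta=0.9, seed=0):
    """Run the assumed update on a ring of n >= 3 nodes; return (x, y, v)."""
    rng = random.Random(seed)
    # Assumed ring weights: 1/3 for self and each neighbor (doubly stochastic).
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    # Toy local losses f_i(x) = 0.5 * ||x - c_i||^2 with c_i = (i, ..., i).
    centers = [[float(i)] * d for i in range(n)]
    x = [[5.0] * d for _ in range(n)]   # local iterates
    v = [[0.0] * d for _ in range(n)]   # local momentum estimates
    y = [[0.0] * d for _ in range(n)]   # gradient trackers
    for _ in range(T):
        # Noisy local gradients.
        g = []
        for i in range(n):
            z = heavy_tailed_noise(d, rng)
            g.append([x[i][k] - centers[i][k] + z[k] for k in range(d)])
        # Momentum: exponential average of stochastic gradients.
        v_new = [[beta * v[i][k] + (1.0 - beta) * g[i][k] for k in range(d)]
                 for i in range(n)]
        # Gradient tracking: mix the trackers plus the momentum increment.
        y = mix(W, [[y[i][k] + v_new[i][k] - v[i][k] for k in range(d)]
                    for i in range(n)])
        v = v_new
        # Normalized step on each node, followed by neighbor averaging.
        step = [normalize(y[i]) for i in range(n)]
        x = mix(W, [[x[i][k] - alpha * step[i][k] for k in range(d)]
                    for i in range(n)])
    return x, y, v


if __name__ == "__main__":
    x, _, _ = run_sketch()
    print("final local iterates:", x)
```

Two properties of this sketch hold regardless of how large the noise samples are. Because the weights are doubly stochastic and both `y` and `v` start at zero, the node average of the trackers always equals the node average of the momentum estimates. Because each local step is normalized, the node average of the iterates moves by at most `alpha` per iteration, so a single extreme gradient sample cannot throw the iterates far off.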
