Efficient Distributed Optimization under Heavy-Tailed Noise

Su Hyeong Lee; Manzil Zaheer; Tian Li

arXiv:2502.04164·cs.LG·August 15, 2025

TL;DR

This paper introduces TailOPT, a novel distributed optimization framework that effectively handles heavy-tailed stochastic gradient noise, improving training efficiency and performance in large-scale machine learning models.

Contribution

The paper proposes TailOPT, a framework built on adaptive optimization or clipping techniques, provides convergence guarantees for it under heavy-tailed noise, and introduces a memory- and communication-efficient variant, $Bi^2Clip$.

Findings

01. TailOPT outperforms existing methods on language tasks.

02. $Bi^2Clip$ achieves adaptive-like performance without extra gradient statistics.

03. The framework guarantees convergence under unbounded gradient variance.

Abstract

Distributed optimization has become the default training paradigm in modern machine learning due to the growing scale of models and datasets. To mitigate communication overhead, local updates are often applied before global aggregation, resulting in a nested optimization approach with inner and outer steps. However, heavy-tailed stochastic gradient noise remains a significant challenge, particularly in attention-based models, hindering effective training. In this work, we propose TailOPT, an efficient framework designed to address heavy-tailed noise by leveraging adaptive optimization or clipping techniques. We establish convergence guarantees for the TailOPT framework under heavy-tailed noise with potentially unbounded gradient variance and local updates. Among its variants, we highlight a memory- and communication-efficient instantiation, which we call $Bi^2Clip$, which performs…
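
The nested inner/outer structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's TailOPT or $Bi^2Clip$ algorithm: it swaps in plain coordinate-wise clamping to [-threshold, threshold] as a generic clipping operator, a toy quadratic objective, and symmetric Pareto noise with tail index 1.5 (infinite variance). All function names, thresholds, and learning rates below are assumptions chosen for illustration.

```python
import random

def clip(vec, threshold):
    """Clamp each coordinate to [-threshold, threshold] (generic clipping, an assumption)."""
    return [max(-threshold, min(threshold, v)) for v in vec]

def noisy_grad(x, rng):
    """Gradient of 0.5 * ||x||^2 plus symmetric Pareto noise (tail index 1.5, infinite variance)."""
    return [xi + rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0) for xi in x]

def local_steps(x_global, rng, steps, lr, threshold):
    """Inner loop: one worker runs clipped SGD from the global model and returns its model delta."""
    x = list(x_global)
    for _ in range(steps):
        g = clip(noisy_grad(x, rng), threshold)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return [xi - x0 for xi, x0 in zip(x, x_global)]

def train(dim=5, workers=4, rounds=200, steps=5,
          inner_lr=0.05, outer_lr=1.0, inner_clip=1.0, outer_clip=0.2, seed=0):
    """Outer loop: average the workers' deltas, clip the average, apply it to the global model."""
    rng = random.Random(seed)
    x = [5.0] * dim
    for _ in range(rounds):
        deltas = [local_steps(x, rng, steps, inner_lr, inner_clip) for _ in range(workers)]
        avg = [sum(d[i] for d in deltas) / workers for i in range(dim)]
        step = clip(avg, outer_clip)
        x = [xi + outer_lr * si for xi, si in zip(x, step)]
    return x

if __name__ == "__main__":
    print(train())
```

Only the averaged model deltas cross the worker boundary once per round, which is the communication saving that local updates are meant to provide. Clipping at both levels keeps every update bounded even when individual noise draws are very large.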

Videos

Efficient Distributed Optimization under Heavy-Tailed Noise · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Energy Efficient Wireless Sensor Networks · Distributed Sensor Networks and Detection Algorithms