Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise

Gaku Omiya; Pierre-Louis Poirion; Akiko Takeda

arXiv:2601.20399·math.OC·January 30, 2026

Open Access

TL;DR

This paper establishes a high-probability convergence bound for randomized subspace SGD (RS-SGD) under sub-Gaussian noise and introduces a normalized variant, RS-NSGD, with in-expectation and high-probability guarantees when the gradient noise is heavy-tailed with bounded $p$-th moments.

Contribution

The paper provides a high-probability convergence analysis for RS-SGD under sub-Gaussian noise, a setting where such bounds remain scarce, and proposes RS-NSGD, which adds direction normalization to subspace updates to handle heavy-tailed gradient noise.

Findings

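01 RS-SGD admits a high-probability convergence bound under sub-Gaussian noise, with the same order of oracle complexity as prior in-expectation results.

02 RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.

03 The RS-NSGD guarantees, both in expectation and with high probability, assume only that the noise has bounded $p$-th moments.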

Abstract

Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.
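
To make the idea of "direction normalization inside a subspace update" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the sketch distribution, step-size schedule, or gradient oracle, so the orthonormal random matrix from a QR factorization, the constant step size, the quadratic test objective, and the names `sample_subspace`, `rs_nsgd_step`, and `run_demo` are all hypothetical choices made for this example.

```python
import numpy as np


def sample_subspace(d, r, rng):
    """Draw a d x r matrix with orthonormal columns.

    Assumption: the paper may use a different random matrix (e.g. Gaussian);
    QR of a Gaussian matrix is used here only to keep the geometry simple.
    """
    gaussian = rng.standard_normal((d, r))
    q, _ = np.linalg.qr(gaussian)  # reduced QR: q has shape (d, r)
    return q


def rs_nsgd_step(x, stochastic_grad, P, step_size, eps=1e-12):
    """One normalized step restricted to the column space of P (a sketch)."""
    reduced = P.T @ stochastic_grad  # r-dimensional projected gradient
    # Normalize the direction so the step length does not depend on the
    # (possibly heavy-tailed) gradient magnitude.
    direction = reduced / (np.linalg.norm(reduced) + eps)
    return x - step_size * (P @ direction)


def run_demo(d=50, r=5, steps=2000, step_size=0.01, seed=0):
    """Run the sketch on f(x) = 0.5 * ||x||^2 with heavy-tailed noise."""
    rng = np.random.default_rng(seed)
    x = np.ones(d)
    for _ in range(steps):
        # Student-t noise with 1.5 degrees of freedom has infinite variance
        # but finite p-th moments for p < 1.5 (an illustrative noise model).
        noise = rng.standard_t(df=1.5, size=d)
        grad = x + noise  # exact gradient of f is x
        # For clarity the full gradient is formed and then projected; a
        # practical subspace method would compute only the r-dimensional
        # quantity directly to save per-iteration cost.
        P = sample_subspace(d, r, rng)
        x = rs_nsgd_step(x, grad, P, step_size)
    return x
```

Because the columns of `P` are orthonormal in this sketch, every step has length equal to `step_size` (up to `eps`) regardless of how large a noisy gradient sample is, which is the intuition for why normalization helps under heavy tails. Calling `run_demo()` returns the final iterate for the toy quadratic.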

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data