SHANG++: Robust Stochastic Acceleration under Multiplicative Noise

Yaxin Yu; Long Chen; Minfu Feng

arXiv:2603.09355·math.OC·April 14, 2026

TL;DR

This paper introduces SHANG++, an accelerated stochastic gradient method built to remain stable under multiplicative gradient noise, with convergence guarantees for convex and strongly convex objectives and consistent empirical results on convex problems and deep learning tasks.

Contribution

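The paper derives two accelerated stochastic gradient methods by discretizing the Hessian-driven Nesterov accelerated gradient flow: SHANG, a direct Gauss-Seidel-type discretization, and SHANG++, which adds a damping correction to obtain faster convergence and stronger robustness under multiplicative noise.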

Findings

01

SHANG++ converges faster and is more robust to noise than SHANG, which already improves stability under the multiplicative noise scaling (MNS) condition.

02

In experiments, SHANG++ maintains high accuracy with minimal hyperparameter tuning.

03

SHANG++ performs consistently well on deep learning tasks, including a dedicated noise experiment on ResNet-34.

Abstract

Under the multiplicative noise scaling (MNS) condition, original Nesterov acceleration is provably sensitive to noise and may diverge when gradient noise overwhelms the signal. In this paper, we develop two accelerated stochastic gradient descent methods by discretizing the Hessian-driven Nesterov accelerated gradient flow. We first derive SHANG, a direct Gauss-Seidel-type discretization that already improves stability under MNS. We then introduce SHANG++, which adds a damping correction and achieves faster convergence with stronger noise robustness. We establish convergence guarantees for both convex and strongly convex objectives under MNS, together with explicit parameter choices. In our experiments, SHANG++ performs consistently well across convex problems and applications in deep learning. In a dedicated noise experiment on ResNet-34, a single hyperparameter configuration attains…
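To make the noise model concrete, here is a minimal Python sketch of a gradient oracle with multiplicative noise. It assumes one common formalization of MNS, namely that the stochastic gradient g(x) is unbiased and satisfies E||g(x) - ∇f(x)||^2 ≤ σ^2 ||∇f(x)||^2; the paper's exact statement may differ. The quadratic objective, the per-coordinate Gaussian noise, and the names quadratic_grad, mns_gradient, and empirical_noise_ratio are illustrative assumptions, not taken from the paper. The sketch does not implement SHANG or SHANG++.

```python
import numpy as np


def quadratic_grad(A, x):
    """Exact gradient of the toy objective f(x) = 0.5 * x^T A x (A symmetric)."""
    return A @ x


def mns_gradient(grad, sigma, rng):
    """Hypothetical noisy oracle: each entry is scaled by (1 + sigma * xi), xi ~ N(0, 1).

    The noise is unbiased and its variance scales with the gradient itself,
    so E||g - grad||^2 = sigma^2 * ||grad||^2 for this construction.
    """
    xi = rng.standard_normal(grad.shape)
    return grad * (1.0 + sigma * xi)


def empirical_noise_ratio(grad, sigma, rng, n_samples=20000):
    """Monte Carlo estimate of E||g - grad||^2 / ||grad||^2 for the oracle above."""
    xi = rng.standard_normal((n_samples,) + grad.shape)
    noise = sigma * xi * grad  # g - grad for each sample
    return np.mean(np.sum(noise**2, axis=1)) / np.sum(grad**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0, 100.0])  # assumed ill-conditioned test problem
    x = np.array([1.0, -2.0, 0.5])
    grad = quadratic_grad(A, x)
    print("one noisy gradient sample:", mns_gradient(grad, 2.0, rng))
    print("estimated noise-to-signal ratio:", empirical_noise_ratio(grad, 2.0, rng))
```

With sigma greater than 1, the expected squared noise exceeds the squared gradient norm, which is the regime the abstract describes as gradient noise overwhelming the signal.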
