Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness
Nikita Kornilov, Philip Zmushko, Andrei Semenov, Mark Ikonnikov, Alexander Gasnikov, Alexander Beznosikov

TL;DR
This paper studies sign-based methods for non-convex optimization under heavy-tailed noise and generalized $(L_0, L_1)$-smoothness, proving the first high probability convergence bounds in this setting and reporting better empirical performance than clipping and normalization when training large language models.
Contribution
It offers the first high probability convergence bounds for sign-based methods under $(L_0, L_1)$-smoothness and heavy-tailed noise. The bounds have mild parameter dependencies and are also new for sign-based methods under standard smoothness.
Findings
SignSGD with batching achieves an explicit high probability sample complexity bound under $(L_0, L_1)$-smoothness and heavy-tailed noise.
SignSGD with Majority Voting is robust across a range of noise tail indices.
Sign-based methods outperform clipping and normalization in training large language models.
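The two update rules named in the findings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the constant step size, the toy quadratic objective, and the Student-t noise model are all assumptions made for the example, and the paper's step-size and batch-size schedules are not reproduced.

```python
import numpy as np


def sign_sgd_batched(grad_fn, x0, lr, batch_size, steps, rng):
    """SignSGD with batching: average a minibatch of stochastic
    gradients, then step along the coordinate-wise sign of the average."""
    x = np.array(x0, dtype=float)  # copy, so the caller's x0 is untouched
    for _ in range(steps):
        g = np.mean([grad_fn(x, rng) for _ in range(batch_size)], axis=0)
        x -= lr * np.sign(g)
    return x


def sign_sgd_majority_vote(grad_fn, x0, lr, num_votes, steps, rng):
    """SignSGD with majority voting: take the sign of each stochastic
    gradient first, then step along the sign of the summed votes.
    An odd num_votes avoids ties (np.sign(0) == 0 would skip a coordinate)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        votes = np.sum([np.sign(grad_fn(x, rng)) for _ in range(num_votes)], axis=0)
        x -= lr * np.sign(votes)
    return x


def noisy_quadratic_grad(x, rng):
    """Hypothetical oracle: gradient of f(x) = 0.5 * ||x||^2 plus
    Student-t noise with 1.5 degrees of freedom (infinite variance)."""
    return x + rng.standard_t(1.5, size=x.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.full(5, 2.0)
    x_batch = sign_sgd_batched(noisy_quadratic_grad, x0, 0.01, 11, 1000, rng)
    x_vote = sign_sgd_majority_vote(noisy_quadratic_grad, x0, 0.01, 11, 1000, rng)
    print("batched:", np.linalg.norm(x_batch))
    print("majority vote:", np.linalg.norm(x_vote))
```

Both variants move each coordinate by exactly `lr` per step regardless of the gradient's magnitude, which is why a single extreme noise sample cannot produce an arbitrarily large update.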
Abstract
In recent years, non-convex optimization problems have increasingly been described by the generalized $(L_0, L_1)$-smoothness assumption rather than the standard one. Meanwhile, severely corrupted data used in these problems has increased the demand for methods capable of handling heavy-tailed noise, i.e., noise with a bounded $\kappa$-th moment. Motivated by these real-world trends and challenges, we explore sign-based methods in this setup and demonstrate their effectiveness in comparison with other popular solutions like clipping or normalization. In theory, we prove the first-known high probability convergence bounds under $(L_0, L_1)$-smoothness and heavy-tailed noise with mild parameter dependencies. In the case of standard smoothness, these bounds are novel for sign-based methods as well. In particular, SignSGD with batching achieves sample complexity $\tilde{O}\left(\left(\frac{\Delta…
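For orientation, the two assumptions named in the abstract are commonly formalized in the literature as below. This is a sketch of one standard formulation, and the paper's exact variant (for example a first-order or coordinate-wise version) may differ. The symbols $\xi$ (the random sample), $\sigma$ (the noise scale), and the range of $\kappa$ are assumptions of this sketch rather than quotations from the paper.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,
$$

$$
\mathbb{E}_{\xi}\left[\|\nabla f(x, \xi) - \nabla f(x)\|^{\kappa}\right] \le \sigma^{\kappa}, \qquad \kappa \in (1, 2].
$$

Setting $L_1 = 0$ in the first condition recovers standard $L_0$-smoothness, and $\kappa = 2$ in the second recovers the usual bounded-variance assumption; for $\kappa < 2$ the variance may be infinite.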
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Risk and Portfolio Optimization · Distributed Sensor Networks and Detection Algorithms
