Stochastic Zeroth-Order Optimization Under Heavy-Tailed Noise
Taha El Bakkali, El Mahdi Chayti, Qiuyi Zhang, Imane Rahali, Omar Saadi

TL;DR
This paper introduces RSC-ZO, a robust zeroth-order optimization method that achieves high-probability guarantees under heavy-tailed noise, matching first-order rates and extending classical bounds.
Contribution
The paper proposes RSC-ZO, a novel scalar-clipped zeroth-order method that handles heavy-tailed noise with high-probability guarantees, going beyond the bounded-variance assumptions of classical ZO theory.
Findings
RSC-ZO finds an $\epsilon$-stationary point with high probability, using a number of function evaluations that scales with the dimension $d$ as $d^{p/(2(p-1))}$.
At $p = 2$, the method needs on the order of $d\,\epsilon^{-4}$ function evaluations, matching classical bounds.
The analysis includes a momentum variant and explores batch-size and stepsize tradeoffs.
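As a quick arithmetic check on the dimension exponent $p/(2(p-1))$ quoted in the findings, the following evaluates it at a few values of $p$. The value $p = 3/2$ is an illustrative choice, not one taken from the paper.

```latex
\[
\left.\frac{p}{2(p-1)}\right|_{p=2} = \frac{2}{2\cdot 1} = 1,
\qquad
\left.\frac{p}{2(p-1)}\right|_{p=3/2} = \frac{3/2}{2\cdot\tfrac{1}{2}} = \frac{3}{2},
\qquad
\frac{p}{2(p-1)} \to \infty \quad \text{as } p \to 1^{+}.
\]
```

So the dimension dependence is linear at $p = 2$ and grows as the noise tails get heavier.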
Abstract
We study stochastic zeroth-order (ZO) optimization of smooth nonconvex objectives under heavy-tailed sample-gradient noise. This regime is motivated by empirical evidence that gradient noise in modern machine learning can violate the bounded-variance assumptions used in classical ZO theory. While first-order methods have optimal rates under bounded $p$-th moment noise for $p \in (1, 2]$, analogous high-probability guarantees for nonconvex ZO methods are much less understood. The ZO setting is not a direct corollary of first-order theory. First-order methods observe stochastic gradients, whereas derivative-free methods only query noisy function values and build finite-difference estimates. Thus, weak- control of must first be transferred to scalar directional estimates. We propose the Robust Scalar-Clipped Zeroth-Order method (RSC-ZO), a two-point…
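The idea of clipping a scalar two-point finite-difference estimate can be illustrated with a minimal sketch. This is not the paper's RSC-ZO algorithm: the toy objective, the Student-t noise model, the function names (`sample_loss`, `clipped_two_point_estimate`, `run`), and every parameter value (`mu`, `tau`, `eta`, `df`) are assumptions chosen for illustration only.

```python
import numpy as np


def sample_loss(x, xi):
    # Toy objective (assumption): f(x, xi) = 0.5*||x||^2 + <xi, x>,
    # so the sample-gradient noise is exactly xi.
    return 0.5 * float(x @ x) + float(xi @ x)


def clipped_two_point_estimate(x, mu, tau, rng, df=1.5):
    """Hypothetical clipped two-point gradient estimate (illustration only)."""
    d = x.shape[0]
    # Random direction, uniform on the unit sphere.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # Heavy-tailed noise sample (Student-t with df=1.5 has infinite variance);
    # the same sample is used for both function queries.
    xi = rng.standard_t(df, size=d)
    # Scalar directional finite difference from two function values.
    delta = (sample_loss(x + mu * u, xi) - sample_loss(x - mu * u, xi)) / (2.0 * mu)
    # Clip the scalar before it is turned into a vector estimate.
    delta = float(np.clip(delta, -tau, tau))
    return d * delta * u


def run(d=10, steps=2000, eta=0.01, mu=1e-3, tau=5.0, seed=0):
    # Plain descent loop using the clipped estimate (no momentum, batch size 1).
    rng = np.random.default_rng(seed)
    x = np.ones(d)
    for _ in range(steps):
        x = x - eta * clipped_two_point_estimate(x, mu, tau, rng)
    return x
```

Because the clipped scalar has magnitude at most `tau` and the direction has unit norm, every estimate has norm at most `d * tau`, so a single heavy-tailed sample cannot produce an arbitrarily large step.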
