Stochastic Zeroth-Order Optimization Under Heavy-Tailed Noise

Taha El Bakkali; El Mahdi Chayti; Qiuyi Zhang; Imane Rahali; Omar Saadi

arXiv:2605.17394·math.OC·May 19, 2026

TL;DR

This paper introduces RSC-ZO, a robust zeroth-order optimization method that achieves high-probability guarantees under heavy-tailed noise, matching first-order rates and extending classical bounds.

Contribution

The paper proposes RSC-ZO, a scalar-clipped zeroth-order method that handles heavy-tailed noise with high-probability guarantees, a regime not covered by classical ZO theory, which assumes bounded variance.

Findings

01

RSC-ZO finds an ε-stationary point with high probability using a number of function evaluations that scales with the dimension as $d^{p/(2(p-1))}$.

02

At $p=2$, the method needs on the order of $d\,\varepsilon^{-4}$ evaluations, matching classical bounds.

03

The analysis includes a momentum variant and explores batch-size and stepsize tradeoffs.
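
As a quick check of the dimension exponent $p/(2(p-1))$ quoted in the first finding, the arithmetic below (worked here for illustration, not taken from the paper) evaluates it at the boundary case $p=2$, at an intermediate value, and in the limit of very heavy tails:

```latex
\[
\left.\frac{p}{2(p-1)}\right|_{p=2} = \frac{2}{2} = 1,
\qquad
\left.\frac{p}{2(p-1)}\right|_{p=3/2} = \frac{3/2}{1} = \frac{3}{2},
\qquad
\lim_{p \to 1^{+}} \frac{p}{2(p-1)} = +\infty .
\]
```

The dimension factor is therefore linear in $d$ at $p=2$ and grows without bound as $p$ approaches 1, that is, as the noise tails get heavier.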

Abstract

We study stochastic zeroth-order (ZO) optimization of smooth nonconvex objectives under heavy-tailed sample-gradient noise. This regime is motivated by empirical evidence that gradient noise in modern machine learning can violate the bounded-variance assumptions used in classical ZO theory. While first-order methods have optimal rates under bounded $p$-th moment noise for $p \in (1, 2]$, analogous high-probability guarantees for nonconvex ZO methods are much less understood. The ZO setting is not a direct corollary of first-order theory. First-order methods observe stochastic gradients, whereas derivative-free methods only query noisy function values and build finite-difference estimates. Thus, weak-$L_{p}$ control of $\nabla F(x; \xi) - \nabla f(x)$ must first be transferred to scalar directional estimates. We propose the Robust Scalar-Clipped Zeroth-Order method (RSC-ZO), a two-point…
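
The abstract is truncated before it specifies RSC-ZO, so the following is only a minimal sketch of what a scalar-clipped two-point estimator could look like, not the authors' algorithm. Everything in it is an assumption made for illustration: the toy objective `noisy_value`, the Student-t noise, sharing one sample `xi` between the two queries, the symmetric central difference, the scaling by `d`, and the hypothetical parameters `mu` (smoothing radius), `tau` (clipping threshold), and `lr` (stepsize).

```python
import numpy as np


def sample_unit_direction(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)


def noisy_value(x, xi):
    """Toy F(x; xi) = 0.5*||x||^2 + <xi, x> (assumed for illustration).

    With zero-mean xi, f(x) = 0.5*||x||^2 and grad F(x; xi) - grad f(x) = xi.
    """
    return 0.5 * np.dot(x, x) + np.dot(xi, x)


def clipped_two_point_grad(x, xi, u, mu, tau):
    """Two-point estimate whose scalar finite difference is clipped to [-tau, tau].

    Assumption: both function queries share the same noise sample xi.
    """
    s = (noisy_value(x + mu * u, xi) - noisy_value(x - mu * u, xi)) / (2.0 * mu)
    s_clipped = np.clip(s, -tau, tau)  # clip the scalar, not the vector
    return x.shape[0] * s_clipped * u


def run_sketch(d=10, steps=2000, lr=0.01, mu=1e-3, tau=5.0, seed=0):
    """Plain ZO descent loop driven by the clipped estimator above."""
    rng = np.random.default_rng(seed)
    x = np.ones(d)
    for _ in range(steps):
        # Student-t with df=1.5: finite p-th moment only for p < 1.5,
        # so the variance is infinite (a heavy-tailed stand-in).
        xi = rng.standard_t(df=1.5, size=d)
        u = sample_unit_direction(d, rng)
        x = x - lr * clipped_two_point_grad(x, xi, u, mu, tau)
    return x


if __name__ == "__main__":
    x_final = run_sketch()
    print("final ||x|| =", np.linalg.norm(x_final))
```

Because only the scalar directional difference is clipped, each estimate has norm at most `d * tau` no matter how extreme the noise sample is, which is the property that makes a heavy-tailed sample unable to produce an arbitrarily large step in this sketch.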
