Why Does Adaptive Zeroth-Order Optimization Work?

Haishan Ye; Luo Luo

arXiv:2602.01627 · math.OC · February 3, 2026

Open Access

TL;DR

This paper gives a theoretical explanation for why adaptive zeroth-order optimization methods work: the empirical standard deviation they normalize by is, with high probability, closely proportional to the gradient norm. It also provides convergence guarantees under generalized $(L_0, L_1)$-smoothness.

Contribution

It introduces a theoretical analysis linking empirical standard deviation to gradient norms and establishes convergence rates for adaptive ZO methods under generalized smoothness.

Findings

01. The empirical standard deviation of function values is, with high probability, closely proportional to the gradient norm.

02. Adaptive ZO methods achieve faster convergence than fixed-step methods.

03. Explicit query complexity bounds are derived for both the deterministic and stochastic settings.
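
The first finding can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper: it assumes Gaussian perturbation directions, a hypothetical quadratic test function, and a small smoothing radius `mu`. Under a first-order expansion, $f(x + \mu u) \approx f(x) + \mu \langle \nabla f(x), u \rangle$ with $u \sim \mathcal{N}(0, I)$, so the standard deviation of the sampled function values should be close to $\mu \|\nabla f(x)\|$.

```python
import numpy as np

def std_to_grad_ratio(f, grad, x, mu=1e-4, q=4000, seed=0):
    """Ratio of the empirical std of f(x + mu*u) to mu * ||grad f(x)||.

    Assumption (not from the paper): directions u are standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((q, x.size))            # q random directions
    vals = np.array([f(x + mu * ui) for ui in u])   # q function queries
    sigma = vals.std(ddof=1)                        # empirical standard deviation
    return sigma / (mu * np.linalg.norm(grad(x)))

# Hypothetical test problem: a diagonal quadratic with a known gradient.
a = np.linspace(1.0, 5.0, 10)
f = lambda x: 0.5 * np.sum(a * x**2)
grad = lambda x: a * x

ratio = std_to_grad_ratio(f, grad, np.ones(10))
print(f"sigma / (mu * ||grad||) = {ratio:.3f}")
```

A ratio near 1 on this example is consistent with the proportionality claim, though the constants and probability bounds in the paper may be stated for a different sampling scheme.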

Abstract

Zeroth-order (ZO) optimization is popular in real-world applications where gradient information is expensive to access or unavailable. Recently, adaptive ZO methods that normalize gradient estimators by the empirical standard deviation of function values have achieved strong practical performance, particularly in fine-tuning large language models. However, the theoretical understanding of this strategy remains limited. In this work, we show that the empirical standard deviation is, with high probability, closely proportional to the norm of the (stochastic) gradient. Based on this insight, we analyze adaptive ZO methods under the generalized $(L_0, L_1)$-smoothness condition with respect to the matrix norm. We establish explicit convergence rates and query complexity bounds for both deterministic and stochastic settings, demonstrating that adaptive ZO methods achieve the…
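
To make the normalization described in the abstract concrete, here is a minimal sketch of one adaptive ZO update. It is an assumption-laden illustration, not the paper's algorithm: the function name `adaptive_zo_step`, the Gaussian directions, the forward-difference estimator, and the parameters `eta`, `mu`, `q`, and `eps` are all hypothetical choices. The unnormalized estimator scales like $\mu \nabla f(x)$ and the standard deviation like $\mu \|\nabla f(x)\|$, so their quotient behaves roughly like a noisy normalized gradient.

```python
import numpy as np

def adaptive_zo_step(f, x, eta, mu=1e-3, q=16, rng=None, eps=1e-12):
    """One zeroth-order step normalized by the std of the sampled function values.

    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal((q, x.size))             # random search directions
    fx = f(x)
    vals = np.array([f(x + mu * ui) for ui in u])    # q perturbed queries
    sigma = vals.std(ddof=1)                         # empirical standard deviation
    # Forward-difference estimator, left unscaled by mu on purpose:
    # the division by sigma absorbs the mu factor.
    direction = ((vals - fx)[:, None] * u).mean(axis=0)
    return x - eta * direction / (sigma + eps)

# Hypothetical usage on a diagonal quadratic.
a = np.linspace(1.0, 5.0, 10)
f = lambda x: 0.5 * np.sum(a * x**2)

rng = np.random.default_rng(0)
x0 = 3.0 * np.ones(10)
x = x0.copy()
for _ in range(300):
    x = adaptive_zo_step(f, x, eta=0.1, rng=rng)
print(f(x0), f(x))
```

With a constant `eta`, a normalized step of this kind moves a roughly fixed distance per iteration, so it makes steady progress far from the minimizer and then hovers near it; a decaying step size would be the usual remedy, though the schedule analyzed in the paper is not stated in this summary.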

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Advanced Bandit Algorithms Research