Refining Adaptive Zeroth-Order Optimization at Ease
Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai

TL;DR
This paper introduces R-AdaZO, a novel adaptive zeroth-order optimization method that leverages variance reduction techniques to improve convergence speed and stability in black-box and resource-constrained scenarios.
Contribution
It provides the first variance reduction analysis for first moment estimates in ZO optimization and develops a variance-aware convergence framework for adaptive ZO methods.
Findings
R-AdaZO achieves faster convergence than ZO-AdaMM.
Theoretical analysis confirms variance reduction benefits.
Experiments demonstrate improved performance in black-box attacks and LLM fine-tuning.
Abstract
Recently, zeroth-order (ZO) optimization has come to play an essential role in scenarios where gradient information is inaccessible or unaffordable, such as black-box systems and resource-constrained environments. While existing adaptive methods such as ZO-AdaMM have shown promise, they are fundamentally limited by their underutilization of moment information during optimization, which usually results in underperforming convergence. To overcome these limitations, this paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO). Specifically, we first show the untapped variance reduction effect of the first moment estimate on ZO gradient estimation, which improves the accuracy and stability of ZO updates. We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape, enabling a more effective scaling of ZO…
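The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' reference implementation: the function names (`zo_gradient`, `r_adazo_sketch`), the Gaussian forward-difference estimator, the hyperparameter values, and the exact form of the update are all assumptions made for illustration. The only detail taken from the abstract is that the second moment is built from the smoothed (variance-reduced) first moment rather than from the raw ZO gradient estimate.

```python
import numpy as np


def zo_gradient(f, x, rng, mu=1e-3, num_queries=10):
    """Forward-difference ZO gradient estimate using random Gaussian directions.

    Only function values of f are used; no analytic gradient is required.
    (Assumed estimator; the paper may use a different one.)
    """
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - fx) / mu * u
    return grad / num_queries


def r_adazo_sketch(f, x0, steps=500, lr=0.01, beta1=0.9, beta2=0.99,
                   eps=1e-8, seed=0):
    """Adaptive ZO loop in the spirit of the abstract (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first moment: moving average of noisy ZO gradients
    v = np.zeros_like(x)  # second moment: built from m, not from the raw estimate
    for _ in range(steps):
        g = zo_gradient(f, x, rng)
        m = beta1 * m + (1.0 - beta1) * g
        # Assumed refinement: square the variance-reduced estimate m
        # instead of the raw estimate g when updating the second moment.
        v = beta2 * v + (1.0 - beta2) * m ** 2
        x = x - lr * m / np.sqrt(v + eps)
    return x
```

For example, calling `r_adazo_sketch(lambda x: float(np.sum(x ** 2)), np.full(5, 3.0), steps=1000)` minimizes a simple quadratic using function evaluations only. Swapping `m ** 2` for `g ** 2` in the second-moment line gives a conventional Adam-style second moment, which makes the difference between the two variants easy to compare on a toy problem.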
Taxonomy
Topics: Iterative Methods for Nonlinear Equations
