Zeroth-Order Optimization is Secretly Single-Step Policy Optimization

Junbin Qiu; Zhengpeng Xie; Xiangda Yan; Yongjie Yang; Yao Shu

arXiv:2506.14460·cs.LG·June 18, 2025

TL;DR

This paper shows that Zeroth-Order Optimization (ZOO) with randomized finite differences is equivalent to a specific instance of single-step Policy Optimization (PO), and uses that link to derive PO-inspired variance reduction techniques that improve convergence speed and final performance.

Contribution

The paper establishes a formal connection between ZOO and single-step PO, and introduces ZoAR, a ZOO algorithm that applies PO-inspired variance reduction.

Findings

01. ZOO with finite differences is equivalent to a specific instance of single-step PO.

02. Widely used ZOO gradient estimators are mathematically equivalent to the REINFORCE gradient estimator with a specific baseline function.

03. ZoAR outperforms existing methods in convergence speed and final performance.
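The first two findings can be illustrated with a short derivation. This is a sketch for the common Gaussian-smoothing case, which is an assumption here: the paper's own setting and notation may differ. The symbols $\mu$ (smoothing radius), $\pi_x$ (a Gaussian policy centered at the current iterate $x$), and $b$ (a baseline) are introduced only for illustration.

```latex
\begin{align*}
F_\mu(x) &= \mathbb{E}_{u \sim \mathcal{N}(0, I)}\bigl[f(x + \mu u)\bigr]
          = \mathbb{E}_{y \sim \pi_x}\bigl[f(y)\bigr],
          \qquad \pi_x = \mathcal{N}(x, \mu^2 I), \\
\nabla_x \log \pi_x(y) &= \frac{y - x}{\mu^2} = \frac{u}{\mu}
          \qquad (y = x + \mu u), \\
\hat{g}_{\text{REINFORCE}} &= \bigl(f(y) - b\bigr)\,\nabla_x \log \pi_x(y)
          = \frac{f(x + \mu u) - b}{\mu}\,u, \\
\hat{g}_{\text{ZOO}} &= \frac{f(x + \mu u) - f(x)}{\mu}\,u
          \qquad \text{(the case } b = f(x)\text{)}.
\end{align*}
```

Under these assumptions the smoothed ZOO objective is the expected single-step reward of the policy $\pi_x$, and the forward-difference estimator is REINFORCE with baseline $b = f(x)$. Because $\mathbb{E}[u] = 0$, any baseline that does not depend on $u$ leaves the expectation unchanged and only affects the variance.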

Abstract

Zeroth-Order Optimization (ZOO) provides powerful tools for optimizing functions where explicit gradients are unavailable or expensive to compute. However, the underlying mechanisms of popular ZOO methods, particularly those employing randomized finite differences, and their connection to other optimization paradigms like Reinforcement Learning (RL) are not fully elucidated. This paper establishes a fundamental and previously unrecognized connection: ZOO with finite differences is equivalent to a specific instance of single-step Policy Optimization (PO). We formally unveil that the implicitly smoothed objective function optimized by common ZOO algorithms is identical to a single-step PO objective. Furthermore, we show that widely used ZOO gradient estimators are mathematically equivalent to the REINFORCE gradient estimator with a specific baseline function, revealing the…
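The equivalence claimed in the abstract can be checked numerically. The following is a minimal sketch, assuming Gaussian perturbations and a forward-difference estimator; it is not the paper's ZoAR algorithm. The objective `f`, the point `x`, the smoothing radius `mu`, and the sample count are all hypothetical choices made for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical black-box objective, used only for illustration.
    return np.sum(x ** 2) + np.sin(x[0])

def zoo_forward_difference(f, x, u, mu):
    # Randomized forward-difference ZOO estimator along direction u.
    return (f(x + mu * u) - f(x)) / mu * u

def reinforce(f, x, u, mu, baseline):
    # Single-step REINFORCE for the Gaussian policy N(x, mu^2 I).
    y = x + mu * u                 # "action" sampled from the policy
    score = (y - x) / mu ** 2      # gradient of log N(y; x, mu^2 I) w.r.t. x
    return (f(y) - baseline) * score

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
mu = 1e-2
U = rng.standard_normal((2000, x.size))   # shared perturbation directions

g_zoo = np.array([zoo_forward_difference(f, x, u, mu) for u in U])
g_pg = np.array([reinforce(f, x, u, mu, baseline=f(x)) for u in U])
g_nobase = np.array([reinforce(f, x, u, mu, baseline=0.0) for u in U])

# With baseline f(x), REINFORCE matches the ZOO estimator sample by sample.
same_estimates = np.allclose(g_zoo, g_pg)

# Total variance across coordinates, with and without the baseline.
var_with_baseline = g_pg.var(axis=0).sum()
var_without_baseline = g_nobase.var(axis=0).sum()
print(same_estimates, var_with_baseline, var_without_baseline)
```

In this sketch the two estimators agree up to floating-point error for every sampled direction, and dropping the baseline inflates the variance considerably. This is the usual motivation for borrowing baseline-style variance reduction from policy optimization.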

Taxonomy

Topics: Quantum Computing Algorithms and Architecture

Methods: REINFORCE · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Parrot optimizer: Algorithm and applications to medical problems