Zeroth-Order Optimization is Secretly Single-Step Policy Optimization
Junbin Qiu, Zhengpeng Xie, Xiangda Yan, Yongjie Yang, Yao Shu

TL;DR
This paper shows that Zeroth-Order Optimization (ZOO) with randomized finite differences is equivalent to a specific instance of single-step Policy Optimization (PO), and uses this connection to derive practical variance reduction techniques that improve convergence and performance.
Contribution
It establishes a formal connection between ZOO and single-step PO, and introduces ZoAR, a novel ZOO algorithm with PO-inspired variance reduction methods.
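This summary does not spell out ZoAR's exact estimator. The sketch below only illustrates the generic PO idea that such variance reduction builds on: subtracting a baseline from the reward leaves the REINFORCE gradient unbiased while shrinking its variance. The objective `f` (with an arbitrary offset of 10), the Gaussian perturbation scale `mu`, and the baseline choice b = f(x) are illustrative assumptions, not the paper's settings.

```python
import random


def f(theta):
    # Hypothetical objective with a large constant offset, so f(x) is far from 0.
    return 10.0 + sum((t - 1.0) ** 2 for t in theta)


def first_coord_estimates(x, mu, baseline_fn, n, seed):
    # Single-sample REINFORCE estimates of d/dx_0 E_{theta ~ N(x, mu^2 I)}[f(theta)],
    # using grad_x log N(theta; x, mu^2 I) = (theta - x) / mu^2 = u / mu.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        theta = [xi + mu * ui for xi, ui in zip(x, u)]
        out.append((f(theta) - baseline_fn(x)) * u[0] / mu)
    return out


def mean_var(xs):
    # Sample mean and unbiased sample variance.
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


x0 = [0.0, 2.0, 0.5]
mu = 0.1
# The same seed reuses the same perturbations, so only the baseline differs.
m0, v0 = mean_var(first_coord_estimates(x0, mu, lambda x: 0.0, 20000, seed=1))
m1, v1 = mean_var(first_coord_estimates(x0, mu, f, 20000, seed=1))
print("b = 0   : mean %.3f, var %.1f" % (m0, v0))
print("b = f(x): mean %.3f, var %.1f" % (m1, v1))
```

Both estimators target the same gradient (here -2 for the first coordinate), but with b = 0 the large offset in f inflates the variance by roughly three orders of magnitude.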
Findings
ZOO with finite differences is equivalent to single-step PO.
Widely used ZOO gradient estimators are mathematically equivalent to the REINFORCE gradient estimator with a specific baseline function.
ZoAR outperforms existing methods in convergence speed and final performance.
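Assuming Gaussian perturbations (a common choice; the summary does not fix the distribution), the first two findings can be worked out in a few lines. Treat the perturbed point θ as the action of a Gaussian policy centred at x and f(θ) as its single-step reward (negated for minimization). The symbols f_μ, π_x, and b are notation chosen for this sketch, and the paper's own baseline may differ.

```latex
\begin{aligned}
f_\mu(x) &= \mathbb{E}_{u\sim\mathcal{N}(0,I)}\big[f(x+\mu u)\big]
          = \mathbb{E}_{\theta\sim\pi_x}\big[f(\theta)\big],
          \qquad \pi_x = \mathcal{N}(x,\mu^2 I),\\
\nabla_x f_\mu(x) &= \mathbb{E}_{\theta\sim\pi_x}\big[(f(\theta)-b)\,\nabla_x \log \pi_x(\theta)\big],
          \qquad \nabla_x \log \pi_x(\theta) = \frac{\theta - x}{\mu^2},\\
&= \mathbb{E}_{u\sim\mathcal{N}(0,I)}\Big[\frac{f(x+\mu u)-b}{\mu}\,u\Big]
   \qquad \text{(substituting } \theta = x+\mu u\text{)}.
\end{aligned}
```

Any baseline b that does not depend on u leaves the expectation unchanged because E[u] = 0. Choosing b = f(x) turns the single-sample REINFORCE term into the familiar two-point finite-difference estimator (f(x+μu) - f(x)) u / μ.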
Abstract
Zeroth-Order Optimization (ZOO) provides powerful tools for optimizing functions where explicit gradients are unavailable or expensive to compute. However, the underlying mechanisms of popular ZOO methods, particularly those employing randomized finite differences, and their connection to other optimization paradigms like Reinforcement Learning (RL) are not fully elucidated. This paper establishes a fundamental and previously unrecognized connection: ZOO with finite differences is equivalent to a specific instance of single-step Policy Optimization (PO). We formally unveil that the implicitly smoothed objective function optimized by common ZOO algorithms is identical to a single-step PO objective. Furthermore, we show that widely used ZOO gradient estimators are mathematically equivalent to the REINFORCE gradient estimator with a specific baseline function, revealing the…
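The estimator equivalence can be checked numerically. The sketch below assumes Gaussian perturbations, a hypothetical quadratic objective `f`, and the baseline b = f(x). For each sampled direction u it compares the two-point ZOO estimate with the REINFORCE estimate for the policy N(x, μ²I), then averages the ZOO estimates to approximate the gradient of the smoothed objective. All function names are illustrative, not taken from the paper.

```python
import random


def f(theta):
    # Hypothetical objective: f(theta) = sum_i (theta_i - 1)^2, true gradient 2 * (theta - 1).
    return sum((t - 1.0) ** 2 for t in theta)


def zoo_two_point(x, u, mu):
    # Two-point finite-difference ZOO estimate along direction u.
    scale = (f([xi + mu * ui for xi, ui in zip(x, u)]) - f(x)) / mu
    return [scale * ui for ui in u]


def reinforce(x, theta, mu, baseline):
    # REINFORCE estimate for the Gaussian policy N(x, mu^2 I) at sampled action theta:
    # (f(theta) - b) * grad_x log pi(theta), where grad_x log pi(theta) = (theta - x) / mu^2.
    adv = f(theta) - baseline
    return [adv * (ti - xi) / mu ** 2 for ti, xi in zip(theta, x)]


def compare(x, mu=0.1, n=20000, seed=0):
    rng = random.Random(seed)
    d = len(x)
    zoo_sum = [0.0] * d
    max_gap = 0.0
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        theta = [xi + mu * ui for xi, ui in zip(x, u)]  # action sampled from the policy
        g_zoo = zoo_two_point(x, u, mu)
        g_pg = reinforce(x, theta, mu, baseline=f(x))   # baseline b = f(x)
        max_gap = max(max_gap, max(abs(a - b) for a, b in zip(g_zoo, g_pg)))
        zoo_sum = [s + g for s, g in zip(zoo_sum, g_zoo)]
    return [s / n for s in zoo_sum], max_gap


x0 = [0.0, 2.0, 0.5]
avg_grad, gap = compare(x0)
print("per-sample max |ZOO - REINFORCE|:", gap)
print("Monte Carlo gradient estimate:", avg_grad)
print("true gradient:", [2 * (xi - 1.0) for xi in x0])
```

The per-sample gap stays at floating-point round-off, since (θ - x)/μ² equals u/μ exactly in real arithmetic. For this quadratic the smoothed gradient coincides with the true gradient, so the average should land near [-2, 2, -1].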
