Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered

Sijia Liu; Yicheng Lang; Soumyadeep Pal; Changsheng Wang; Yancheng Huang; Chongyu Fan; James Diffenderfer; Bhavya Kailkhura; Yihua Zhang

arXiv:2605.15622 · cs.LG · May 19, 2026

TL;DR

This paper argues that zeroth-order (ZO) optimization in deep learning is underexplored rather than underpowered. It holds that ZO methods can support scalable, resource-efficient training once they are rethought beyond full-space, element-wise, estimator-centric designs.

Contribution

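The paper challenges the prevailing assumption that ZO methods are fundamentally unscalable because of estimator variance and query complexity, and it sets out six positions, spanning the algorithmic, systems, and evaluation stack, on how to exploit their distinctive advantages in deep learning.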

Findings

1. ZO optimization can be scaled effectively with variance control techniques.

2. Spectral and subspace views enable interpretable variance reduction.

3. ZO's forward-only nature offers communication and resource efficiency advantages.

Abstract

Zeroth-order (ZO) optimization, learning from finite differences of function evaluations without backpropagation, has recently regained attention in deep learning due to its memory efficiency and applicability to gray- or black-box pipelines. Yet, ZO methods are often dismissed as fundamentally unscalable because of estimator variance and unfavorable query complexity. We argue that this conclusion might be misguided: ZO optimization is underexplored, not underpowered. We show that many perceived limitations stem from myopic development practices, most notably full-space, element-wise, estimator-centric designs. We articulate six positions spanning the algorithmic, systems, and evaluation stack. First, we revisit the feasibility boundaries of estimator-centric ZO methods through variance control, variance-query tradeoffs, and directional-derivative lenses. Then, we identify three…
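
To make "learning from finite differences of function evaluations" concrete, the following is a minimal sketch of a generic randomized two-point gradient estimator and a plain descent loop built on it. It is not taken from the paper: the names `zo_gradient` and `zo_sgd`, the Gaussian perturbation directions, and all hyperparameter values are illustrative assumptions. Averaging over `num_queries` random directions is one simple form of the variance-query tradeoff the abstract mentions: more function evaluations per step give a lower-variance estimate.

```python
import numpy as np


def zo_gradient(f, x, num_queries=16, mu=1e-3, rng=None):
    """Randomized two-point finite-difference estimate of the gradient of f at x.

    Illustrative sketch only: uses function evaluations of f, no backpropagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)  # random Gaussian direction (assumption)
        # Central difference along u approximates the directional derivative.
        directional = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
        grad += directional * u
    # Averaging over directions reduces variance at the cost of more queries.
    return grad / num_queries


def zo_sgd(f, x0, steps=500, lr=0.05, **estimator_kwargs):
    """Gradient descent driven only by the ZO estimate (hypothetical helper)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * zo_gradient(f, x, **estimator_kwargs)
    return x
```

Each call to `zo_gradient` costs `2 * num_queries` evaluations of `f`, so in this sketch the query budget per step is set directly by `num_queries`. On a small smooth objective such as a quadratic, `zo_sgd` should approach the minimizer; the estimator's variance grows with the dimension of `x`, which is the scalability concern the abstract describes.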
