Zeroth-Order Optimization at the Edge of Stability
Minhak Song, Liang Zhang, Bingcong Li, Niao He, Michael Muehlebach, Sewoong Oh

TL;DR
This paper analyzes the mean-square linear stability of zeroth-order (ZO) optimization methods in deep learning, showing that their stability depends on the entire Hessian spectrum and that they operate at the edge of stability in practice.
Contribution
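It provides an explicit step size condition for the mean-square linear stability of ZO methods built on the standard two-point estimator, contrasts this condition with that of first-order (FO) methods, and derives tractable bounds that depend only on the largest Hessian eigenvalue and the Hessian trace.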
Findings
ZO methods operate at the edge of stability across various deep learning tasks.
Mean-square stability of ZO methods depends on the entire Hessian spectrum, whereas FO stability is governed solely by the largest Hessian eigenvalue.
Large step sizes in ZO methods primarily regularize the Hessian trace.
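To make the spectrum dependence concrete, here is a minimal sketch under assumptions that are not taken from the paper: a quadratic objective with Hessian eigenvalues `eigs`, one Gaussian direction per step, and the idealized update x ← x − η (uᵀHx) u. Under those assumptions the per-coordinate second moments in the Hessian eigenbasis evolve linearly, and every eigenvalue enters the recursion. The function names `second_moment_matrix` and `ms_growth_factor` are hypothetical, and this is an illustration rather than the paper's stated condition.

```python
import numpy as np

def second_moment_matrix(eigs, eta):
    """Linear map on s_i = E[x_i^2] in the Hessian eigenbasis.

    Assumes x <- x - eta * (u^T H x) u with u ~ N(0, I), which gives
    s_i' = (1 - 2*eta*lam_i + 2*eta^2*lam_i^2) * s_i + eta^2 * sum_j lam_j^2 * s_j.
    """
    lam = np.asarray(eigs, dtype=float)
    diag = 1.0 - 2.0 * eta * lam + 2.0 * (eta * lam) ** 2
    # Every row carries lam_j^2 in column j: all eigenvalues couple into each coordinate.
    coupling = eta**2 * np.tile(lam**2, (lam.size, 1))
    return np.diag(diag) + coupling

def ms_growth_factor(eigs, eta):
    """Spectral radius of the second-moment map; below 1 means E||x||^2 decays."""
    return float(np.max(np.abs(np.linalg.eigvals(second_moment_matrix(eigs, eta)))))

# Illustrative spectrum: ten eigenvalues, all equal to 1.
eigs = np.ones(10)
fo_threshold = 2.0 / eigs.max()  # classical gradient-descent threshold on a quadratic
for eta in (0.1, 0.2):
    print(eta, ms_growth_factor(eigs, eta), eta < fo_threshold)
```

For this toy spectrum the growth factor is about 0.92 at η = 0.1 and about 1.08 at η = 0.2, even though both step sizes sit far below the gradient-descent threshold 2/λ_max = 2. Adding more eigenvalues of the same size lowers the largest stable step size, which is the qualitative contrast the findings describe.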
Abstract
Zeroth-order (ZO) methods are widely used when gradients are unavailable or prohibitively expensive, including black-box learning and memory-efficient fine-tuning of large models, yet their optimization dynamics in deep learning remain underexplored. In this work, we provide an explicit step size condition that exactly captures the (mean-square) linear stability of a family of ZO methods based on the standard two-point estimator. Our characterization reveals a sharp contrast with first-order (FO) methods: whereas FO stability is governed solely by the largest Hessian eigenvalue, mean-square stability of ZO methods depends on the entire Hessian spectrum. Since computing the full Hessian spectrum is infeasible in practical neural network training, we further derive tractable stability bounds that depend only on the largest eigenvalue and the Hessian trace. Empirically, we find that…
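For readers unfamiliar with the two-point estimator mentioned above, the following is a minimal sketch of one common form: a symmetric finite difference along a random Gaussian direction. The function names, the smoothing radius `mu`, and the choice of Gaussian directions are assumptions made for illustration; the paper's exact variant may differ.

```python
import numpy as np

def two_point_estimate(f, x, u, mu=1e-3):
    """Symmetric two-point ZO gradient estimate along direction u.

    Uses only two function evaluations, no gradients.
    """
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

def zo_sgd_step(f, x, eta, rng, mu=1e-3):
    """One ZO-SGD step with a fresh Gaussian direction (assumed sampling scheme)."""
    u = rng.standard_normal(x.shape)
    return x - eta * two_point_estimate(f, x, u, mu)

# Usage on a small quadratic f(x) = 0.5 * x^T H x.
H = np.diag([3.0, 1.0, 0.5])
f = lambda x: 0.5 * x @ H @ x
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
x_next = zo_sgd_step(f, x, eta=0.05, rng=rng)
```

On a quadratic the symmetric difference is exact up to floating-point error, so the estimate equals (uᵀHx) u for any `mu`. That identity is what makes a linear stability analysis of the resulting random update tractable.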
