Hierarchical Zero-Order Optimization for Deep Neural Networks

Sansheng Cao; Zhengyu Ma; Yonghong Tian

arXiv:2602.10607·cs.LG·February 12, 2026

Open Access

TL;DR

This paper introduces Hierarchical Zeroth-Order (HZO) optimization, a divide-and-conquer approach that decomposes network depth to cut query complexity from O(ML^2) to O(ML log L) while remaining numerically stable, enabling deep neural network training without gradients.

Contribution

HZO is a hierarchical approach that decomposes network depth, reducing the depth dependence of query complexity from quadratic (L^2) to log-linear (L log L), and demonstrates accuracy competitive with backpropagation.
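To give a feel for the size of this reduction, the short sketch below compares the two bounds quoted on this page, O(ML^2) and O(ML log L), at a few depths. It is only an illustration of the asymptotic ratio: the base-2 logarithm and the dropped constant factors are assumptions, and `query_ratio` is a hypothetical helper, not something taken from the paper.

```python
import math

def query_ratio(depth: int) -> float:
    """Ratio L^2 / (L * log2 L) = L / log2 L.

    The width M cancels out. Constants hidden by big-O notation are ignored,
    and the base-2 logarithm is an assumption made for illustration.
    """
    return depth / math.log2(depth)

# Print the ratio of the two bounds for a few example depths.
for depth in (8, 64, 1024):
    print(f"L = {depth:5d}  ratio ~ {query_ratio(depth):.1f}")
```

Under these assumptions the gap grows roughly linearly with depth, which is why the reduction matters most for very deep networks.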

Findings

01. Reduces query complexity from O(ML^2) to O(ML log L)

02. Maintains numerical stability near the unitary limit

03. Achieves competitive accuracy on CIFAR-10 and ImageNet

Abstract

Zeroth-order (ZO) optimization has long been favored for its biological plausibility and its capacity to handle non-differentiable objectives, yet its computational complexity has historically limited its application in deep neural networks. Challenging the conventional paradigm that gradients propagate layer-by-layer, we propose Hierarchical Zeroth-Order (HZO) optimization, a novel divide-and-conquer strategy that decomposes the depth dimension of the network. We prove that HZO reduces the query complexity from $O(M L^{2})$ to $O(M L \log L)$ for a network of width $M$ and depth $L$, representing a significant leap over existing ZO methodologies. Furthermore, we provide a detailed error analysis showing that HZO maintains numerical stability by operating near the unitary limit ($L_{\mathrm{lip}} \approx 1$). Extensive evaluations on CIFAR-10 and ImageNet demonstrate that HZO achieves competitive…
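For readers unfamiliar with zeroth-order optimization, the sketch below shows a generic two-point random-direction gradient estimator that uses only function evaluations (queries). This is a textbook ZO baseline, not the HZO algorithm; the abstract does not specify HZO's estimator. The names `zo_gradient` and `loss`, the toy quadratic objective, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def zo_gradient(f, x, num_dirs=20, mu=1e-3, rng=None):
    """Estimate the gradient of f at x from function values only.

    Averages central finite differences along random Gaussian directions.
    Each direction costs two queries of f. (Generic ZO baseline, not HZO.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)                 # random probe direction
        slope = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)
        grad += slope * u                                # directional derivative times direction
    return grad / num_dirs

def loss(x):
    """Toy objective (assumed for illustration): squared Euclidean norm."""
    return float(np.sum(x ** 2))

# Gradient-free descent on the toy objective.
x = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)
for _ in range(200):
    x = x - 0.05 * zo_gradient(loss, x, rng=rng)

print("final loss:", loss(x))
```

Each update here costs `2 * num_dirs` queries, and the number of directions needed for a useful estimate grows with the parameter count. That query cost is the bottleneck the abstract refers to when it says computational complexity has limited ZO methods in deep networks.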

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Metaheuristic Optimization Algorithms Research