Hierarchical Zero-Order Optimization for Deep Neural Networks
Sansheng Cao, Zhengyu Ma, Yonghong Tian

TL;DR
This paper introduces Hierarchical Zeroth-Order (HZO) optimization, a divide-and-conquer approach that reduces query complexity from O(ML^2) to O(ML log L) while remaining numerically stable, enabling deep neural networks to be trained without gradients.
Contribution
HZO is a novel hierarchical approach that decomposes the network along its depth, reducing the dependence of query complexity on depth from quadratic (L^2) to log-linear (L log L), and it reports accuracy competitive with backpropagation.
Findings
Reduces query complexity from O(ML^2) to O(ML log L)
Maintains numerical stability near the unitary limit
Achieves competitive accuracy on CIFAR-10 and ImageNet
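To get a feel for the first finding, the two bounds can be compared numerically. This is only an illustrative sketch: the constants hidden by the O-notation are ignored, the logarithm is assumed to be base 2, and the function names are hypothetical rather than taken from the paper.

```python
import math


def baseline_queries(width: int, depth: int) -> int:
    """Query count growing as M * L^2 (constants ignored)."""
    return width * depth ** 2


def hierarchical_queries(width: int, depth: int) -> float:
    """Query count growing as M * L * log2(L) (constants ignored, base 2 assumed)."""
    return width * depth * math.log2(depth)


def savings_ratio(width: int, depth: int) -> float:
    """Ratio of the two bounds; it simplifies to L / log2(L), independent of width."""
    return baseline_queries(width, depth) / hierarchical_queries(width, depth)


if __name__ == "__main__":
    for depth in (8, 64, 512):
        print(depth, round(savings_ratio(128, depth), 2))
```

Under these assumptions the ratio depends only on depth, so the gap between the two bounds widens as networks get deeper.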
Abstract
Zeroth-order (ZO) optimization has long been favored for its biological plausibility and its capacity to handle non-differentiable objectives, yet its computational complexity has historically limited its application in deep neural networks. Challenging the conventional paradigm that gradients propagate layer-by-layer, we propose Hierarchical Zeroth-Order (HZO) optimization, a novel divide-and-conquer strategy that decomposes the depth dimension of the network. We prove that HZO reduces the query complexity from O(ML^2) to O(ML log L) for a network of width M and depth L, representing a significant leap over existing ZO methodologies. Furthermore, we provide a detailed error analysis showing that HZO maintains numerical stability by operating near the unitary limit. Extensive evaluations on CIFAR-10 and ImageNet demonstrate that HZO achieves competitive…
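For readers unfamiliar with zeroth-order optimization, the following is a minimal sketch of a generic two-point, random-direction gradient estimator, which uses only function evaluations (queries) and no backpropagation. It is not the HZO algorithm described in the abstract; the function name `zo_gradient`, the smoothing step `mu`, and the sample count are illustrative assumptions.

```python
import random


def zo_gradient(f, x, mu=1e-4, num_samples=5000, seed=0):
    """Estimate the gradient of f at x from function queries only.

    Generic two-point estimator (an assumption for illustration, not HZO):
    average of ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u over Gaussian u.
    Each sample costs two queries to f.
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        # Random probing direction with independent standard normal entries.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Central difference approximates the directional derivative along u.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(dim):
            grad[i] += slope * u[i] / num_samples
    return grad


if __name__ == "__main__":
    def sum_of_squares(v):
        return sum(vi * vi for vi in v)

    # The true gradient at [1.0, -2.0] is [2.0, -4.0]; the estimate is noisy.
    print(zo_gradient(sum_of_squares, [1.0, -2.0]))
```

The estimate is unbiased up to the finite-difference error but noisy, and the number of queries needed for a useful estimate grows with problem size, which is the cost that query-complexity bounds such as the one above aim to control.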
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Metaheuristic Optimization Algorithms Research
