Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Huaiyi Mu; Yujie Tang; Jie Song; Zhongkui Li

arXiv:2409.19567 · math.OC · April 10, 2026

TL;DR

This paper introduces a variance-reduced gradient estimator for distributed zeroth-order nonconvex optimization that addresses the trade-off between convergence rate and sampling cost per gradient estimate.

Contribution

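It proposes a variance-reduced gradient estimator that, based on a Bernoulli draw, either randomly renovates a single orthogonal direction of the gradient or computes the gradient estimate across all dimensions for variance correction. The estimator is integrated with a gradient tracking mechanism.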
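Gradient tracking itself is a standard building block in distributed optimization, and a minimal sketch may help make the term concrete. The code below is a generic tracking iteration with scalar decision variables and exact local gradients standing in for any gradient estimator. It is not the paper's algorithm. The names `gradient_tracking`, `local_grads`, `W`, `step`, and `iters` are hypothetical and chosen only for illustration.

```python
def gradient_tracking(local_grads, W, x0, step, iters):
    """Generic gradient tracking sketch (assumed form, not the paper's method).

    local_grads: list of callables, local_grads[i](x) returns agent i's gradient.
    W: doubly stochastic mixing matrix given as a list of rows.
    x0: list of initial scalar iterates, one per agent.
    """
    n = len(x0)
    x = list(x0)
    g = [local_grads[i](x[i]) for i in range(n)]
    y = list(g)  # each tracker starts at the agent's own local gradient
    for _ in range(iters):
        # Consensus step on the iterates, then a step along the tracked direction.
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - step * y[i]
                 for i in range(n)]
        g_new = [local_grads[i](x_new[i]) for i in range(n)]
        # Tracker update: mix neighbors' trackers, add the change in local gradient.
        y = [sum(W[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x


# Toy usage: agent i holds f_i(x) = 0.5 * (x - b_i)^2, so the network-wide
# minimizer is the mean of the b_i values.
targets = [1.0, 2.0, 6.0]
grads = [lambda x, b=b: x - b for b in targets]
W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
final = gradient_tracking(grads, W, [0.0, 0.0, 0.0], step=0.1, iters=300)
```

In a zeroth-order setting the exact gradients would be replaced by estimates built from function evaluations only. How the paper couples its estimator with the tracker is not shown here.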

Findings

01. Oracle complexity is bounded by $O(d/\epsilon)$ for smooth nonconvex functions.

02. Oracle complexity is bounded by $O(d\kappa \ln(1/\epsilon))$ for gradient dominated nonconvex functions.

03. Numerical simulations demonstrate improved efficiency over existing methods.

Abstract

This paper investigates distributed zeroth-order optimization for smooth nonconvex problems, targeting the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation in current algorithms that use either the $2$-point or $2d$-point gradient estimators. We propose a novel variance-reduced gradient estimator that either randomly renovates a single orthogonal direction of the true gradient or calculates the gradient estimation across all dimensions for variance correction, based on a Bernoulli distribution. Integrating this estimator with a gradient tracking mechanism allows us to address the trade-off. We show that the oracle complexity of our proposed algorithm is upper bounded by $O(d/\epsilon)$ for smooth nonconvex functions and by $O(d\kappa \ln(1/\epsilon))$ for smooth and gradient dominated nonconvex functions, where $d$ denotes the problem dimension and…
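To make the sampling trade-off concrete, the following sketch contrasts a textbook 2-point estimator (two function queries per estimate) with a coordinate-wise $2d$-point estimator ($2d$ queries), and then shows one possible reading of a Bernoulli switch between a cheap single-direction refresh and a full recomputation. The switch is only an illustration of the idea described in the abstract, under the assumption that the "orthogonal directions" are the coordinate axes. It is not the paper's exact update rule. All function names, the smoothing radius `u`, and the probability `p` are hypothetical.

```python
import random


def two_point_estimate(f, x, u, rng):
    """Standard 2-point estimator along one random unit direction (2 queries)."""
    d = len(x)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in z) ** 0.5
    z = [v / norm for v in z]  # uniform direction on the unit sphere
    x_plus = [xi + u * zi for xi, zi in zip(x, z)]
    x_minus = [xi - u * zi for xi, zi in zip(x, z)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * u)
    return [scale * zi for zi in z]


def coordinate_difference(f, x, i, u):
    """Central finite difference along coordinate i (2 queries)."""
    x_plus, x_minus = list(x), list(x)
    x_plus[i] += u
    x_minus[i] -= u
    return (f(x_plus) - f(x_minus)) / (2.0 * u)


def full_coordinate_estimate(f, x, u):
    """2d-point estimator: one central difference per coordinate (2d queries)."""
    return [coordinate_difference(f, x, i, u) for i in range(len(x))]


def bernoulli_switched_estimate(f, x, g_prev, u, p, rng):
    """Illustrative switch (assumption, not the paper's rule).

    With probability p, recompute every coordinate (2d queries); otherwise
    refresh one randomly chosen coordinate of the previous estimate (2 queries).
    """
    if rng.random() < p:
        return full_coordinate_estimate(f, x, u)
    i = rng.randrange(len(x))
    g = list(g_prev)
    g[i] = coordinate_difference(f, x, i, u)
    return g
```

Under this reading, the expected number of function queries per call is $2dp + 2(1-p)$, so a small `p` keeps the average cost close to that of the 2-point estimator while still correcting the stored estimate from time to time.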
