Distributed Zeroth-Order Optimization with Rademacher Perturbations and Momentum Gradient Tracking

Yanxu Su; Xiaorui Tong; Changyin Sun

arXiv:2604.21368 · math.OC · April 24, 2026

TL;DR

This paper introduces ZO-MGT, a distributed zeroth-order optimization method that combines momentum-based variance reduction with dynamic gradient tracking and needs only two function queries per iteration.

Contribution

The paper proposes ZO-MGT, a method for distributed non-convex optimization that uses Rademacher perturbations and momentum to reduce variance and bias while requiring only two function queries per iteration.
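To make the "two function queries" idea concrete, here is a minimal sketch of a two-point gradient estimator with a Rademacher direction. The function name `rademacher_two_point_grad`, the smoothing radius `mu`, and the choice of a central difference are assumptions made for illustration; the summary above does not state which difference scheme or scaling the paper uses.

```python
import numpy as np

def rademacher_two_point_grad(f, x, mu=1e-3, rng=None):
    """Estimate the gradient of f at x from exactly two function queries.

    Assumption: a central difference along a random Rademacher direction.
    The exact estimator used in the paper may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each coordinate of u is +1 or -1 with equal probability,
    # so E[u u^T] is the identity matrix.
    u = rng.choice([-1.0, 1.0], size=x.shape)
    # The only two function evaluations: f(x + mu*u) and f(x - mu*u).
    slope = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
    # Scale the direction by the estimated directional derivative.
    return slope * u
```

A single call returns a noisy estimate whose entries all share the same magnitude; averaging many calls on a smooth function approaches the true gradient, which is why a variance-reduction mechanism such as momentum is a natural companion.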

Findings

01

Achieves an $\text{O}(1/T)$ convergence rate.

02

Large momentum suppresses heterogeneity bias quadratically.

03

Numerical results show improved convergence under data heterogeneity.

Abstract

Zeroth-order (ZO) optimization is indispensable for complex non-convex tasks where explicit gradients are computationally prohibitive or strictly inaccessible. When ZO methods are deployed over distributed heterogeneous networks, the gradient tracking technique is often employed to eliminate structural data biases. However, gradient tracking also amplifies the inherent variance of derivative-free estimators. To overcome this problem, we propose Zeroth-Order Momentum Gradient Tracking (ZO-MGT), which integrates momentum-based variance reduction with dynamic gradient tracking. Specifically, ZO-MGT requires exactly two function queries per iteration, so it avoids costly batch sampling and prevents variance explosion while eliminating structural biases. Moreover, by utilizing Rademacher perturbations, it preserves optimal query efficiency and enables bitwise hardware acceleration. We theoretically analyze…
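The abstract names the ingredients (a two-query Rademacher estimator, momentum averaging, gradient tracking) but the excerpt does not give the update equations. The sketch below is therefore a generic momentum-plus-gradient-tracking loop on a toy problem, not the paper's recursion. Everything here is an assumption for illustration: the function names, the ring mixing matrix `W`, the quadratic local objectives, the step size `alpha`, the momentum weight `beta`, and the order of the updates.

```python
import numpy as np

def zo_grad(f, x, mu, rng):
    # Two-query central-difference estimate along a Rademacher direction
    # (assumed form; see the lead-in).
    u = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

def momentum_gradient_tracking(fs, W, d, steps=2000, alpha=0.02,
                               beta=0.9, mu=1e-3, seed=0):
    """Generic sketch: each agent i keeps an iterate x[i], a momentum
    average m[i] of its ZO estimates, and a tracker y[i] of the
    network-wide average of the m's."""
    rng = np.random.default_rng(seed)
    n = len(fs)
    x = np.zeros((n, d))
    m = np.stack([zo_grad(fs[i], x[i], mu, rng) for i in range(n)])
    y = m.copy()  # initializing y = m keeps mean(y) equal to mean(m)
    for _ in range(steps):
        # Consensus step on the iterates, then descend along the tracker.
        x = W @ x - alpha * y
        # One fresh two-query estimate per agent.
        g = np.stack([zo_grad(fs[i], x[i], mu, rng) for i in range(n)])
        # Momentum: exponential moving average of the noisy estimates.
        m_new = beta * m + (1.0 - beta) * g
        # Gradient tracking: mix trackers and add the local change in m.
        y = W @ y + m_new - m
        m = m_new
    return x

def make_toy_problem(n=4, d=5, seed=1):
    # Hypothetical heterogeneous setup: f_i(z) = 0.5 * ||z - b_i||^2,
    # so the global minimizer is the mean of the b_i.
    assert n == 4, "the mixing matrix below is hard-coded for 4 agents"
    rng = np.random.default_rng(seed)
    b = rng.normal(size=(n, d)) + 3.0
    fs = [lambda z, bi=bi: 0.5 * float(np.sum((z - bi) ** 2)) for bi in b]
    # Doubly stochastic mixing matrix for a 4-agent ring.
    t = 1.0 / 3.0
    W = np.array([[t, t, 0, t],
                  [t, t, t, 0],
                  [0, t, t, t],
                  [t, 0, t, t]])
    return fs, W, b
```

Calling `fs, W, b = make_toy_problem()` and then `momentum_gradient_tracking(fs, W, d=5)` returns one iterate per agent. With a constant step size the network average should settle in a neighborhood of `b.mean(axis=0)` rather than converge exactly, since the estimator noise does not vanish when the local minimizers differ.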
