Distributed Zeroth-Order Optimization with Rademacher Perturbations and Momentum Gradient Tracking
Yanxu Su, Xiaorui Tong, Changyin Sun

TL;DR
This paper introduces ZO-MGT, a novel distributed zeroth-order optimization method that combines momentum-based variance reduction with dynamic gradient tracking and converges efficiently using only two function queries per iteration.
Contribution
It proposes ZO-MGT, which combines Rademacher perturbations, momentum, and gradient tracking to reduce estimator variance and heterogeneity bias in distributed non-convex optimization, using only two function queries per iteration.
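The two-query Rademacher idea can be illustrated with the standard central-difference estimator. The sketch below is an assumption about the general technique, not the paper's exact estimator: the helper names `rademacher` and `zo_grad_two_point`, the smoothing radius `mu`, and the quadratic test function are hypothetical. Each coordinate of the direction `u` comes from one random bit, and `f` is queried exactly twice, at `x + mu*u` and `x - mu*u`.

```python
import numpy as np

def rademacher(d, rng):
    """Hypothetical helper: Rademacher direction in {-1, +1}^d from d random bits.

    Each coordinate needs a single bit (0 -> +1, 1 -> -1), which is what makes
    sign-flip / bitwise implementations possible.
    """
    bits = rng.integers(0, 2, size=d, dtype=np.uint8)
    return 1.0 - 2.0 * bits

def zo_grad_two_point(f, x, mu, rng):
    """Hypothetical helper: central-difference zeroth-order gradient estimate.

    Queries f exactly twice, at x + mu*u and x - mu*u, along one Rademacher
    direction u, and returns the estimated directional slope times u.
    """
    u = rademacher(x.size, rng)
    slope = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
    return slope * u

# Usage: for f(z) = 0.5 * ||z - c||^2 the true gradient at x is x - c.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0, 0.5, 3.0])
f = lambda z: 0.5 * np.sum((z - c) ** 2)
x = np.zeros(4)
est = np.mean([zo_grad_two_point(f, x, 1e-3, rng) for _ in range(20000)], axis=0)
print(np.round(est, 2), x - c)  # the averaged estimate approaches x - c
```

Because Rademacher directions satisfy E[u u^T] = I, a single estimate is noisy but unbiased for the gradient of a quadratic; averaging many of them (or, as in ZO-MGT, smoothing them over time) reduces that noise without extra queries per step.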
Findings
Achieves an $\mathcal{O}(1/T)$ convergence rate.
Large momentum suppresses heterogeneity bias quadratically.
Numerical results show improved convergence under data heterogeneity.
Abstract
Zeroth-order (ZO) optimization is indispensable for complex non-convex tasks where explicit gradients are computationally prohibitive or strictly inaccessible. When ZO methods are deployed over distributed heterogeneous networks, gradient tracking is often employed to eliminate structural data biases. However, gradient tracking also amplifies the inherent variance of derivative-free estimators. To overcome this problem, we propose Zeroth-Order Momentum Gradient Tracking (ZO-MGT), which integrates momentum-based variance reduction with dynamic gradient tracking. Specifically, ZO-MGT requires exactly two function queries per iteration, which avoids costly batch sampling and prevents variance explosion while still eliminating structural biases. Moreover, by using Rademacher perturbations, it preserves optimal query efficiency and enables bitwise hardware acceleration. We theoretically analyze…
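To make the interplay of momentum and gradient tracking concrete, here is a minimal sketch under explicit assumptions. The abstract does not state ZO-MGT's update rule, so the recursion used here (an exponential-moving-average momentum `m` fed into a standard gradient tracker `y`), the ring mixing matrix `W`, the quadratic local objectives, and all function and parameter names (`zo_momentum_gradient_tracking`, `ring_mixing_matrix`, `alpha`, `beta`, `mu`) are hypothetical. Each agent still spends exactly two function queries per iteration on a Rademacher central-difference estimate.

```python
import numpy as np

def rademacher_zo_grad(f, x, mu, rng):
    """Two-query central-difference ZO estimate along a Rademacher direction."""
    u = 1.0 - 2.0 * rng.integers(0, 2, size=x.size)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic ring: average self and two neighbours (n >= 3)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def zo_momentum_gradient_tracking(local_fs, W, x0, alpha, beta, mu, T, rng):
    """Hypothetical momentum + gradient-tracking loop, not the paper's exact rule.

    Returns the final iterates x, trackers y, momenta m (one row per agent)
    and the history of the network-average iterate.
    """
    def zo_grads(points):
        # One two-query estimate per agent, at that agent's own point.
        return np.stack([rademacher_zo_grad(f, p, mu, rng)
                         for f, p in zip(local_fs, points)])

    x = np.tile(x0, (len(local_fs), 1))   # row i is agent i's iterate
    m = zo_grads(x)                        # momentum buffers
    y = m.copy()                           # trackers start at the momenta
    xbar_hist = []
    for _ in range(T):
        x_new = W @ x - alpha * y          # mix with neighbours, step along tracker
        m_new = beta * m + (1.0 - beta) * zo_grads(x_new)  # EMA damps ZO noise
        y = W @ y + m_new - m              # keeps mean(y) equal to mean(m)
        x, m = x_new, m_new
        xbar_hist.append(x.mean(axis=0))
    return x, y, m, np.array(xbar_hist)

# Usage: five agents with heterogeneous quadratics f_i(z) = 0.5 * ||z - c_i||^2.
rng = np.random.default_rng(0)
direction = np.array([1.0, -1.0, 0.5])
centers = [i * direction for i in range(5)]
local_fs = [lambda z, c=c: 0.5 * np.sum((z - c) ** 2) for c in centers]
x, y, m, hist = zo_momentum_gradient_tracking(
    local_fs, ring_mixing_matrix(5), np.zeros(3),
    alpha=0.02, beta=0.9, mu=1e-3, T=1500, rng=rng)
x_star = np.mean(centers, axis=0)          # minimiser of the average objective
print(np.linalg.norm(hist[-500:].mean(axis=0) - x_star))
```

Because `W` is doubly stochastic and each tracker starts at its momentum buffer, the network average of `y` always equals the network average of `m`, so every agent descends along an estimate of the global direction rather than its own biased local one, despite the heterogeneous `centers`. In this toy setup the average iterate settles near `x_star` up to residual ZO noise, which the momentum weight `beta` damps.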
