Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

Yanghao Li; Changxin Liu; Yuhao Yi

arXiv:2603.15144·cs.LG·April 7, 2026

TL;DR

This paper introduces Byz-DM21, a communication-efficient, Byzantine-robust distributed learning algorithm built on a novel double-momentum gradient estimator, together with a variance-reduced variant, Byz-VR-DM21, that converges faster.

Contribution

The paper proposes a double-momentum gradient estimator that supports Byzantine robustness under compressed communication, along with a variance-reduced variant that improves the convergence rate.

Findings

01

Byz-DM21 converges in $\tilde{O}(\frac{1}{\epsilon^4})$ iterations.

02

Byz-VR-DM21 achieves $\tilde{O}(\frac{1}{\epsilon^3})$ convergence with variance reduction.

03

Numerical experiments confirm the effectiveness of the proposed algorithms.

Abstract

In collaborative and distributed learning, Byzantine robustness reflects a major facet of optimization algorithms. Such distributed algorithms are often accompanied by transmitting a large number of parameters, so communication compression is essential for an effective solution. In this paper, we propose Byz-DM21, a novel Byzantine-robust and communication-efficient stochastic distributed learning algorithm. Our key innovation is a novel gradient estimator based on a double-momentum mechanism, integrating recent advancements in error feedback techniques. Using this estimator, we design both standard and accelerated algorithms that eliminate the need for large batch sizes while maintaining robustness against Byzantine workers. We prove that the Byz-DM21 algorithm has a smaller neighborhood size and converges to $ε$-stationary points in $O(ε^{-4})$ …
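The abstract names the ingredients (double momentum, error feedback, compression, robustness to Byzantine workers) but does not give the update rule, so the following is only a minimal sketch of how such pieces might fit together, not the paper's Byz-DM21. It assumes an EF21-style compressed difference, two chained exponential moving averages as the "double momentum", a Top-k compressor, and a coordinate-wise median as the robust aggregator. Every name here (`top_k`, `coordinate_median`, `HonestWorker`, `run`) and every hyperparameter is hypothetical.

```python
import math
import random


def top_k(vec, k):
    """Top-k sparsifier: keep the k largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]


def coordinate_median(vectors):
    """Coordinate-wise median, used here as a stand-in robust aggregator."""
    out = []
    for column in zip(*vectors):
        col = sorted(column)
        m = len(col)
        out.append(col[m // 2] if m % 2 else 0.5 * (col[m // 2 - 1] + col[m // 2]))
    return out


class HonestWorker:
    """Assumed worker state: two momentum buffers plus an error-feedback buffer."""

    def __init__(self, dim, eta, k):
        self.v = [0.0] * dim  # first momentum: average of stochastic gradients
        self.u = [0.0] * dim  # second momentum: average of the first momentum
        self.g = [0.0] * dim  # error-feedback state, mirrored on the server
        self.eta, self.k = eta, k

    def message(self, stoch_grad):
        eta = self.eta
        self.v = [(1 - eta) * v + eta * s for v, s in zip(self.v, stoch_grad)]
        self.u = [(1 - eta) * u + eta * v for u, v in zip(self.u, self.v)]
        # Only the compressed difference between u and g is transmitted.
        delta = top_k([u - g for u, g in zip(self.u, self.g)], self.k)
        self.g = [g + c for g, c in zip(self.g, delta)]
        return delta


def run(steps=1500, n_honest=5, n_byz=2, lr=0.02, eta=0.2, k=2, seed=0):
    """Minimize 0.5 * ||x - target||^2 with noisy gradients and Byzantine noise."""
    rng = random.Random(seed)
    target = [1.0, -2.0, 3.0, -4.0, 5.0]
    dim = len(target)
    x = [0.0] * dim
    workers = [HonestWorker(dim, eta, k) for _ in range(n_honest)]
    server_g = [[0.0] * dim for _ in range(n_honest + n_byz)]
    for _ in range(steps):
        msgs = []
        for w in workers:
            grad = [xi - ti + rng.gauss(0.0, 0.05) for xi, ti in zip(x, target)]
            msgs.append(w.message(grad))
        # Byzantine workers send arbitrary vectors instead of honest differences.
        for _ in range(n_byz):
            msgs.append([rng.uniform(-100.0, 100.0) for _ in range(dim)])
        # The server reconstructs each worker's estimator from the differences.
        for g, delta in zip(server_g, msgs):
            for j in range(dim):
                g[j] += delta[j]
        agg = coordinate_median(server_g)
        x = [xi - lr * a for xi, a in zip(x, agg)]
    return math.dist(x, target), math.dist([0.0] * dim, target)


if __name__ == "__main__":
    final_dist, initial_dist = run()
    print(f"initial distance {initial_dist:.3f}, final distance {final_dist:.3f}")
```

Each honest worker uses a batch of one noisy gradient per step, which is the setting the abstract targets when it says the estimator removes the need for large batch sizes. With two Byzantine workers out of seven, the coordinate-wise median always falls between honest values, which is why this toy run can still approach the target.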
