Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik

TL;DR
This paper introduces new Byzantine-robust algorithms with communication compression, achieving improved convergence rates and robustness in large-scale distributed and federated learning scenarios.
Contribution
It proposes two novel algorithms, Byz-DASHA-PAGE and Byz-EF21, with enhanced theoretical guarantees and practical performance for Byzantine-robust learning with compression.
Findings
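Byz-DASHA-PAGE has a better convergence rate (for non-convex and Polyak-Lojasiewicz smooth problems), a smaller neighborhood size in the heterogeneous case, and tolerates more Byzantine workers under over-parametrization than the previous method with SOTA theoretical guarantees.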
Byz-EF21 and Byz-EF21-BC demonstrate effective communication compression with error feedback.
Experimental results confirm the theoretical improvements.
Abstract
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning. These problems are usually huge-scale, implying that communication compression is also imperative for their resolution. These factors have spurred recent algorithmic and theoretical developments in the literature of Byzantine-robust learning with compression. In this paper, we contribute to this research area in two main directions. First, we propose a new Byzantine-robust method with compression - Byz-DASHA-PAGE - and prove that the new method has better convergence rate (for non-convex and Polyak-Lojasiewicz smooth optimization problems), smaller neighborhood size in the heterogeneous case, and tolerates more Byzantine workers under over-parametrization than the previous method with SOTA theoretical convergence guarantees…
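To make the two ingredients named in the abstract concrete, here is a minimal sketch of a communication compressor and a Byzantine-robust aggregator. It is an illustration only and is not the paper's Byz-DASHA-PAGE or Byz-EF21. The choices of Rand-K sparsification and coordinate-wise median, and all names (rand_k, coordinate_median, the toy gradient values), are assumptions picked for simplicity.

```python
import random
import statistics


def rand_k(vec, k, rng):
    """Rand-K sparsifier (illustrative): keep k random coordinates, scaled by d/k.

    The d/k scaling makes the compressed vector equal to `vec` in expectation.
    """
    d = len(vec)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * x if i in keep else 0.0 for i, x in enumerate(vec)]


def mean(vectors):
    """Plain averaging: a single corrupted message can move it arbitrarily."""
    n = len(vectors)
    return [sum(coords) / n for coords in zip(*vectors)]


def coordinate_median(vectors):
    """Coordinate-wise median: a simple robust aggregator (hypothetical choice)."""
    return [statistics.median(coords) for coords in zip(*vectors)]


rng = random.Random(0)
true_grad = [1.0, -2.0, 0.5, 3.0]  # toy "gradient", not from the paper

# Seven honest workers send noisy copies; two Byzantine workers send garbage.
honest = [[g + rng.gauss(0, 0.1) for g in true_grad] for _ in range(7)]
byzantine = [[1e6] * len(true_grad) for _ in range(2)]
messages = honest + byzantine

avg = mean(messages)               # dominated by the Byzantine messages
med = coordinate_median(messages)  # stays close to true_grad

# Compression shown separately: only k of d coordinates are transmitted.
compressed = rand_k(true_grad, k=2, rng=rng)

print("mean:      ", avg)
print("median:    ", med)
print("compressed:", compressed)
```

With a minority of corrupted workers, the averaged vector is pulled far from the true gradient while the coordinate-wise median stays near it. The sketch keeps compression and robust aggregation separate on purpose; combining them with convergence guarantees is the problem the paper addresses.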
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Cooperative Communication and Network Coding
