Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
Boyuan Ruan, Xiaoyu Wang, Ya-Feng Liu

TL;DR
This paper develops a unified convergence analysis for Byzantine-robust distributed SGD, accounting for data heterogeneity and local momentum, and establishes tight bounds on error and resilience limits.
Contribution
It provides the first comprehensive convergence theory for Byzantine-robust distributed SGD with general data heterogeneity and tight error bounds.
Findings
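Local momentum provably reduces the error component induced by stochasticity, although stochasticity and data heterogeneity still leave unavoidable error floors.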
Convergence rates are established for nonconvex and Polyak-Lojasiewicz objectives.
Matching lower bounds demonstrate fundamental limits of Byzantine resilience.
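For reference, the Polyak-Lojasiewicz condition named in the findings is usually stated as below. The symbols $\mu$ (the PL constant) and $f^\star$ (the minimum value of $f$) are standard notation assumed here, not taken from this page, and the paper's exact formulation may differ.

```latex
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0 (standard textbook form):
% the suboptimality gap is controlled by the squared gradient norm.
\[
  f(x) - f^\star \;\le\; \frac{1}{2\mu}\,\bigl\|\nabla f(x)\bigr\|^2
  \qquad \text{for all } x .
\]
```

Every stationary point of a function satisfying this inequality is a global minimizer, even when the function is nonconvex, which is why rates under this condition are reported separately from the general nonconvex case.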
Abstract
Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak-Lojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in…
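The setting described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's algorithm: the coordinate-wise median stands in for "a robust aggregation rule", the momentum update uses a common exponential-moving-average form, Byzantine workers simply send large random vectors, and the names `coordinate_median`, `make_quadratic_worker`, and `robust_sgd_with_local_momentum` are hypothetical.

```python
import numpy as np


def coordinate_median(vectors):
    """Aggregate worker messages with the coordinate-wise median (one possible robust rule)."""
    return np.median(np.stack(vectors), axis=0)


def make_quadratic_worker(center, noise):
    """Hypothetical honest worker: noisy gradient of 0.5 * ||x - center||^2."""
    def grad(x, rng):
        return (x - center) + noise * rng.normal(size=x.shape)
    return grad


def robust_sgd_with_local_momentum(grad_fns, byzantine, x0, lr=0.1, beta=0.9,
                                   steps=200, seed=0):
    """Toy server loop: honest workers send local momentum, Byzantine ones send noise.

    grad_fns  : one stochastic-gradient callable per worker, signature grad(x, rng)
    byzantine : set of worker indices that behave maliciously (assumed attack model)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    momenta = [np.zeros_like(x) for _ in grad_fns]
    for _ in range(steps):
        messages = []
        for i, grad_fn in enumerate(grad_fns):
            if i in byzantine:
                # Assumed attack: an arbitrary large random vector.
                messages.append(rng.normal(scale=100.0, size=x.shape))
                continue
            g = grad_fn(x, rng)
            # Local momentum (assumed EMA form): averages out gradient noise over time.
            momenta[i] = beta * momenta[i] + (1.0 - beta) * g
            messages.append(momenta[i])
        # The server applies the robustly aggregated direction.
        x = x - lr * coordinate_median(messages)
    return x
```

With seven honest workers whose quadratic minimizers differ slightly (a simple form of data heterogeneity) and two Byzantine workers, the iterate in this toy setup settles near the honest minimizers instead of being dragged away by the attackers. Setting `beta=0.0` sends the raw stochastic gradients, which gives a rough way to see the effect of momentum on the noise in the aggregated direction.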
