Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding
Chengxi Li, Youssef Allouah, Rachid Guerraoui, Mikael Skoglund, and Ming Xiao

TL;DR
This paper introduces LAD, a distributed training method that uses cyclic gradient coding to improve robustness against Byzantine attacks and reduce communication costs. The authors prove its convergence and report that it is effective.
Contribution
The paper proposes a novel cyclic gradient coding-based distributed training method that enhances Byzantine robustness and communication efficiency, addressing limitations of existing approaches.
Findings
LAD improves robustness against Byzantine attacks.
LAD achieves lower solution error in heterogeneous data settings.
Com-LAD reduces communication overhead while maintaining robustness.
Abstract
In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation in that the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity among the subsets held by different devices. To overcome this limitation, we propose a novel DT method, cyclic gradient coding-based DT (LAD). In LAD, the server allocates the entire training dataset to the devices before training begins. In each iteration, it assigns computational tasks redundantly to the devices using cyclic gradient coding. Each honest device then computes local gradients on a fixed number of data subsets and encodes the local…
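The redundant task assignment mentioned in the abstract can be pictured with a small sketch. It assumes the simplest cyclic pattern, in which device i is given a fixed number of consecutive data subsets with wrap-around. The function names and the parameters `num_devices`, `num_subsets`, and `load` are hypothetical and are not taken from the paper. The paper's actual per-iteration assignment and its encoding step may differ, and the encoding step is not shown here.

```python
# Illustrative sketch only: a plain cyclic assignment of data subsets to devices.
# All names and parameters are hypothetical, not the paper's notation.

def cyclic_assignment(num_devices: int, num_subsets: int, load: int) -> list[list[int]]:
    """Assign `load` consecutive subsets (with wrap-around) to each device."""
    if not 1 <= load <= num_subsets:
        raise ValueError("load must be between 1 and num_subsets")
    return [
        [(device + offset) % num_subsets for offset in range(load)]
        for device in range(num_devices)
    ]


def replication_counts(assignment: list[list[int]], num_subsets: int) -> list[int]:
    """Count how many devices were assigned each subset."""
    counts = [0] * num_subsets
    for subsets in assignment:
        for subset in subsets:
            counts[subset] += 1
    return counts


assignment = cyclic_assignment(num_devices=5, num_subsets=5, load=3)
for device, subsets in enumerate(assignment):
    print(f"device {device}: subsets {subsets}")
print("replication per subset:", replication_counts(assignment, 5))
```

With as many devices as subsets, every subset is handled by exactly `load` devices. That overlap is the redundancy: no single device is the only source of gradient information about any part of the data.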
