From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity
Renaud Gaucher, Aymeric Dieuleveut, Hadrien Hendrikx

TL;DR
This paper models Byzantine-robust federated learning as an inexact gradient optimization problem, introduces accelerated algorithms, and demonstrates reduced communication complexity both theoretically and empirically.
Contribution
It formulates Byzantine robustness as an inexact gradient optimization problem and proposes accelerated algorithms leveraging this framework, improving convergence speed and communication efficiency.
Findings
GD with robust aggregation achieves optimal asymptotic error.
Proposed accelerated schemes significantly reduce communication complexity.
Theoretical and empirical results confirm faster convergence and efficiency.
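The first finding, gradient descent on top of a robust aggregation rule, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the aggregation rule (coordinate-wise median), the toy quadratic losses, the constant-vector attack, and the names `coordinate_wise_median`, `run_gd`, and `HONEST_CENTERS`.

```python
from statistics import median


def coordinate_wise_median(vectors):
    """Aggregate equal-length vectors by taking the median of each coordinate."""
    return [median(coords) for coords in zip(*vectors)]


def mean(vectors):
    """Plain averaging, shown only as a non-robust baseline."""
    n = len(vectors)
    return [sum(coords) / n for coords in zip(*vectors)]


def run_gd(aggregate, honest_centers, n_byzantine, steps=100, lr=0.5):
    """Run gradient descent where the server only sees aggregated gradients."""
    dim = len(honest_centers[0])
    x = [0.0] * dim
    for _ in range(steps):
        # Honest worker i holds f_i(x) = 0.5 * ||x - c_i||^2, so grad f_i(x) = x - c_i.
        grads = [[xj - cj for xj, cj in zip(x, c)] for c in honest_centers]
        # Hypothetical attack: each Byzantine worker sends a large constant vector.
        grads += [[1000.0] * dim for _ in range(n_byzantine)]
        g = aggregate(grads)
        x = [xj - lr * gj for xj, gj in zip(x, g)]
    return x


# Five honest workers with slightly different local optima (average is (1.0, 2.0)).
HONEST_CENTERS = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]

x_robust = run_gd(coordinate_wise_median, HONEST_CENTERS, n_byzantine=2)
x_naive = run_gd(mean, HONEST_CENTERS, n_byzantine=2)

print("median aggregation:", x_robust)
print("plain averaging:   ", x_naive)
```

In this toy setup the median-based run settles within about 0.1 per coordinate of the honest average (1.0, 2.0), while the averaged run is dragged far away by the two adversarial workers. The small residual gap is the kind of non-vanishing asymptotic error that the first finding refers to.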
Abstract
Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To solve this issue, robust distributed learning algorithms have been developed, which typically replace parameter averaging with robust aggregations. While generic conditions on these aggregations exist to guarantee the convergence of (Stochastic) Gradient Descent (SGD), the analyses remain rather ad hoc. This hinders the development of more complex robust algorithms, such as accelerated ones. In this work, we show that Byzantine-robust distributed optimization can, under standard generic assumptions, be cast as a general optimization problem with inexact gradient oracles (with both additive and multiplicative error terms), an active field of research. This makes it possible, for instance, to show directly that GD on top of standard robust aggregation procedures obtains optimal asymptotic error in the…
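An inexact gradient oracle with both additive and multiplicative error terms is often written in a form like the one below. This is an illustrative assumption, not necessarily the paper's exact definition; the symbols $g$, $\zeta$, and $\alpha$ are placeholders introduced for this sketch.

```latex
% g(x): the (possibly corrupted) gradient estimate returned by the oracle
% \zeta^2: additive error term, \alpha: multiplicative error term
\| g(x) - \nabla f(x) \|^2 \;\le\; \zeta^2 + \alpha \, \| \nabla f(x) \|^2
```

Under such a condition, the multiplicative part shrinks as the true gradient vanishes, whereas the additive part $\zeta^2$ does not, which is why an additive term of this kind is associated with a non-zero asymptotic error.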
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms
