Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy, Lie He, Martin Jaggi

TL;DR
This paper identifies flaws in existing Byzantine-robust algorithms for distributed learning, demonstrates where they break down, and proposes two simple, provably robust remedies: an iterative clipping aggregator and worker momentum.
Contribution
According to the authors, it introduces the first provably robust method for standard stochastic optimization, addressing the flaws of existing algorithms with iterative clipping and worker momentum.
Findings
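Existing robust aggregation rules can fail to converge even when no Byzantine attackers are present.
Even when an aggregation rule limits the attackers' influence in a single round, attackers can couple their attacks across rounds and eventually cause divergence.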
Proposed methods ensure convergence under Byzantine attacks.
Abstract
Byzantine robustness has received significant attention recently given its importance for distributed and federated learning. In spite of this, we identify severe flaws in existing algorithms even when the data across the participants is identically distributed. First, we show realistic examples where current state of the art robust aggregation rules fail to converge even in the absence of any Byzantine attackers. Secondly, we prove that even if the aggregation rules may succeed in limiting the influence of the attackers in a single round, the attackers can couple their attacks across time eventually leading to divergence. To address these issues, we present two surprisingly simple strategies: a new robust iterative clipping procedure, and incorporating worker momentum to overcome time-coupled attacks. This is the first provably robust method for the standard stochastic optimization…
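The abstract names the two strategies without giving their update rules. The following is a minimal sketch of how such rules could look, under two assumptions not stated on this page: that iterative clipping repeatedly moves an estimate toward the workers' vectors while capping each worker's deviation at a radius `tau`, and that worker momentum is an exponential moving average of each worker's gradients. The function names and the parameters `tau`, `iters`, and `beta` are hypothetical and chosen only for illustration.

```python
import math


def clip_aggregate(vectors, start, tau=1.0, iters=3):
    """Aggregate worker vectors by iterative clipping (illustrative sketch).

    Each iteration moves the estimate by the average of the workers'
    deviations from it, after scaling any deviation longer than tau
    down to length tau.
    """
    v = list(start)
    n = len(vectors)
    for _ in range(iters):
        step = [0.0] * len(v)
        for x in vectors:
            diff = [xi - vi for xi, vi in zip(x, v)]
            norm = math.sqrt(sum(d * d for d in diff))
            # Deviations within the radius pass through unchanged.
            scale = 1.0 if norm <= tau else tau / norm
            for k, d in enumerate(diff):
                step[k] += scale * d / n
        v = [vi + sk for vi, sk in zip(v, step)]
    return v


def momentum_update(prev_momentum, gradient, beta=0.9):
    """Exponential moving average of one worker's gradients (assumed form)."""
    return [beta * m + (1.0 - beta) * g for m, g in zip(prev_momentum, gradient)]


# Hypothetical round: four honest workers and one attacker sending a huge vector.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
attacker = [[1000.0, -1000.0]]
aggregate = clip_aggregate(honest + attacker, start=[0.0, 0.0], tau=1.0, iters=3)
plain_mean = [sum(col) / 5 for col in zip(*(honest + attacker))]
```

In this sketch every worker's contribution to one iteration has length at most `tau / n`, so a single extreme vector cannot move the aggregate far from the starting point, whereas the plain mean is dominated by it. Applying `momentum_update` on each worker before aggregation means the server aggregates averages over many rounds rather than single noisy gradients, which is the role the abstract assigns to momentum against time-coupled attacks.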
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
