Byzantine-Tolerant Machine Learning
Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer

TL;DR
This paper investigates the robustness of distributed stochastic gradient descent (SGD) algorithms against Byzantine failures, introduces a resilience property for update rules, and proposes Krum, an update rule that tolerates Byzantine workers without limiting the dimension or the size of the parameter space.
Contribution
It shows that update rules based on linear combinations of the workers' vectors cannot tolerate Byzantine workers, and introduces Krum, a novel Byzantine-resilient update rule for distributed SGD.
Findings
Update rules based on a linear combination of the workers' vectors cannot tolerate even a single Byzantine failure.
Krum satisfies the resilience property, which ensures that SGD still converges despite Byzantine failures.
With n workers and d-dimensional vectors, Krum's time complexity is O(n^2 (d + log n)).
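A minimal NumPy sketch of the Krum selection step, as we understand it from the paper: each proposed vector is scored by the sum of squared distances to its n - f - 2 closest other vectors, and the proposed vector with the lowest score is returned. The function name `krum`, the array layout (one row per worker) and the use of NumPy are our assumptions, not the authors' code. The pairwise-distance step costs O(n^2 d) and the per-vector sort O(n log n), which together give the O(n^2 (d + log n)) bound above. The check n > 2f + 2 mirrors the condition under which the paper proves resilience.

```python
import numpy as np


def krum(vectors: np.ndarray, f: int) -> np.ndarray:
    """Sketch of a Krum-style selection (hypothetical helper, not the authors' code).

    vectors: array of shape (n, d), one proposed gradient per worker.
    f: assumed upper bound on the number of Byzantine workers.
    Returns the single proposed vector with the smallest score.
    """
    n = vectors.shape[0]
    if n <= 2 * f + 2:
        raise ValueError("this sketch requires n > 2f + 2")

    # Pairwise squared Euclidean distances, shape (n, n): O(n^2 d).
    diffs = vectors[:, None, :] - vectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    k = n - f - 2  # number of closest neighbours that count toward each score
    scores = np.empty(n)
    for i in range(n):
        # Exclude the zero distance from vector i to itself.
        others = np.delete(sq_dists[i], i)
        # Sum of the k smallest squared distances; sorting is O(n log n).
        scores[i] = np.sum(np.sort(others)[:k])

    # Krum outputs one of the proposed vectors, not an average.
    return vectors[np.argmin(scores)]
```

For example, with five honest vectors near (1, 1) and two identical Byzantine vectors at (100, -100), `krum(np.vstack([honest, byz]), f=2)` returns one of the honest vectors, whereas a plain mean of the same seven vectors is pulled to about (29.29, -27.85). Returning an actual proposed vector, rather than a weighted combination, is what keeps a single outlier from steering the result.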
Abstract
The growth of data, the need for scalability and the complexity of models used in modern machine learning call for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study the robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of n workers, up to f of them being Byzantine, we ask how robust SGD can be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule capturing the basic requirements to…
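The linear-combination result can be illustrated numerically. The sketch below follows the usual argument: if the update is F = λ_1 V_1 + ... + λ_n V_n with λ_n ≠ 0 and worker n is Byzantine, that worker can send V_n = (U - λ_1 V_1 - ... - λ_{n-1} V_{n-1}) / λ_n, which forces F = U for any target U it chooses. We assume the Byzantine worker knows the honest vectors; the helper names `linear_rule` and `byzantine_vector` are hypothetical.

```python
import numpy as np


def linear_rule(vectors: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear-combination update F = sum_i lambda_i * V_i.

    vectors: shape (n, d); weights: shape (n,) holding the lambda_i.
    """
    return weights @ vectors


def byzantine_vector(honest: np.ndarray, weights: np.ndarray,
                     target: np.ndarray) -> np.ndarray:
    """Vector the last (Byzantine) worker sends so that F equals `target`.

    Assumes weights[-1] != 0 and that `honest` holds the other n - 1 vectors.
    """
    honest_part = weights[:-1] @ honest  # sum_{i<n} lambda_i * V_i
    return (target - honest_part) / weights[-1]


# Plain averaging is the special case lambda_i = 1/n.
honest = np.array([[1.0, 2.0], [0.5, 1.5], [1.2, 1.8]])
weights = np.full(4, 0.25)
target = np.array([-1000.0, 1000.0])  # arbitrary point chosen by the attacker
bad = byzantine_vector(honest, weights, target)
update = linear_rule(np.vstack([honest, bad]), weights)
```

Here `update` equals `target` (up to floating-point rounding) regardless of what the honest workers proposed, so one Byzantine worker fully controls the averaged step.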
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security
