Byzantine-Tolerant Machine Learning

Peva Blanchard; El Mahdi El Mhamdi; Rachid Guerraoui; Julien Stainer

arXiv:1703.02757·cs.DC·March 14, 2017·23 cites

Open Access

TL;DR

This paper investigates the robustness of distributed stochastic gradient descent (SGD) algorithms against Byzantine failures, introduces a resilience property, and proposes Krum, an update rule that tolerates Byzantine workers without limiting problem dimensions.

Contribution

The paper shows that update rules based on a linear combination of the workers' proposed vectors cannot tolerate Byzantine failures, and it introduces Krum, a Byzantine-resilient update rule for distributed SGD.
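The limitation of linear combinations is easiest to see with plain averaging. The sketch below is illustrative only: the function names `average` and `byzantine_proposal` and the sample vectors are assumptions, not taken from the paper. It assumes a single Byzantine worker that can see the honest proposals, and shows that this worker can pick its own vector so that the average equals any target it likes.

```python
def average(vectors):
    """Coordinate-wise mean of equal-length vectors (a linear combination)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def byzantine_proposal(honest, target):
    """Vector that one Byzantine worker sends to force the average to `target`.

    With n workers in total, the mean equals `target` exactly when the
    Byzantine vector is n * target minus the sum of the honest vectors.
    """
    n = len(honest) + 1
    honest_sum = [sum(column) for column in zip(*honest)]
    return [n * t - s for t, s in zip(target, honest_sum)]


# Hypothetical honest gradient estimates, clustered around (1, 2).
honest = [[1.0, 2.0], [1.5, 1.5], [0.5, 2.5]]
# Arbitrary vector the attacker wants the server to apply.
target = [-100.0, 100.0]

bad = byzantine_proposal(honest, target)
forced = average(honest + [bad])
```

The same argument applies to any linear combination in which the Byzantine worker's coefficient is nonzero: it solves for its own vector given the others.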

Findings

1. Linear combination-based updates cannot tolerate even a single Byzantine failure.

2. Krum satisfies the resilience property, which guarantees convergence despite Byzantine workers.

3. Krum's time complexity is O(n^2 (d + log n)) for d-dimensional problems.
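This page does not spell out the selection rule, so the following is a minimal sketch that assumes the usual formulation of Krum: each proposed vector is scored by the sum of squared distances to its n − f − 2 nearest other proposals, and the vector with the lowest score is selected, which requires n > 2f + 2. The names `squared_distance` and `krum` and the sample data are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum(vectors, f):
    """Return the proposal with the smallest Krum score.

    vectors: list of n proposed vectors (lists of floats)
    f: assumed upper bound on the number of Byzantine workers
    """
    n = len(vectors)
    if n <= 2 * f + 2:
        raise ValueError("Krum requires n > 2f + 2")
    k = n - f - 2  # number of nearest neighbours counted in each score
    scores = []
    for i, v in enumerate(vectors):
        # O(n * d) distance work and an O(n log n) sort per worker.
        dists = sorted(
            squared_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        scores.append(sum(dists[:k]))
    # min() returns the first minimum, so ties go to the lowest index.
    best = min(range(n), key=scores.__getitem__)
    return vectors[best]


# Four hypothetical honest proposals and one far-away Byzantine proposal.
proposals = [
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.1],
    [1.0, 2.1],
    [-403.0, 394.0],
]
chosen = krum(proposals, f=1)
```

Summed over n workers, the per-worker distance computation and sort in this sketch are consistent with the O(n^2 (d + log n)) bound quoted above. Because the outlier's nearest neighbours are all far away, its score is large and one of the clustered proposals is selected instead.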

Abstract

The growth of data, the need for scalability and the complexity of models used in modern machine learning call for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study the robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of $n$ workers, up to $f$ of them being Byzantine, we ask how robust SGD can be without limiting either the dimension or the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule capturing the basic requirements to…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security