Byzantine Stochastic Gradient Descent

Dan Alistarh; Zeyuan Allen-Zhu; Jerry Li

arXiv:1803.08917 · cs.LG · March 26, 2018 · 105 cites

TL;DR

This paper introduces a Byzantine-resilient stochastic gradient descent method for distributed optimization, capable of tolerating adversarial machine failures and achieving near-optimal convergence rates.

Contribution

It proposes a new Byzantine-tolerant SGD algorithm and proves that it is optimal, up to logarithmic factors, in both sample complexity and time complexity.

Findings

01

Finds $ε$-approximate minimizers of convex functions in $T = \tilde{O}(1/(ε^{2} m) + α^{2}/ε^{2})$ iterations.

02

Tolerates an $α$-fraction of Byzantine machines in the distributed setting.

03

Proves lower bounds showing near-optimality of the proposed method.
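
To make the iteration bound concrete, the following minimal sketch compares the leading-order terms of the Byzantine-tolerant bound, $\tilde{O}(1/(ε^{2} m) + α^{2}/ε^{2})$, with the mini-batch SGD bound, $O(1/(ε^{2} m))$. It assumes all constants and logarithmic factors are dropped, so the numbers indicate scaling only, not actual iteration counts. The function names are hypothetical and chosen for illustration.

```python
def minibatch_sgd_iters(eps: float, m: int) -> float:
    """Leading term 1/(eps^2 * m) of the mini-batch SGD bound (constants dropped)."""
    return 1.0 / (eps**2 * m)


def byzantine_sgd_iters(eps: float, m: int, alpha: float) -> float:
    """Leading terms 1/(eps^2 * m) + alpha^2/eps^2 of the Byzantine-tolerant bound.

    Assumption: logarithmic factors hidden by the tilde are ignored.
    """
    return 1.0 / (eps**2 * m) + alpha**2 / eps**2


def crossover_alpha(m: int) -> float:
    """Fraction alpha at which the two terms are equal: alpha = 1/sqrt(m)."""
    return m ** -0.5


if __name__ == "__main__":
    eps, m = 0.01, 100
    for alpha in (0.0, 0.01, crossover_alpha(m), 0.25):
        ratio = byzantine_sgd_iters(eps, m, alpha) / minibatch_sgd_iters(eps, m)
        print(f"alpha={alpha:.3f}  overhead vs. mini-batch SGD: {ratio:.2f}x")
```

Under these assumptions the two terms balance at $α = 1/\sqrt{m}$: below that fraction the bound is dominated by the same $1/(ε^{2} m)$ term as mini-batch SGD, and above it the $α^{2}/ε^{2}$ term dominates.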

Abstract

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $α$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $ε$-approximate minimizers of convex functions in $T = \tilde{O}(\frac{1}{ε^{2} m} + \frac{α^{2}}{ε^{2}})$ iterations. In contrast, traditional mini-batch SGD needs $T = O(\frac{1}{ε^{2} m})$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
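
The claim that mini-batch SGD cannot tolerate Byzantine failures can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it uses a coordinate-wise median purely as a familiar stand-in for a robust aggregator, and it makes no claim about achieving the rate stated above. The gradient values, noise level, and function names are all assumptions chosen for illustration.

```python
import random
from statistics import mean, median


def honest_gradient(true_grad, noise, rng):
    """One honest worker's stochastic gradient: true gradient plus Gaussian noise."""
    return [g + rng.gauss(0.0, noise) for g in true_grad]


def aggregate(grads, how):
    """Combine worker gradients coordinate by coordinate using mean or median."""
    reduce = mean if how == "mean" else median
    return [reduce(coord) for coord in zip(*grads)]


def simulate(m=20, alpha=0.2, seed=0):
    """m workers, an alpha-fraction of which send an arbitrary (huge) vector."""
    rng = random.Random(seed)
    true_grad = [1.0, -2.0]  # hypothetical true gradient
    n_bad = int(alpha * m)
    grads = [honest_gradient(true_grad, 0.1, rng) for _ in range(m - n_bad)]
    grads += [[1e6, 1e6] for _ in range(n_bad)]  # Byzantine workers
    return true_grad, aggregate(grads, "mean"), aggregate(grads, "median")


if __name__ == "__main__":
    true_grad, avg, med = simulate()
    print("true gradient:      ", true_grad)
    print("mean aggregation:   ", avg)  # dragged far away by the bad workers
    print("median aggregation: ", med)  # stays near the true gradient
```

Averaging, which is what mini-batch SGD does, lets even a small fraction of adversarial machines move the update arbitrarily far, whereas an aggregator that ignores outliers does not. This is the failure mode a Byzantine-tolerant variant has to rule out.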

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent