Byzantine Stochastic Gradient Descent
Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

TL;DR
This paper introduces a Byzantine-resilient variant of stochastic gradient descent (SGD) for distributed convex optimization. It tolerates an $\alpha$-fraction of machines that behave arbitrarily or adversarially, and its iteration count is optimal up to logarithmic factors.
Contribution
It proposes a new Byzantine-tolerant SGD algorithm and proves that it is optimal, up to logarithmic factors, in both sample complexity and time complexity.
Findings
Finds an $\varepsilon$-approximate minimizer of a convex function in $\tilde{O}\big(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\big)$ iterations, where $m$ is the number of machines.
Tolerates an $\alpha$-fraction of Byzantine machines in the distributed setting.
Proves lower bounds showing that the method is optimal up to logarithmic factors.
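To see what the extra $\alpha^2/\varepsilon^2$ term costs, the leading-order iteration counts can be compared directly. This is a minimal sketch that assumes constants and logarithmic factors are dropped, so it compares orders of magnitude rather than real iteration budgets; the function names are illustrative and do not come from the paper. Without Byzantine machines, mini-batch SGD over $m$ machines needs $O\big(\frac{1}{\varepsilon^2 m}\big)$ iterations, and dividing the Byzantine-tolerant count by that gives $1 + \alpha^2 m$, which does not depend on $\varepsilon$.

```python
import math


def byzantine_sgd_iterations(eps: float, m: int, alpha: float) -> float:
    # Leading term of O~(1/(eps^2 m) + alpha^2/eps^2); constants and log factors dropped.
    return 1.0 / (eps ** 2 * m) + alpha ** 2 / eps ** 2


def minibatch_sgd_iterations(eps: float, m: int) -> float:
    # Leading term of O(1/(eps^2 m)) for mini-batch SGD with no Byzantine machines.
    return 1.0 / (eps ** 2 * m)


def overhead_ratio(eps: float, m: int, alpha: float) -> float:
    # Algebraically equal to 1 + alpha^2 * m, so it does not depend on eps.
    return byzantine_sgd_iterations(eps, m, alpha) / minibatch_sgd_iterations(eps, m)


def max_alpha_within_factor_two(m: int) -> float:
    # alpha^2 * m <= 1  <=>  alpha <= 1/sqrt(m): the Byzantine term is at most the mini-batch term.
    return 1.0 / math.sqrt(m)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        a = max_alpha_within_factor_two(m)
        print(f"m={m}: alpha <= {a:.4f} keeps the overhead within 2x; "
              f"overhead at alpha=0.1 is {overhead_ratio(0.01, m, 0.1):.2f}x")
```

Read this way, robustness costs at most about a factor of 2 whenever $\alpha \le 1/\sqrt{m}$. For $m = 1000$ that threshold is roughly $\alpha \approx 0.032$, and at $\alpha = 0.1$ the ratio grows to $1 + 0.01 \cdot 1000 = 11$.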
Abstract
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big(\frac{1}{\varepsilon^2 m}\big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
