Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

Deepesh Data; Suhas Diggavi

arXiv:2005.07866 · stat.ML · May 19, 2020

TL;DR

This paper develops a Byzantine-resilient stochastic gradient descent algorithm for high-dimensional, heterogeneous data in distributed settings, achieving robustness against up to 25% malicious workers while maintaining convergence rates similar to standard SGD.

Contribution

It introduces a new robust mean estimation method adapted for stochastic, heterogeneous data and provides convergence analysis with concrete bounds, including a gradient compression variant.

Findings

01

Algorithm tolerates up to 25% Byzantine workers.

02

Achieves convergence rates comparable to vanilla SGD.

03

Gradient compression reduces communication without affecting convergence.
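
The summary does not say which compressor the gradient compression variant uses. As a minimal sketch of the general idea, assume each worker sends only k randomly chosen coordinates of its gradient, rescaled so the compressed vector is unbiased. The function name `random_k_sparsify` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def random_k_sparsify(grad, k, rng):
    """Keep k uniformly random coordinates of grad and zero out the rest.

    Assumption: random-k sparsification stands in for the compressor,
    which this summary does not specify. Kept entries are scaled by d / k
    so that the expectation of the output equals grad.
    """
    grad = np.asarray(grad, dtype=float)
    d = grad.size
    kept = rng.choice(d, size=k, replace=False)  # coordinates to transmit
    compressed = np.zeros_like(grad)
    compressed[kept] = grad[kept] * (d / k)
    return compressed
```

With this scheme a worker transmits k values and k indices instead of d values, at the cost of extra variance in each compressed gradient.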

Abstract

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. In order to be able to apply their filtering procedure in our heterogeneous data setting where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly-convex and non-convex objectives. We derive our results under the bounded variance assumption on local stochastic gradients and a …
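
To make the filtering idea concrete, the following is a simplified sketch of spectral outlier filtering for aggregating worker gradients at the master. It is not the exact procedure of Steinhardt et al. or of this paper: it repeatedly down-weights the gradients that deviate most along the top eigenvector of the weighted covariance, and stops once the largest eigenvalue falls below a caller-supplied bound. The name `filtered_mean`, the `var_bound` threshold, and the stopping rule are assumptions made for illustration.

```python
import numpy as np


def filtered_mean(grads, var_bound, eps=0.25, max_iter=100):
    """Illustrative spectral filter for robust mean estimation.

    grads:     array of shape (n_workers, dim), one gradient per worker.
    var_bound: assumed bound on the top covariance eigenvalue of the
               honest gradients (hypothetical tuning parameter).
    eps:       assumed upper bound on the fraction of corrupt workers.
    """
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[0]
    weights = np.ones(n)
    for _ in range(max_iter):
        w = weights / weights.sum()
        mu = w @ grads                       # weighted mean estimate
        centered = grads - mu
        cov = (centered * w[:, None]).T @ centered
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        if eigvals[-1] <= var_bound:
            break                            # spread is small enough: accept mu
        # Score each gradient by its squared deviation along the top direction.
        scores = (centered @ eigvecs[:, -1]) ** 2
        new_weights = weights * (1.0 - scores / scores.max())
        if new_weights.sum() < (1.0 - 2.0 * eps) * n:
            break                            # do not discard too much total weight
        weights = new_weights
    return mu
```

In a master-worker loop, the master would call `filtered_mean` on the stacked gradients received in each round and take the SGD step with the result in place of the plain average. Forming the full covariance matrix costs memory quadratic in the dimension, so a high-dimensional implementation would likely estimate the top eigenvector iteratively instead.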

Taxonomy

Methods: Stochastic Gradient Descent