Efficient Byzantine-Resilient Stochastic Gradient Descent

Kaiyun Li; Xiaojun Chen; Ye Dong; Peng Zhang; Dakui Wang; Shuai Zen

arXiv:2108.06658·cs.DC·August 17, 2021

Open Access

TL;DR

This paper introduces BrSGD, a Byzantine-resilient stochastic gradient descent algorithm that achieves optimal statistical performance and efficient computation, effectively handling Byzantine failures in distributed learning.

Contribution

The paper proposes BrSGD, a new algorithm that is provably robust against Byzantine failures and attains optimal convergence rates with low computational complexity.

Findings

01  BrSGD achieves order-optimal statistical error rates for strongly convex loss functions.

02  BrSGD has computational complexity O(md), where m is the number of machines and d is the model dimension.

03  Experimental results show that BrSGD's effectiveness is comparable to that of non-Byzantine methods.

Abstract

Distributed learning often suffers from Byzantine failures, and there have been a number of works studying the problem of distributed stochastic optimization under Byzantine failures, where only a portion of workers, instead of all the workers in a distributed learning system, compute stochastic gradients at each iteration. These methods, albeit workable under Byzantine failures, have the shortcomings of either a sub-optimal convergence rate or high computational cost. To this end, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) which is provably robust against Byzantine failures. BrSGD obtains the optimal statistical performance and efficient computation simultaneously. In particular, BrSGD can achieve an order-optimal statistical error rate for strongly convex loss functions. The computational complexity of BrSGD is O(md), where d is the model…
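
To make the failure model concrete, here is a minimal sketch of why plain gradient averaging breaks when some workers send arbitrary updates. It is not BrSGD: the abstract does not describe BrSGD's aggregation rule, so the sketch swaps in a coordinate-wise median, a well-known robust aggregator. The toy quadratic objective, the worker counts, the attack values, and all function names are assumptions chosen for illustration, and honest gradients are noise-free so the run is deterministic.

```python
from statistics import median

def mean_aggregate(grads):
    """Plain averaging: one corrupted gradient can shift the result arbitrarily."""
    m = len(grads)
    return [sum(col) / m for col in zip(*grads)]

def median_aggregate(grads):
    """Coordinate-wise median (a stand-in robust rule, NOT the BrSGD rule)."""
    return [median(col) for col in zip(*grads)]

def sgd_step(w, grads, aggregate, lr=0.1):
    """One descent step using the aggregated gradient."""
    g = aggregate(grads)
    return [wi - lr * gi for wi, gi in zip(w, g)]

# Assumed toy objective: f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target.
target = [1.0, -2.0]

def honest_gradient(w):
    return [wi - ti for wi, ti in zip(w, target)]

def run(aggregate, steps=200, honest=7, byzantine=3):
    """Train with `honest` correct workers and `byzantine` adversarial ones."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grads = [honest_gradient(w) for _ in range(honest)]
        # Byzantine workers may send anything; here, a fixed huge vector.
        grads += [[1e6, -1e6] for _ in range(byzantine)]
        w = sgd_step(w, grads, aggregate)
    return w

w_mean = run(mean_aggregate)      # dragged far away from `target`
w_median = run(median_aggregate)  # stays close to `target`
print("mean:", w_mean)
print("median:", w_median)
```

With a minority of bad workers, the averaged run is pulled far from the minimizer while the median run converges to it. Note that this median sorts each coordinate, so it does not match the O(md) cost the abstract reports for BrSGD.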

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Wireless Communication Security Techniques