Asynchronous Byzantine Machine Learning (the case of SGD)

Georgios Damaskinos; El Mahdi El Mhamdi; Rachid Guerraoui; Rhicheek Patra; Mahsa Taziki

arXiv:1802.07928 · stat.ML · July 10, 2018 · 26 cites


TL;DR

This paper introduces Kardam, a novel asynchronous SGD algorithm resilient to Byzantine workers, ensuring convergence despite failures or malicious behavior, and demonstrates its effectiveness through theoretical analysis and empirical evaluation.

Contribution

Kardam is the first asynchronous SGD algorithm that combines filtering and dampening to handle Byzantine workers, guaranteeing convergence and improving robustness.

Findings

01

Kardam guarantees almost sure convergence under Byzantine and asynchronous conditions.

02

The slowdown the algorithm incurs for Byzantine resilience is less than f/n, where f is the number of Byzantine workers and n is the total number of workers.

03

Empirical results show Kardam does not add noise and outperforms other staleness-aware asynchronous methods.

Abstract

Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, however, some of the workers can exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce Kardam, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against $\frac{1}{3}$ Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic…
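The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 2/3 quantile threshold, and the 1/(1 + staleness) dampening factor are all assumptions chosen for illustration. The only ideas taken from the abstract are that the filter is scalar-based and relies on Lipschitzness, and that the dampening adjusts for stale gradients.

```python
import math


def empirical_lipschitz(grad_new, grad_prev, x_new, x_prev):
    """Scalar ratio ||g_new - g_prev|| / ||x_new - x_prev|| (hypothetical helper)."""
    dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(grad_new, grad_prev)))
    dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x_prev)))
    # If the model did not move, any gradient change is treated as unbounded.
    return dg / dx if dx > 0 else float("inf")


def passes_filter(coeff, history, quantile=2 / 3):
    """Accept a gradient whose coefficient does not exceed a quantile of
    previously observed coefficients. The 2/3 default is an assumption."""
    if not history:
        return True  # nothing to compare against yet
    ranked = sorted(history)
    idx = min(len(ranked) - 1, int(quantile * len(ranked)))
    return coeff <= ranked[idx]


def dampen(gradient, staleness):
    """Scale a gradient by 1 / (1 + staleness); an assumed dampening function."""
    factor = 1.0 / (1.0 + staleness)
    return [factor * g for g in gradient]


# Usage: one incoming gradient computed on a model that is 1 step stale.
coeff = empirical_lipschitz([2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0])
if passes_filter(coeff, history=[1.0, 2.0, 3.0]):
    update = dampen([2.0, 0.0], staleness=1)
```

The filter only ever compares scalars, so its cost does not grow with the model dimension beyond computing two norms. Any other non-increasing function of staleness could stand in for the dampening factor used here.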


Code & Models

Repositories

LPD-EPFL/kardam (official)


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Random Matrices and Applications

Methods: Stochastic Gradient Descent