Asynchronous Byzantine Machine Learning (the case of SGD)
Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Rhicheek Patra, Mahsa Taziki

TL;DR
This paper introduces Kardam, a novel asynchronous SGD algorithm resilient to Byzantine workers, ensuring convergence despite failures or malicious behavior, and demonstrates its effectiveness through theoretical analysis and empirical evaluation.
Contribution
Kardam is the first asynchronous SGD algorithm that combines filtering and dampening to handle Byzantine workers, guaranteeing convergence and improving robustness.
Findings
Kardam guarantees almost sure convergence under Byzantine and asynchronous conditions.
The slowdown Kardam incurs for Byzantine resilience is less than f/n, where f is the number of Byzantine workers and n is the total number of workers.
Empirical results show Kardam does not add noise and outperforms other staleness-aware asynchronous methods.
Abstract
Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, some of the workers can however exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce Kardam, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic…
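The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class LipschitzFilter, the functions dampening and server_step, the (n - f)/n quantile rule, the history window, and the 1/(1 + staleness) decay are all assumptions chosen for illustration. The filter reduces each incoming gradient to one scalar (an empirical Lipschitz coefficient) and rejects outliers; the dampening factor shrinks the step taken for stale gradients.

```python
import math
from collections import deque


def dampening(staleness):
    # Assumed decay: the staler the gradient, the smaller its weight.
    return 1.0 / (1.0 + staleness)


def distance(a, b):
    # Euclidean distance between two equally sized vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LipschitzFilter:
    """Scalar filter (hypothetical): reject gradients that change too fast."""

    def __init__(self, n_workers, n_byzantine, history=100):
        self.n = n_workers
        self.f = n_byzantine
        self.coeffs = deque(maxlen=history)  # recent empirical coefficients
        self.prev = None  # (params, gradient) of the last accepted update

    def accept(self, params, grad):
        if self.prev is None:
            # Nothing to compare against yet: accept the first gradient.
            self.prev = (list(params), list(grad))
            return True
        prev_params, prev_grad = self.prev
        # Empirical Lipschitz coefficient: gradient change per unit of model change.
        k = distance(grad, prev_grad) / max(distance(params, prev_params), 1e-12)
        self.coeffs.append(k)
        ranked = sorted(self.coeffs)
        # Integer ceiling of (n - f) / n * len(ranked): the assumed quantile rank.
        rank = -(-(self.n - self.f) * len(ranked) // self.n)
        ok = k <= ranked[rank - 1]
        if ok:
            self.prev = (list(params), list(grad))
        return ok


def server_step(model, filt, worker_params, grad, staleness, lr=0.1):
    # Apply a filtered, dampened SGD update; return (model, accepted).
    if not filt.accept(worker_params, grad):
        return model, False
    scale = lr * dampening(staleness)
    return [w - scale * g for w, g in zip(model, grad)], True
```

For example, with the cost 0.5 * ||x||^2 (whose gradient is x itself), a run of honest updates yields coefficients equal to 1, so a later gradient scaled by 1000 falls above the quantile and is dropped without changing the model. Recording rejected coefficients in the history is a simplification of this sketch; a persistent attacker could shift the threshold over time.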
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Random Matrices and Applications
Methods: Stochastic Gradient Descent
