DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

Lingjiao Chen; Hongyi Wang; Zachary Charles; Dimitris Papailiopoulos

arXiv:1803.09877·stat.ML·June 25, 2018·31 cites

TL;DR

DRACO is a scalable, coding-theory-based framework for robust distributed training. Compute nodes evaluate redundant gradients so that the parameter server can eliminate the effect of malicious updates, keeping model accuracy comparable to adversary-free training while running several times faster than median-based methods.

Contribution

DRACO introduces a novel coding-theory approach for Byzantine resilience in distributed training, providing robustness guarantees without sacrificing efficiency.

Findings

01 · DRACO is several times faster than median-based robust training methods.

02 · It maintains model accuracy comparable to adversary-free training.

03 · DRACO offers problem-independent robustness guarantees.

Abstract

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical…
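The abstract does not say how the redundant gradients are encoded or decoded. As a rough illustration only, the sketch below assumes the simplest possible scheme: workers are split into groups of size `r`, every worker in a group computes the gradient of the same data partition, and the parameter server keeps the value reported by a strict majority of each group. The function names, the scalar least-squares loss, and the fixed malicious value are all hypothetical choices made for this example, and the sketch should not be read as the paper's actual encoder or decoder.

```python
from collections import Counter


def honest_gradient(partition, w):
    # Gradient of sum 0.5 * (w * x - y)^2 over one partition, for a scalar w.
    # (Hypothetical toy loss, chosen only to keep the example small.)
    return sum((w * x - y) * x for x, y in partition)


def majority_vote(values):
    # Return the value reported by a strict majority of the group.
    # Raises if no single value is reported by more than half of the group.
    value, count = Counter(values).most_common(1)[0]
    if 2 * count <= len(values):
        raise ValueError("no strict majority in this group")
    return value


def robust_aggregate(partitions, w, r, adversaries):
    # partitions[g] is the data assigned to group g; workers g*r .. g*r + r - 1
    # all compute the same gradient on it (assumed repetition-style redundancy).
    group_grads = []
    for g, partition in enumerate(partitions):
        reports = []
        for j in range(r):
            worker = g * r + j
            grad = honest_gradient(partition, w)
            if worker in adversaries:
                grad = -1000.0  # arbitrary malicious report (assumption)
            reports.append(grad)
        group_grads.append(majority_vote(reports))
    # Average the recovered per-group gradients, as plain averaging would.
    return sum(group_grads) / len(group_grads)
```

Under this assumed scheme, if fewer than half of the workers in every group are adversarial, the aggregate equals the one an adversary-free run would produce, which matches the abstract's description of eliminating the effects of adversarial updates. For example, `robust_aggregate(parts, 0.0, 3, {0, 4})` and `robust_aggregate(parts, 0.0, 3, set())` return the same value for any list of partitions `parts` with at least two groups. The price is that each gradient is computed `r` times. Exact equality between honest reports holds here because every honest worker runs the same deterministic computation.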

Code & Models

Repositories

hwang595/Draco · PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Anomaly Detection Techniques and Applications