Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance

Cong Xie; Oluwasanmi Koyejo; Indranil Gupta

arXiv:1805.10032·cs.LG·May 21, 2019·83 cites

TL;DR

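Zeno is a fault-tolerant distributed SGD method that tolerates an arbitrary number of faulty workers, provided at least one worker is non-faulty. It suspects potentially defective workers and applies a ranking-based preference mechanism, and the authors prove that SGD still converges on non-convex problems under these conditions.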

Contribution

Zeno extends fault tolerance in distributed SGD from settings that assume a non-faulty majority to settings where only one worker needs to be non-faulty, using suspicion and a ranking-based preference mechanism for robustness.

Findings

01

Zeno outperforms existing fault-tolerance methods in experiments.

02

The authors prove that SGD with suspicion-based worker ranking converges on non-convex problems.

03

Tolerates an arbitrary number of faulty workers as long as at least one worker is non-faulty; previous results assumed a non-faulty majority.

Abstract

We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
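The abstract does not spell out how workers are suspected and ranked, so the following is only a minimal sketch of one way such a mechanism could look. It assumes the server can evaluate a loss on a small batch of its own, scores each candidate update by the estimated loss decrease minus a penalty on the update's magnitude, and averages the candidates that remain after dropping the lowest-ranked ones. The names `descent_score`, `zeno_style_aggregate`, `lr`, `rho`, and `n_trim` are hypothetical and are not taken from the paper or its repository; the toy loss is deterministic, whereas a real server would estimate it stochastically.

```python
import numpy as np


def descent_score(loss_fn, x, u, lr, rho):
    """Suspicion score for candidate update u (higher means less suspicious).

    Assumed form: estimated loss decrease after stepping along u,
    minus a penalty proportional to the squared norm of u.
    """
    return loss_fn(x) - loss_fn(x - lr * u) - rho * float(np.dot(u, u))


def zeno_style_aggregate(loss_fn, x, candidates, lr, rho, n_trim):
    """Average the candidates left after dropping the n_trim lowest-scoring ones."""
    if not 0 <= n_trim < len(candidates):
        raise ValueError("n_trim must leave at least one candidate")
    scores = np.array([descent_score(loss_fn, x, u, lr, rho) for u in candidates])
    keep = np.argsort(scores)[n_trim:]  # ascending sort, so the tail holds the top scores
    return np.mean([candidates[i] for i in keep], axis=0)


def toy_loss(x):
    """Toy quadratic standing in for the server's small-batch loss estimate."""
    return 0.5 * float(np.dot(x, x))


x = np.array([1.0, -2.0])
honest = x.copy()  # exact gradient of toy_loss at x
# One honest worker and three faulty ones: the faulty workers are the majority.
candidates = [honest, -10.0 * x, 100.0 * x, -50.0 * x]

update = zeno_style_aggregate(toy_loss, x, candidates, lr=0.1, rho=0.01, n_trim=3)
new_x = x - 0.1 * update
```

In this toy run the three faulty updates either increase the loss or carry a large norm, so they rank below the honest one and are trimmed, even though faulty workers outnumber honest ones. A plain average of all four candidates would instead be dominated by the faulty values.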

Code & Models

Repositories

xcgoner/icml2019_zeno
mxnet · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent