Distributed Deep Learning Using Volunteer Computing-Like Paradigm

Medha Atre; Birendra Jha; Ashwini Rao

arXiv:2103.08894·cs.DC·May 28, 2021

TL;DR

This paper introduces VC-ASGD, a novel asynchronous stochastic gradient descent method enabling cost-effective distributed deep learning on volunteer-like systems, achieving significant cost reductions and enhanced security.

Contribution

The paper presents a distributed deep learning framework built on volunteer computing principles, and designs VC-ASGD for fault-tolerant, low-cost training on preemptible instances in commercial clouds.

Findings

01. Lowered training costs by 70-90% using preemptible instances.

02. Demonstrated fault tolerance and security improvements in distributed DL training.

03. Achieved efficient data parallel training on heterogeneous volunteer-like systems.

Abstract

The use of Deep Learning (DL) in commercial applications such as image classification, sentiment analysis and speech recognition is increasing. When training DL models with a large number of parameters and/or large datasets, the cost and speed of training can become prohibitive. Distributed DL training solutions that split a training job into subtasks and execute them over multiple nodes can decrease training time. However, the cost of current solutions, which are built predominantly for cluster computing systems, can still be an issue. In contrast to cluster computing systems, Volunteer Computing (VC) systems can lower the cost of computing, but applications running on VC systems have to handle fault tolerance, variable network latency and heterogeneity of compute nodes, and current solutions are not designed to do so. We design a distributed solution that can run DL training on a VC system by using a…
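The idea of splitting a training job into subtasks that run on unreliable nodes can be illustrated with a generic asynchronous SGD loop. The sketch below is not the paper's VC-ASGD algorithm, whose details are not given on this page. It is a toy simulation under stated assumptions: a single parameter server, a linear model on synthetic data, and hypothetical names (`make_data`, `gradient`, `train_async`, `preempt_prob`) chosen only for illustration. Workers compute gradients on possibly stale parameters, and a preempted worker simply loses its subtask result and restarts from the current server parameters.

```python
import random


def make_data(n, rng):
    # Synthetic, noise-free 1-D regression data: y = 3x + 1 (illustrative only).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]


def gradient(params, batch):
    # Mean-squared-error gradient for the model y_hat = w * x + b.
    w, b = params
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw / len(batch), gb / len(batch)


def train_async(data, n_workers=4, steps=2000, lr=0.05,
                preempt_prob=0.3, seed=0):
    rng = random.Random(seed)
    # Data parallelism: each simulated worker owns one shard of the data.
    shards = [data[i::n_workers] for i in range(n_workers)]
    params = (0.0, 0.0)               # parameter copy held by the server
    snapshots = [params] * n_workers  # each worker's possibly stale copy
    lost = 0
    for _ in range(steps):
        wid = rng.randrange(n_workers)  # workers report in arbitrary order
        if rng.random() < preempt_prob:
            # Assumed failure model: the node is preempted, its subtask
            # result is lost, and it restarts from the server's parameters.
            lost += 1
            snapshots[wid] = params
            continue
        batch = rng.sample(shards[wid], 8)
        gw, gb = gradient(snapshots[wid], batch)  # gradient on stale params
        params = (params[0] - lr * gw, params[1] - lr * gb)
        snapshots[wid] = params  # worker pulls the latest parameters
    return params, lost


if __name__ == "__main__":
    (w, b), lost = train_async(make_data(200, random.Random(1)))
    print(f"w={w:.3f} b={b:.3f} lost_subtasks={lost}")
```

In this toy setting the server never waits for any particular worker, so a lost subtask costs only the work that one node had done. That is the property that makes asynchronous updates a natural fit for preemptible or volunteer nodes.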

Taxonomy

Methods: Stochastic Gradient Descent