Distributed Deep Learning Using Volunteer Computing-Like Paradigm
Medha Atre, Birendra Jha, Ashwini Rao

TL;DR
This paper introduces VC-ASGD, an asynchronous stochastic gradient descent (SGD) scheme for distributed deep learning on volunteer-computing-like systems. Running on preemptible cloud instances, it lowers training cost by 70-90% and improves security.
Contribution
The paper presents a distributed deep learning framework built on volunteer computing principles. Its asynchronous SGD scheme, VC-ASGD, is designed for fault-tolerant, low-cost training on preemptible instances of commercial cloud platforms.
Findings
Lowered training costs by 70-90% using preemptible instances.
Demonstrated fault tolerance and security improvements in distributed DL training.
Achieved efficient data-parallel training on heterogeneous volunteer-like systems.
Abstract
Use of Deep Learning (DL) in commercial applications such as image classification, sentiment analysis and speech recognition is increasing. When training DL models with a large number of parameters and/or large datasets, the cost and speed of training can become prohibitive. Distributed DL training solutions that split a training job into subtasks and execute them over multiple nodes can decrease training time. However, the cost of current solutions, built predominantly for cluster computing systems, can still be an issue. In contrast to cluster computing systems, Volunteer Computing (VC) systems can lower the cost of computing, but applications running on VC systems have to handle fault tolerance, variable network latency and heterogeneity of compute nodes, and the current solutions are not designed to do so. We design a distributed solution that can run DL training on a VC system by using a…
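To make the idea of asynchronous, fault-tolerant data-parallel training concrete, here is a minimal simulation sketch. It is a generic asynchronous SGD loop, not the paper's VC-ASGD update rule, which this summary does not spell out. The function names (`train_async`, `make_shards`), the `preempt_prob` parameter, and the toy linear-regression task are all assumptions made for illustration.

```python
import random

def gradient(w, b, shard):
    """Mean-squared-error gradient of y ~ w*x + b on one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def train_async(shards, steps=3000, lr=0.02, preempt_prob=0.3, seed=0):
    """Simulate asynchronous SGD with unreliable workers (hypothetical setup)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # parameters held by the server
    snapshots = [(w, b)] * len(shards)   # each worker's possibly stale copy
    applied = dropped = 0
    for _ in range(steps):
        k = rng.randrange(len(shards))   # some worker reaches the end of a subtask
        if rng.random() < preempt_prob:
            dropped += 1                 # instance was preempted: its result is lost
        else:
            # The gradient is computed from the worker's stale snapshot,
            # and the server applies it without waiting for other workers.
            gw, gb = gradient(*snapshots[k], shards[k])
            w -= lr * gw
            b -= lr * gb
            applied += 1
        snapshots[k] = (w, b)            # worker (or its replacement) re-fetches parameters
    return w, b, applied, dropped

def make_shards(n_shards=4, per_shard=25, seed=1):
    """Split noise-free samples of y = 3x + 1 into equal shards (toy data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n_shards * per_shard)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    return [data[i::n_shards] for i in range(n_shards)]

if __name__ == "__main__":
    w, b, applied, dropped = train_async(make_shards())
    print(f"w={w:.3f} b={b:.3f} applied={applied} dropped={dropped}")
```

In this sketch a lost subtask costs only that one update: the server never blocks on a slow or preempted worker, which is the property that makes preemptible instances usable for training.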
Taxonomy
Methods: Stochastic Gradient Descent
