GADGET: Online Resource Optimization for Scheduling Ring-All-Reduce Learning Jobs
Menglu Yu, Ye Tian, Bo Ji, Chuan Wu, Hridesh Rajan, Jia Liu

TL;DR
This paper introduces GADGET, an online resource scheduling algorithm for ring-all-reduce deep learning jobs, supported by a new analytical model and by experiments comparing it with existing scheduling methods.
Contribution
The paper presents a new analytical model for scheduling ring-all-reduce deep learning jobs and introduces GADGET, a provably effective resource scheduling algorithm.
Findings
GADGET outperforms existing scheduling methods in experiments.
The analytical model effectively captures diverse DDL performance objectives.
GADGET provides strong theoretical performance guarantees.
Abstract
Fueled by advances in distributed deep learning (DDL), recent years have witnessed a rapidly growing demand for resource-intensive distributed/parallel computing to process DDL computing jobs. To resolve network communication bottleneck and load balancing issues in distributed computing, the so-called "ring-all-reduce" decentralized architecture has been increasingly adopted to remove the need for dedicated parameter servers. To date, however, there remains a lack of theoretical understanding on how to design resource optimization algorithms for efficiently scheduling ring-all-reduce DDL jobs in computing clusters. This motivates us to fill this gap by proposing a series of new resource scheduling designs for ring-all-reduce DDL jobs. Our contributions in this paper are three-fold: i) We propose a new resource scheduling analytical model for ring-all-reduce deep learning, which covers…
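For readers unfamiliar with the communication pattern the abstract refers to, the following is a minimal single-process sketch of a generic ring all-reduce (a reduce-scatter phase followed by an all-gather phase). It is not GADGET's scheduling algorithm and is not taken from the paper; the function name `ring_all_reduce` and the list-based "workers" are assumptions made purely for illustration.

```python
def ring_all_reduce(worker_data):
    """Simulate a sum all-reduce over a logical ring of workers.

    worker_data[i] is worker i's local vector (all vectors have equal length).
    Returns one buffer per worker, each holding the element-wise sum.
    Illustrative only: real systems exchange chunks over the network.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    # Split the index range into n contiguous chunks (chunk k = bounds[k]:bounds[k+1]).
    bounds = [(k * length) // n for k in range(n + 1)]
    bufs = [list(v) for v in worker_data]

    def exchange(chunk_of, combine):
        # Every worker sends one chunk to its right-hand neighbour in the same step.
        # Messages are snapshotted first so that all sends happen "simultaneously".
        msgs = []
        for i in range(n):
            k = chunk_of(i) % n
            lo, hi = bounds[k], bounds[k + 1]
            msgs.append((lo, bufs[i][lo:hi]))
        for i in range(n):
            lo, payload = msgs[i]
            dst = bufs[(i + 1) % n]
            for off, val in enumerate(payload):
                dst[lo + off] = combine(dst[lo + off], val)

    # Phase 1 (reduce-scatter): after n-1 steps, worker i holds the fully
    # summed chunk (i + 1) mod n.
    for s in range(n - 1):
        exchange(lambda i: i - s, lambda old, new: old + new)

    # Phase 2 (all-gather): circulate the finished chunks around the ring,
    # overwriting stale partial sums.
    for s in range(n - 1):
        exchange(lambda i: i + 1 - s, lambda old, new: new)

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(ring_all_reduce(grads))
```

In this pattern each worker talks only to its ring neighbour and no parameter server is involved, which is the property the abstract highlights. How many workers a job receives and where they are placed is the scheduling question the paper addresses, and the sketch does not model it.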
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Stochastic Gradient Optimization Techniques
