A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling
Menglu Yu, Chuan Wu, Bo Ji, Jia Liu

TL;DR
This paper introduces a novel resource scheduling framework for DNN training that leverages the layered structure of DNN jobs, using a sum-of-ratios multi-dimensional knapsack approach to improve job completion times in large-scale clusters.
Contribution
It presents a new analytical model for DNN resource scheduling, and develops an efficient algorithm based on sum-of-ratios decomposition with strong performance guarantees.
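To make the problem class concrete, here is a minimal toy sketch of a sum-of-ratios objective under multi-dimensional knapsack constraints. It is not the paper's model or algorithm: the job fields (`work`, `comm`, `rate`, `demand`), the completion-time formula, and the brute-force search are all hypothetical and chosen only for illustration. Each job's time is a ratio of two linear functions of its worker count, and the total demand must fit within capacity in every resource dimension.

```python
from itertools import product

def total_time(alloc, jobs):
    # Sum of ratios: each job contributes (linear in x) / (linear in x).
    # This completion-time model is an assumption made for illustration.
    return sum((j["work"] + j["comm"] * x) / (j["rate"] * x)
               for x, j in zip(alloc, jobs))

def feasible(alloc, jobs, capacity):
    # Multi-dimensional knapsack constraint: total demand in every
    # resource dimension must stay within that dimension's capacity.
    for d, cap in enumerate(capacity):
        if sum(x * j["demand"][d] for x, j in zip(alloc, jobs)) > cap:
            return False
    return True

def brute_force_schedule(jobs, capacity, max_workers):
    # Exhaustive search over worker counts; only viable for tiny instances.
    best = None
    for alloc in product(range(1, max_workers + 1), repeat=len(jobs)):
        if not feasible(alloc, jobs, capacity):
            continue
        t = total_time(alloc, jobs)
        if best is None or t < best[1]:
            best = (alloc, t)
    return best

# Hypothetical jobs: per-worker demand is (GPUs, CPU cores).
jobs = [
    {"work": 12.0, "comm": 1.0, "rate": 1.0, "demand": (1, 2)},
    {"work": 6.0, "comm": 0.5, "rate": 1.0, "demand": (1, 1)},
]
capacity = (4, 6)  # (GPUs, CPU cores) available in the cluster
best_alloc, best_time = brute_force_schedule(jobs, capacity, max_workers=4)
print(best_alloc, best_time)
```

Exhaustive search grows exponentially with the number of jobs, which is why a scalable approach needs a decomposition or approximation scheme rather than enumeration.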
Findings
Significantly reduces DNN job completion times
Outperforms existing scheduling methods in experiments
Provides a scalable and effective scheduling solution
Abstract
In recent years, it has become widely accepted that exploiting the parallelism of large-scale computing clusters is critical for sustaining the resource-intensive computational needs of training deep neural networks (DNNs) and for deploying DNN training jobs efficiently. However, existing resource schedulers for traditional computing clusters are not well suited to DNN training, which results in unsatisfactory job completion times. The limitations of these scheduling schemes motivate us to propose a new computing cluster resource scheduling framework that leverages the special layered structure of DNN jobs and significantly improves their job completion times. Our contributions in this paper are three-fold: i) We develop a new resource scheduling analytical model by considering DNN's layered structure, which enables us to analytically formulate the resource…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Cloud Computing and Resource Management
