Toward Efficient Online Scheduling for Distributed Machine Learning Systems
Menglu Yu, Jia Liu, Chuan Wu, Bo Ji, Elizabeth S. Bentley

TL;DR
This paper proposes an online scheduling algorithm for distributed machine learning systems that optimizes resource allocation and locality to reduce training time, supported by a new analytical model and approximation algorithm.
Contribution
The paper introduces a new analytical model and an approximation algorithm for online scheduling in distributed ML systems that account for both resource allocation and locality.
Findings
Developed a new analytical model that accounts for both resource allocation and locality.
Transformed the scheduling problem into a mixed packing and covering integer program.
Designed and analyzed a randomized rounding approximation algorithm.
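To make the idea of randomized rounding concrete, here is a minimal, generic sketch in Python. It is not the paper's algorithm and carries none of its approximation guarantees. It assumes a fractional solution is already available (for example from an LP relaxation), rounds each entry up with probability equal to its fractional part, and retries until both the packing constraints (upper bounds) and the covering constraints (lower bounds) hold. The function names, the constraint encoding, and the retry limit are all hypothetical choices made for illustration.

```python
import math
import random


def randomized_round(x_frac, rng):
    """Round each value to floor or ceil; P(round up) equals its fractional part."""
    rounded = []
    for v in x_frac:
        base = math.floor(v)
        frac = v - base
        rounded.append(base + (1 if rng.random() < frac else 0))
    return rounded


def is_feasible(x, packing, covering):
    """Check an integer vector against both constraint families.

    packing:  list of (coeffs, cap)    meaning sum(a_i * x_i) <= cap
    covering: list of (coeffs, demand) meaning sum(a_i * x_i) >= demand
    """
    def dot(coeffs):
        return sum(a * xi for a, xi in zip(coeffs, x))

    return (all(dot(c) <= cap for c, cap in packing)
            and all(dot(c) >= demand for c, demand in covering))


def round_until_feasible(x_frac, packing, covering, max_tries=1000, seed=0):
    """Repeat independent rounding until a feasible integer vector is found.

    Returns None if no feasible rounding is found within max_tries
    (an illustrative limit, not taken from the paper).
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        x = randomized_round(x_frac, rng)
        if is_feasible(x, packing, covering):
            return x
    return None


# Toy instance: three hypothetical allocation variables, one capacity limit
# (packing) and one minimum-allocation requirement (covering).
x_frac = [0.5, 1.5, 2.0]
packing = [([1, 1, 1], 5)]
covering = [([1, 1, 1], 3)]
x_int = round_until_feasible(x_frac, packing, covering)
```

In this toy instance every possible rounding is feasible, so the first attempt succeeds. In general, a rounding scheme has to trade off the probability of violating packing constraints against that of violating covering constraints, which is where the analysis of such an algorithm does its work.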
Abstract
Recent years have witnessed a rapid growth of distributed machine learning (ML) frameworks, which exploit the massive parallelism of computing clusters to expedite ML training. However, the proliferation of distributed ML frameworks also introduces many unique technical challenges in computing system design and optimization. In a networked computing cluster that supports a large number of training jobs, a key question is how to design efficient scheduling algorithms to allocate workers and parameter servers across different machines to minimize the overall training time. Toward this end, in this paper, we develop an online scheduling algorithm that jointly optimizes resource allocation and locality decisions. Our main contributions are three-fold: i) We develop a new analytical model that considers both resource allocation and locality; ii) Based on an equivalent reformulation and…
Taxonomy
Topics: Cloud Computing and Resource Management · Optimization and Search Problems · Stochastic Gradient Optimization Techniques
