ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs
Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

TL;DR
This paper introduces ESG, a GPU-aware scheduling algorithm for serverless DNN workflows that raises SLO hit rates and lowers cost by jointly managing GPU sharing and task batching.
Contribution
ESG is the first scheduler to incorporate GPU sharing as a core factor, using A*-search and dual-blade pruning for efficient, scalable scheduling in serverless environments.
Findings
ESG improves SLO hit rates by 61%-80% over prior work.
ESG saves 47%-187% in cost relative to prior work.
ESG effectively handles GPU sharing and task batching.
Abstract
Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the…
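The abstract's idea of combining A*-search with pruning can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a simplified linear pipeline in which every stage has a handful of hypothetical configurations (for example, a batch size and GPU share), each reduced to a (latency_ms, cost) pair. The function name `astar_schedule`, the data layout, and the single latency-based pruning rule are all assumptions made for illustration; the paper's dual-blade pruning and dominator-based SLO distribution are not reproduced here.

```python
import heapq

def astar_schedule(stages, slo_ms):
    """Pick one option per stage to minimize total cost within a latency SLO.

    stages: list of lists of (latency_ms, cost) options, one list per stage
            (hypothetical stand-ins for batch-size / GPU-share configurations).
    Returns (total_cost, chosen_option_indices), or None if the SLO is unreachable.
    """
    n = len(stages)
    # Suffix minima: cheapest cost and fastest latency achievable from stage i onward.
    min_cost = [0.0] * (n + 1)
    min_lat = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_cost[i] = min_cost[i + 1] + min(c for _, c in stages[i])
        min_lat[i] = min_lat[i + 1] + min(l for l, _ in stages[i])

    # Heap entries: (f = g + h, g = cost so far, latency so far, choices so far).
    # h is the cheapest possible remaining cost, a lower bound, so it is admissible.
    heap = [(min_cost[0], 0.0, 0.0, ())]
    while heap:
        _, g, lat, choices = heapq.heappop(heap)
        i = len(choices)
        if i == n:
            # First complete schedule popped is cost-optimal among feasible ones.
            return g, list(choices)
        for idx, (l, c) in enumerate(stages[i]):
            new_lat = lat + l
            # Prune branches that miss the SLO even with the fastest remaining options.
            if new_lat + min_lat[i + 1] > slo_ms:
                continue
            new_g = g + c
            heapq.heappush(
                heap, (new_g + min_cost[i + 1], new_g, new_lat, choices + (idx,))
            )
    return None

# Two stages, two options each; the SLO of 70 ms rules out the cheapest combination.
pipeline = [[(10, 5), (20, 2)], [(30, 8), (60, 3)]]
print(astar_schedule(pipeline, 70))
```

Because the heuristic never overestimates the remaining cost, the search expands the most promising partial schedules first, and the latency check discards partial schedules that can no longer meet the deadline, which keeps the explored space small without changing the result.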
Taxonomy
Topics: Distributed and Parallel Computing Systems · Brain Tumor Detection and Classification · Cloud Computing and Resource Management
