SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity
Zhenghao Gan, Yichen Bao, Yifei Liu, Chen Chen, Quan Chen, Minyi Guo

TL;DR
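SageSched is an LLM inference scheduler built for two properties of inference workloads: demand uncertainty (output length is unknown in advance) and hybridity (requests are both compute and memory intensive). It combines prompt-based output-length prediction, a compute-and-memory cost model, and an uncertainty-aware scheduling policy.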
Contribution
This work introduces SageSched, an LLM scheduler that predicts each request's output-length distribution from the prompt contents and past inference results, models service cost in both compute and memory terms, and uses these to schedule under demand uncertainty.
Findings
Reports an efficiency improvement of over 28.7% across diverse setups.
Handles both demand uncertainty and hybridity in LLM inference workloads.
Outperforms schedulers that rely on simple heuristics or consider only compute resources.
Abstract
Efficient LLM inference scheduling is crucial for user experience. However, LLM inference exhibits remarkable demand uncertainty (the output length is unknown beforehand) and hybridity (it is both compute and memory intensive). Existing LLM schedulers rely on simple heuristics or focus purely on compute resources, and thus suffer suboptimal performance. In this work, we propose SageSched, an efficient LLM scheduler that properly handles the demand uncertainty and hybridity of inference workloads. SageSched combines prompt contents with past inference results to predict the output-length distribution in a lightweight yet accurate manner. Meanwhile, it models the true service cost of an inference request with both compute and memory aspects considered. Finally, SageSched employs an uncertainty-aware scheduling policy that can yield the best overall efficiency given the request cost…
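The abstract names three ingredients (an output-length distribution, a cost with compute and memory parts, and a policy that uses both) but does not give their formulas here. As a rough illustration only, the Python sketch below assumes a discrete length distribution per request, a compute term of one step per output token, a memory term equal to the KV-cache tokens held summed over decode steps, and a simple "lowest expected cost first" ordering. The names `Request`, `service_cost`, `expected_cost`, `schedule`, and the weights `w_compute` and `w_memory` are all hypothetical and are not taken from the paper; the paper's actual cost model and policy may differ.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_len: int
    # Assumed form of the prediction: {output_length: probability}.
    length_dist: dict


def service_cost(prompt_len, output_len, w_compute=1.0, w_memory=0.001):
    """Illustrative cost of one request for a given output length.

    Both terms and both weights are assumptions made for this sketch.
    """
    # Compute term: one decode step per generated token.
    compute = output_len
    # Memory term: cached tokens held at each decode step, summed over steps:
    # (prompt_len + 1) + (prompt_len + 2) + ... + (prompt_len + output_len).
    memory = prompt_len * output_len + output_len * (output_len + 1) // 2
    return w_compute * compute + w_memory * memory


def expected_cost(req):
    """Expectation of service_cost over the predicted length distribution."""
    return sum(p * service_cost(req.prompt_len, n)
               for n, p in req.length_dist.items())


def schedule(requests):
    """Order requests by expected cost, cheapest first (a stand-in policy)."""
    return sorted(requests, key=expected_cost)


if __name__ == "__main__":
    queue = [
        Request("a", prompt_len=100, length_dist={10: 0.5, 1000: 0.5}),
        Request("b", prompt_len=2000, length_dist={50: 1.0}),
        Request("c", prompt_len=50, length_dist={20: 1.0}),
    ]
    for req in schedule(queue):
        print(req.name, round(expected_cost(req), 4))
```

In this toy queue, request "a" has the shortest most-likely output but a 50% chance of a very long one, so its expected cost is the highest; request "b" is penalized mainly through the memory term because of its long prompt. Ranking by a single expectation is the simplest way to use a distribution, and it ignores variance, which an uncertainty-aware policy might also take into account.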
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
