EconoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving
Haiying Shen, Tanmoy Sen

TL;DR
EconoServe is a system designed to optimize multi-resource utilization in large language model serving, significantly increasing throughput and SLO satisfaction while reducing GPU usage by intelligently managing GPU compute and Key-Value Cache resources.
Contribution
It introduces a novel scheduling and resource management system that maximizes GPU and KVC utilization with SLO guarantees, outperforming existing schedulers in efficiency and resource savings.
Findings
Up to 4× throughput increase at the same latency
Up to 91% reduction in job completion time
Up to 78% GPU reduction while maintaining goodput
Abstract
As Large Language Models (LLMs) continue to grow, reducing costs and alleviating GPU demands has become increasingly critical. However, existing schedulers primarily target either GPU compute or Key-Value Cache (KVC) utilization, failing to fully optimize both GPU compute and KVC usage during each iteration or guarantee timely KVC allocations when needed. To address these challenges, we conducted a trace-based experimental analysis and made insightful observations, leading to the design of a system called EconoServe. EconoServe maximizes multi-resource utilization while ensuring service-level objective (SLO) guarantees in LLM serving. To enable adding prompts to a batch to maximize GPU utilization in each iteration, EconoServe maintains separate waiting queues for prompt processing tasks (PTs) and generation tasks (GTs). It batches GTs with the same predicted response lengths (RL) to…
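The two-queue idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Task`, `TwoQueueScheduler`, `predicted_rl`, and `max_prompts` are hypothetical, and the step that moves a request from the PT queue to the GT queue once its prompt is processed is an assumption. The abstract is cut off before it says what same-RL batching is for, so the sketch only shows the grouping itself.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Task:
    request_id: str
    predicted_rl: int  # predicted response length in tokens (hypothetical field)


class TwoQueueScheduler:
    """Illustrative sketch only: separate waiting queues for PTs and GTs."""

    def __init__(self):
        self.pt_queue = deque()  # prompt processing tasks, FIFO
        self.gt_queue = deque()  # generation tasks, FIFO

    def submit_prompt(self, task):
        self.pt_queue.append(task)

    def prompt_done(self, task):
        # Assumption: once its prompt is processed, a request waits as a GT.
        self.gt_queue.append(task)

    def next_pts(self, max_prompts):
        # Pop up to max_prompts waiting PTs to add to the current batch.
        n = min(max_prompts, len(self.pt_queue))
        return [self.pt_queue.popleft() for _ in range(n)]

    def group_gts_by_rl(self):
        # Group waiting GTs that share the same predicted response length.
        groups = defaultdict(list)
        for task in self.gt_queue:
            groups[task.predicted_rl].append(task)
        return dict(groups)
```

Keeping the queues separate means prompts can be pulled into an iteration independently of the generation tasks already waiting, which is the property the abstract highlights.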
Taxonomy
Topics: Distributed and Parallel Computing Systems · Smart Grid Security and Resilience · Cloud Computing and Resource Management
