SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference

Yinghao Tang; Tingfeng Lan; Xiuqi Huang; Hui Lu; Wei Chen

arXiv:2505.23022 · cs.LG · May 30, 2025

Open Access

TL;DR

SCORPIO is an SLO-aware LLM serving system that adaptively manages requests to improve throughput and SLO compliance by exploiting heterogeneity in service objectives.

Contribution

It introduces a novel SLO-oriented scheduling framework with adaptive admission control and batching, specifically designed for heterogeneous SLOs in LLM inference.

Findings

1. System goodput increased by up to 14.4X.

2. SLO adherence improved by up to 46.5%.

3. Effective handling of diverse SLOs in LLM serving environments.

Abstract

Existing Large Language Model (LLM) serving systems prioritize maximum throughput. They often neglect Service Level Objectives (SLOs) such as Time to First Token (TTFT) and Time Per Output Token (TPOT), which leads to suboptimal SLO attainment. This paper introduces SCORPIO, an SLO-oriented LLM serving system designed to maximize system goodput and SLO attainment for workloads with heterogeneous SLOs. Our core insight is to exploit SLO heterogeneity for adaptive scheduling across admission control, queue management, and batch selection. SCORPIO features a TTFT Guard, which employs least-deadline-first reordering and rejects unattainable requests, and a TPOT Guard, which utilizes a VBS-based admission control and a novel credit-based batching mechanism. Both guards are supported by a predictive module. Evaluations demonstrate that SCORPIO improves system goodput by up to 14.4X and SLO…
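
To make the TTFT Guard idea concrete, the following is a minimal sketch of least-deadline-first reordering with rejection of unattainable requests. It is not SCORPIO's implementation: the `Request` fields, the `ttft_guard` function, and the assumption that prefills run one after another with a known predicted duration (`est_prefill`, standing in for the output of the predictive module) are all hypothetical choices made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Absolute TTFT deadline in seconds (arrival time + TTFT SLO).
    # Only this field takes part in ordering.
    deadline: float
    req_id: str = field(compare=False)
    # Predicted prefill time in seconds (hypothetical predictor output).
    est_prefill: float = field(compare=False)


def ttft_guard(pending, now):
    """Order requests by earliest deadline and reject unattainable ones.

    Assumes (for illustration only) that prefills are served sequentially,
    so a request's first token arrives at clock + est_prefill.
    Returns (admitted, rejected), with admitted in service order.
    """
    heap = list(pending)
    heapq.heapify(heap)  # min-heap keyed on deadline

    admitted, rejected = [], []
    clock = now
    while heap:
        req = heapq.heappop(heap)  # least deadline first
        finish = clock + req.est_prefill
        if finish > req.deadline:
            # The deadline cannot be met even if served next: reject it
            # so it does not delay requests that can still succeed.
            rejected.append(req)
        else:
            admitted.append(req)
            clock = finish
    return admitted, rejected


if __name__ == "__main__":
    queue = [
        Request(deadline=1.0, req_id="a", est_prefill=0.4),
        Request(deadline=0.5, req_id="b", est_prefill=0.3),
        Request(deadline=0.6, req_id="c", est_prefill=0.4),
    ]
    ok, dropped = ttft_guard(queue, now=0.0)
    print([r.req_id for r in ok], [r.req_id for r in dropped])
```

In this toy queue, request `c` is rejected because serving it after `b` would miss its deadline, which leaves room for `a` to finish on time. A real system would also have to account for batching and for error in the predicted prefill time.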

Taxonomy

Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Big Data and Digital Economy
