Locality-aware Fair Scheduling in LLM Serving
Shiyi Cao, Yichuan Wang, Ziming Mao, Pin-Lun Hsu, Liangsheng Yin, Tian Xia, Dacheng Li, Shu Liu, Yineng Zhang, Yang Zhou, Ying Sheng, Joseph Gonzalez, Ion Stoica

TL;DR
This paper proposes a novel locality-aware fair scheduling algorithm for LLM serving that improves fairness, throughput, and latency by considering prefix locality and balancing multiple objectives.
Contribution
It introduces DLPM and D$^2$LPM algorithms that incorporate prefix locality into fair scheduling, addressing limitations of existing methods.
Findings
DLPM achieves up to 2.87× higher throughput than VTC.
D$^2$LPM reduces per-client latency by up to 7.18×.
Algorithms maintain high prefix locality and fairness in distributed LLM serving.
Abstract
Large language model (LLM) inference workload dominates a wide variety of modern AI applications, ranging from multi-turn conversation to document analysis. Balancing fairness and efficiency is critical for managing diverse client workloads with varying prefix patterns. Unfortunately, existing fair scheduling algorithms for LLM serving, such as Virtual Token Counter (VTC), fail to take prefix locality into consideration and thus suffer from poor performance. On the other hand, locality-aware scheduling algorithms in existing LLM serving frameworks tend to maximize the prefix cache hit rate without considering fair sharing among clients. This paper introduces the first locality-aware fair scheduling algorithm, Deficit Longest Prefix Match (DLPM), which can maintain a high degree of prefix locality with a fairness guarantee. We also introduce a novel algorithm, Double Deficit LPM…
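The idea named in the abstract, longest-prefix-match ordering constrained by a fairness budget, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the algorithm's details, so the refill rule, the `quantum` parameter, the cost charged per request (prompt length only), and all names (`Request`, `pick_next`, `cached_prefix_len`) are assumptions chosen for illustration, borrowing the general shape of deficit round robin.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client: str
    tokens: tuple  # prompt token ids


def shared_prefix_len(a, b):
    """Number of leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_prefix_len(tokens, cache):
    """Longest prefix of `tokens` shared with any cached sequence."""
    return max((shared_prefix_len(tokens, c) for c in cache), default=0)


def pick_next(queue, cache, deficit, quantum):
    """Pop and return the next request to run, or None if the queue is empty.

    Assumed policy (illustrative only): among clients whose deficit counter
    is positive, run the request with the longest cached prefix; when no
    queued client has budget left, refill every queued client by `quantum`.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    if not queue:
        return None
    eligible = [r for r in queue if deficit.get(r.client, 0) > 0]
    while not eligible:
        for client in {r.client for r in queue}:
            deficit[client] = deficit.get(client, 0) + quantum
        eligible = [r for r in queue if deficit.get(r.client, 0) > 0]
    # Ties go to the earliest request in the queue, since max() keeps the first maximum.
    best = max(eligible, key=lambda r: cached_prefix_len(r.tokens, cache))
    queue.remove(best)
    # Assumption: charge the prompt length; a real system would also charge output tokens.
    deficit[best.client] -= len(best.tokens)
    cache.append(best.tokens)
    return best
```

With a cache that favors one client, a pure longest-prefix-match policy would serve all of that client's requests first. Under the budget check above, a client that has spent its quantum must wait until the other queued clients have been served or a refill occurs, which is the trade-off between locality and fairness that the abstract describes.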
Taxonomy
Topics: Scheduling and Optimization Algorithms · Advanced Manufacturing and Logistics Optimization · Petri Nets in System Modeling
Methods: Longest Prefix Match
