EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference
Bronislav Sidik, Chaya Levi, Joseph Kampeas

TL;DR
EWSJF is an adaptive, learning-based scheduler for LLM inference that dynamically partitions workloads and optimizes request prioritization. Under mixed workloads, it raises end-to-end throughput by over 30% and cuts time-to-first-token for short requests by up to 4x.
Contribution
The paper introduces EWSJF, a novel adaptive scheduler that learns workload structure in real time to enhance fairness and throughput in LLM serving environments.
Findings
Over 30% increase in end-to-end throughput.
Up to 4x reduction in time-to-first-token (TTFT) for short requests.
Effective workload partitioning and prioritization improve performance.
Abstract
Serving Large Language Models (LLMs) under mixed workloads (short, latency-sensitive interactive queries alongside long, throughput-oriented batch requests) poses a fundamental scheduling challenge. Standard First-Come, First-Served (FCFS) policies suffer from severe head-of-line blocking, leading to high tail latency and underutilized hardware. We introduce EWSJF (Effective Workload-based Shortest Job First), an adaptive request-level scheduler that learns workload structure in real time to jointly improve fairness and throughput. EWSJF operates upstream of execution-level schedulers and integrates four components: (1) Refine-and-Prune, an unsupervised partitioning algorithm that discovers performance-homogeneous request groups; (2) Dynamic Queue Routing for assigning requests to these groups; (3) Density-Weighted Scoring, a context-aware prioritization function balancing urgency and…
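The head-of-line blocking that motivates EWSJF can be illustrated with a toy simulation. This is a minimal, hypothetical sketch, not the authors' scheduler. It assumes a single non-preemptive server, service times known in advance (a real server would have to estimate them), and queueing delay as a rough stand-in for time-to-first-token. It compares FCFS against plain shortest-job-first; EWSJF's Refine-and-Prune partitioning, Dynamic Queue Routing, and Density-Weighted Scoring are not modeled. The names `Request`, `simulate`, `fcfs`, and `sjf` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Request:
    rid: int
    arrival: float      # arrival time in seconds
    service: float      # hypothetical service time in seconds (grows with token count)
    interactive: bool   # True for short, latency-sensitive queries


def simulate(requests: list[Request],
             pick: Callable[[list[Request]], Request]) -> dict[int, float]:
    """Single non-preemptive server. Returns {rid: start - arrival}, the
    queueing delay, used here as a rough proxy for time-to-first-token."""
    pending = sorted(requests, key=lambda r: (r.arrival, r.rid))
    ready: list[Request] = []
    clock = 0.0
    waits: dict[int, float] = {}
    while pending or ready:
        # Move every request that has arrived by `clock` into the ready set.
        while pending and pending[0].arrival <= clock:
            ready.append(pending.pop(0))
        if not ready:
            clock = pending[0].arrival  # server idles until the next arrival
            continue
        job = pick(ready)
        ready.remove(job)
        waits[job.rid] = clock - job.arrival
        clock += job.service  # run the job to completion (no preemption)
    return waits


def fcfs(ready: list[Request]) -> Request:
    # First-Come, First-Served: earliest arrival wins.
    return min(ready, key=lambda r: (r.arrival, r.rid))


def sjf(ready: list[Request]) -> Request:
    # Shortest Job First: smallest known service time wins.
    return min(ready, key=lambda r: (r.service, r.arrival, r.rid))


# A burst: one long batch request queued ahead of four short interactive ones.
burst = [Request(0, 0.0, 8.0, False)] + [
    Request(i, 0.0, 0.5, True) for i in range(1, 5)
]
short_ids = [r.rid for r in burst if r.interactive]

for name, policy in (("FCFS", fcfs), ("SJF", sjf)):
    waits = simulate(burst, policy)
    # Prints: policy, mean wait of short requests, wait of the long request.
    print(name, mean(waits[i] for i in short_ids), waits[0])
# FCFS 8.75 0.0
# SJF 0.75 2.0
```

In this burst, ordering by service time cuts the short requests' mean wait from 8.75 s to 0.75 s, while the long request's wait grows from 0 s to 2 s. Pure shortest-job-first can starve long requests under a sustained stream of short ones, which is why a practical scheduler must also account for fairness rather than job length alone.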
Taxonomy
Topics: Cloud Computing and Resource Management · Big Data and Digital Economy · Parallel Computing and Optimization Techniques
