Duration Aware Scheduling for ASR Serving Under Workload Drift
Darshan Makwana, Yash Jogi, Harsh Kotta, Aayush Kubba

TL;DR
This paper introduces duration-aware scheduling for ASR serving. Using audio duration as a proxy for processing time, it reduces median latency by up to 73% relative to first-come-first-served (FCFS) scheduling and stays robust under workload drift.
Contribution
It proposes and evaluates duration-aware scheduling algorithms, SJF and HRRN, tailored for ASR workloads, demonstrating substantial latency improvements and robustness under workload drift.
Findings
SJF reduces median latency by up to 73% but increases tail latency.
HRRN balances median-latency reduction against tail latency, lowering median latency by up to 28%.
Both algorithms add minimal scheduling overhead (<0.1 ms per request).
Abstract
Scheduling policies in large-scale Automatic Speech Recognition (ASR) serving pipelines play a key role in determining end-to-end (E2E) latency. Yet widely used serving engines rely on first-come-first-served (FCFS) scheduling, which ignores variability in request duration and leads to head-of-line blocking under workload drift. We show that audio duration is an accurate proxy for job processing time in ASR models such as Whisper, and use this insight to enable duration-aware scheduling. We integrate two classical algorithms, Shortest Job First (SJF) and Highest Response Ratio Next (HRRN), into vLLM and evaluate them under realistic and drifted workloads. On LibriSpeech test-clean, compared to the baseline, SJF reduces median E2E latency by up to 73% at high load, but increases tail latency due to starvation of long requests. HRRN addresses this…
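The three policies named in the abstract can be compared with a small single-server, non-preemptive simulation. The sketch below is illustrative only and is not the paper's vLLM integration: the `Request` class, the `simulate` function, and the `secs_per_audio_sec` parameter are hypothetical names, and service time is assumed to be exactly proportional to audio duration. HRRN uses the textbook response ratio, (waiting time + service time) / service time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    arrival: float   # arrival time, in seconds
    duration: float  # audio duration in seconds, used as the job-size proxy


# Each policy returns a sort key; the request with the SMALLEST key runs next.
def fcfs_key(req, now):
    return req.arrival


def sjf_key(req, now):
    return req.duration


def hrrn_key(req, now):
    wait = now - req.arrival
    # Response ratio = (wait + service) / service; negated so min() picks the highest.
    return -(wait + req.duration) / req.duration


def simulate(requests, key, secs_per_audio_sec=1.0):
    """Run requests one at a time without preemption; return {req_id: E2E latency}."""
    pending = sorted(requests, key=lambda r: r.arrival)
    queue, latency = [], {}
    now, i = 0.0, 0
    while i < len(pending) or queue:
        # Admit every request that has arrived by the current time.
        while i < len(pending) and pending[i].arrival <= now:
            queue.append(pending[i])
            i += 1
        if not queue:
            # Server is idle: jump ahead to the next arrival.
            now = pending[i].arrival
            continue
        nxt = min(queue, key=lambda r: key(r, now))
        queue.remove(nxt)
        now += nxt.duration * secs_per_audio_sec  # assumed proportional service time
        latency[nxt.req_id] = now - nxt.arrival
    return latency
```

With two 10-second requests arriving at t=0 and a 2-second request arriving at t=9, SJF lets the short request jump ahead of the long one that is already waiting, while HRRN serves the long request first because its accumulated wait has raised its response ratio. This mirrors, on a toy scale, the starvation trade-off the abstract describes.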
Taxonomy
Topics: Speech Recognition and Synthesis · Green IT and Sustainability · Software System Performance and Reliability
