Hummingbird: SLO-Oriented GPU Preemption at Microsecond-scale
Tiancheng Hu, Chenxi Wang, Ting Cao, Jin Qin, Lei Chen, Xinyu Xiao, Junhao Hu, Hongliang Tian, Shoumeng Yan, Huimin Cui, Quan Chen, Tao Xie

TL;DR
Hummingbird is a GPU scheduling system that enables microsecond-scale preemption on closed-source GPUs. It improves SLO attainment for high-priority tasks and raises GPU utilization by letting low-priority tasks harvest idle GPU time slices at high throughput.
Contribution
Hummingbird introduces the first microsecond-scale preemption mechanism for closed-source GPUs, enabling SLO-oriented scheduling and efficient GPU resource utilization.
Findings
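Improves high-priority task SLO attainment by 9.7x and 3.5x over state-of-the-art spatial-sharing and temporal-sharing approaches, respectively.
Keeps the drop in high-priority SLO attainment below 1% when collocated with low-priority tasks, compared to exclusive execution.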
Enhances low-priority task throughput by 2.4x.
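The findings above are stated in terms of SLO attainment and relative improvement factors. As a minimal sketch of how such numbers are commonly computed, assuming SLO attainment means the fraction of requests that finish within a latency target (the summary does not spell out the paper's exact definition), the helpers below are hypothetical and the sample values are illustrative only, not data from the paper:

```python
from typing import Sequence


def slo_attainment(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Fraction of requests whose latency is within the SLO target.

    Assumed definition for illustration; not taken from the paper.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    met = sum(1 for lat in latencies_ms if lat <= slo_ms)
    return met / len(latencies_ms)


def improvement_factor(new: float, baseline: float) -> float:
    """Ratio of a new attainment (or throughput) value to a baseline value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return new / baseline


# Illustrative, made-up latencies (ms) for one high-priority task.
shared = [5.0, 8.0, 12.0, 20.0]      # under a hypothetical baseline scheduler
preemptive = [5.0, 6.0, 7.0, 9.0]    # under a hypothetical preemptive scheduler

baseline_attainment = slo_attainment(shared, slo_ms=10.0)
new_attainment = slo_attainment(preemptive, slo_ms=10.0)
print(baseline_attainment, new_attainment,
      improvement_factor(new_attainment, baseline_attainment))
```

Read this way, a "9.7x" figure would be the ratio of two attainment fractions measured under the same SLO target, while the "less than 1%" figure compares attainment under collocation against exclusive execution.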
Abstract
Existing GPU-sharing techniques, including spatial and temporal sharing, aim to improve utilization but face challenges in simultaneously ensuring SLO adherence and maximizing efficiency due to the lack of fine-grained task scheduling on closed-source GPUs. This paper presents Hummingbird, an SLO-oriented GPU scheduling system that overcomes these challenges by enabling microsecond-scale preemption on closed-source GPUs while effectively harvesting idle GPU time slices. Comprehensive evaluations across diverse GPU architectures reveal that Hummingbird improves the SLO attainment of high-priority tasks by 9.7x and 3.5x compared to the state-of-the-art spatial and temporal-sharing approaches. When compared to executing exclusively, the SLO attainment of the high-priority task, collocating with low-priority tasks on Hummingbird, only drops by less than 1%. Meanwhile, the throughput of the…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
