Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU
Fuxun Yu, Shawn Bray, Di Wang, Longfei Shangguan, Xulong Tang, Chenchen Liu, Xiang Chen

TL;DR
This paper introduces a GPU resource-aware scheduling framework for multi-tenant DNN inference, optimizing concurrency and operator interleaving to enhance runtime efficiency in complex, multi-model scenarios.
Contribution
It presents a novel automated scheduling framework with a unified intermediate representation and ML-based search for optimized multi-tenant DNN inference on GPUs.
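To make the idea of operator interleaving concrete, here is a minimal toy sketch in Python. It is not the paper's framework: the paper describes an ML-based search over a unified intermediate representation, whereas this sketch swaps in a simple greedy heuristic. The `Op` type, the `gpu_share` and `latency_ms` fields, the example models, and the assumption that co-running operators do not slow each other down while their combined share stays at or below 1.0 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One operator of a model (all numbers are made-up illustration values)."""
    name: str
    latency_ms: float  # assumed standalone latency
    gpu_share: float   # assumed fraction of GPU resources used, in (0, 1]


def sequential_latency(models):
    """Latency when every operator of every model runs one after another."""
    return sum(op.latency_ms for ops in models.values() for op in ops)


def greedy_interleave(models):
    """Pack the next pending operator of each model into concurrent stages.

    Each model keeps its own operator order. A stage admits at most one
    operator per model, and the summed gpu_share of a stage may not exceed
    1.0 (the first operator of a stage is always admitted so that the loop
    makes progress).
    """
    cursors = {name: 0 for name in models}
    stages = []
    while any(cursors[m] < len(ops) for m, ops in models.items()):
        pending = [(m, models[m][cursors[m]])
                   for m in models if cursors[m] < len(models[m])]
        # Try the most resource-hungry pending operators first.
        pending.sort(key=lambda item: item[1].gpu_share, reverse=True)
        stage, used = [], 0.0
        for m, op in pending:
            if not stage or used + op.gpu_share <= 1.0 + 1e-9:
                stage.append((m, op))
                used += op.gpu_share
                cursors[m] += 1
        stages.append(stage)
    return stages


def schedule_latency(stages):
    """A stage lasts as long as its slowest operator (toy cost model)."""
    return sum(max(op.latency_ms for _, op in stage) for stage in stages)


# Two hypothetical tenants sharing one GPU.
models = {
    "classifier": [Op("conv1", 2.0, 0.6), Op("fc", 1.0, 0.2)],
    "detector": [Op("backbone", 3.0, 0.5), Op("head", 1.0, 0.3)],
}
stages = greedy_interleave(models)
print("sequential:", sequential_latency(models), "ms")
print("interleaved:", schedule_latency(stages), "ms")
for i, stage in enumerate(stages):
    print(i, [(m, op.name) for m, op in stage])
```

A learned or search-based scheduler would explore many such stage assignments under a far more realistic cost model; the greedy pass here only shows why co-scheduling operators from different models can shorten end-to-end latency compared with running them back to back.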
Findings
Achieves 1.3-1.7x speed-up over standard libraries and scheduling methods.
Maintains balanced resource utilization during inference.
Improves efficiency in multi-model DNN deployments.
Abstract
With the fast development of deep neural networks (DNNs), many real-world applications are adopting multiple models to conduct compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles. Such multi-tenant DNN inference cases greatly exacerbate the computational complexity and call for comprehensive collaboration for graph-level operator scheduling, runtime-level resource awareness, as well as hardware scheduler support. However, the current scheduling support for such multi-tenant inference is still relatively backward. In this work, we propose a resource-aware scheduling framework for efficient multi-tenant DNN inference on GPU, which automatically coordinates DNN computing in different execution levels. Leveraging the unified scheduling intermediate representation and the automated ML-based searching algorithm, optimal schedules could…
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Graph Neural Networks
