A Survey of Multi-Tenant Deep Learning Inference on GPU

Fuxun Yu; Di Wang; Longfei Shangguan; Minjia Zhang; Chenchen Liu; Xiang Chen

arXiv:2203.09040·cs.DC·May 26, 2022

TL;DR

This survey reviews the challenges and recent advances in multi-tenant deep learning inference on GPUs, highlighting optimization strategies to improve resource utilization and system performance.

Contribution

It categorizes emerging challenges and summarizes recent technological innovations in multi-tenant DL inference on GPU systems.

Findings

01. Identifies key challenges in multi-tenant DL inference.

02. Summarizes recent optimization techniques and innovations.

03. Provides a comprehensive overview of the entire optimization stack.

Abstract

Deep Learning (DL) models have achieved superior performance. Meanwhile, computing hardware such as NVIDIA GPUs has also demonstrated strong computing scaling trends, with 2x throughput and memory bandwidth for each generation. With such strong computing scaling of GPUs, multi-tenant deep learning inference, which co-locates multiple DL models on the same GPU, has become widely deployed to improve resource utilization, enhance serving throughput, reduce energy cost, etc. However, achieving efficient multi-tenant DL inference is challenging and requires thorough full-stack system optimization. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. By overviewing the entire optimization stack, summarizing the multi-tenant computing innovations, and elaborating the recent technological advances, we hope that this…
