FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference

Jianfeng Gu; Yichao Zhu; Puxuan Wang; Mohak Chadha; Michael Gerndt

arXiv:2309.00558·cs.DC·September 4, 2023

TL;DR

FaST-GShare introduces a GPU sharing architecture for serverless deep learning inference that improves resource utilization, reduces costs, and guarantees service levels through spatio-temporal multiplexing and intelligent scheduling.

Contribution

The paper presents FaST-GShare, an architecture with a dedicated manager, profiler, and scheduler for efficient, SLO-aware GPU sharing in serverless DL inference.

Findings

01. 3.15x throughput improvement over time sharing

02. 1.34x GPU utilization increase

03. 3.13x SM occupancy enhancement

Abstract

Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms utilize GPUs in a coarse manner for DL inferences, without taking into account spatio-temporal resource multiplexing and isolation, which results in severe GPU under-utilization, high usage expenses, and SLO (Service Level Objectives) violation. There is an imperative need to enable an efficient and SLO-aware GPU-sharing mechanism in serverless computing to facilitate cost-effective DL inferences. In this paper, we propose FaST-GShare, an efficient FaaS-oriented Spatio-Temporal GPU Sharing architecture for deep learning inferences. In the architecture, we introduce the FaST-Manager to limit and isolate spatio-temporal resources for GPU…

Code & Models

Repositories

KontonGu/FaST-GShare
Official
