Towards Fast Setup and High Throughput of GPU Serverless Computing

Han Zhao; Weihao Cui; Quan Chen; Shulai Zhang; Zijun Li; Jingwen Leng; Chao Li; Deze Zeng; Minyi Guo

arXiv:2404.14691·cs.DC·April 24, 2024

Open Access

TL;DR

This paper introduces SAGE, a GPU serverless framework that significantly reduces setup time and increases throughput by parallelizing data preparation and sharing memory across function invocations.

Contribution

SAGE presents novel parallelized setup and memory sharing mechanisms to enhance GPU serverless computing efficiency and throughput.

Findings

01. Reduces function duration by 11.3 times
02. Improves function density by 1.22 times
03. Outperforms state-of-the-art platforms in experiments

Abstract

Integrating GPUs into serverless computing platforms is crucial for improving efficiency. However, existing GPU-enabled serverless platforms face two significant problems caused by coarse-grained GPU management: long setup time and low function throughput. To address these issues, we propose SAGE, a GPU serverless framework with fast setup and high throughput. First, because the data a GPU function needs is knowable before the function actually executes, SAGE devises a parallelized function setup mechanism, which parallelizes data preparation and context creation. In this way, SAGE achieves fast setup of GPU function invocations. Second, SAGE proposes a sharing-based memory management mechanism, which shares read-only memory and context memory across multiple invocations of the same function. The memory sharing mechanism avoids repeated data preparation…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques