A Survey of Serverless Machine Learning Model Inference

Kamil Kojs

arXiv:2311.13587 · cs.DC · November 23, 2023 · 2 cites

Open Access

TL;DR

This survey reviews the emerging challenges and opportunities in deploying large-scale deep learning models in serverless architectures, emphasizing GPU access, optimization, and reliability in production environments.

Contribution

It introduces a novel taxonomy for serverless deep learning inference systems and summarizes recent trends to guide future research and development.

Findings

01. Identifies key challenges in serverless ML deployment

02. Categorizes optimization strategies for large-scale inference

03. Highlights trends in GPU utilization and reliability improvements

Abstract

Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant effort in deploying these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs) while ensuring reliability, minimizing downtime, and optimizing the operational costs of the underlying infrastructure. Large machine learning models often demand GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning…

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Brain Tumor Detection and Classification