A Survey of Serverless Machine Learning Model Inference
Kamil Kojs

TL;DR
This survey reviews the emerging challenges and opportunities in deploying large-scale deep learning models in serverless architectures, emphasizing GPU access, optimization, and reliability in production environments.
Contribution
The survey introduces a novel taxonomy for serverless deep learning inference systems and summarizes recent trends to guide future research and development.
Findings
Identifies key challenges in serverless ML deployment
Categorizes optimization strategies for large-scale inference
Highlights trends in GPU utilization and reliability improvements
Abstract
Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to increasing integration of AI models into products. This widespread adoption of AI requires significant effort to deploy these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs), which ensure reliability and minimal downtime, while optimizing the operational costs of the underlying infrastructure. Large machine learning models often require GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning…
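As a concrete illustration of "meeting an SLO" for real-time inference, the sketch below checks a batch of observed request latencies against a latency objective. It is a minimal example under stated assumptions, not the survey's method: it assumes the SLO is phrased as "the p-th percentile latency must not exceed a target in milliseconds", uses the nearest-rank percentile definition, and the function names `percentile` and `meets_latency_slo` as well as the sample latencies are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest observed value such that
    at least p percent of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p == 0 returns the minimum.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms, target_ms, p=99.0):
    """Hypothetical SLO check: True if the p-th percentile latency
    is within the target (both in milliseconds)."""
    return percentile(latencies_ms, p) <= target_ms


if __name__ == "__main__":
    # Illustrative latencies only; one slow request (e.g. a tail outlier).
    latencies = [42.0, 38.5, 51.2, 47.9, 120.3, 44.1, 39.8, 40.2, 43.7, 45.0]
    print(percentile(latencies, 99))
    print(meets_latency_slo(latencies, target_ms=100.0))
```

With only ten samples, the nearest-rank p99 is the slowest request (120.3 ms), so a 100 ms p99 objective is violated even though the median is about 44 ms. This is why tail latency, rather than average latency, usually drives resource decisions such as whether a model needs GPU-backed capacity.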
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Brain Tumor Detection and Classification
