Serving deep learning models in a serverless platform
Vatche Ishakian, Vinod Muthusamy, Aleksander Slominski

TL;DR
This paper evaluates whether large neural network models can be served for inference on a serverless platform, specifically AWS Lambda. It finds that cold start delays skew the latency distribution and risk violating stricter service level agreements (SLAs).
Contribution
It provides an empirical assessment of deep learning inference performance in a serverless environment, measured on AWS Lambda with the MxNet framework.
Findings
Inference latency can fall within an acceptable range.
Cold start delays are longer and skew the latency distribution.
The resulting latency variability risks violating more stringent SLAs.
Abstract
Serverless computing has emerged as a compelling paradigm for the development and deployment of a wide range of event-based cloud applications. At the same time, cloud providers and enterprise companies are heavily adopting machine learning and artificial intelligence either to differentiate themselves or to provide their customers with value-added services. In this work, we evaluate the suitability of a serverless computing environment for the inferencing of large neural network models. Our experimental evaluations are executed on the AWS Lambda environment using the MxNet deep learning framework. Our experimental results show that while the inferencing latency can be within an acceptable range, longer delays due to cold starts can skew the latency distribution and hence risk violating more stringent SLAs.
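To make the point about a skewed latency distribution concrete, the following is a minimal sketch of how a small share of cold starts can leave the median untouched while dominating the tail percentile that a strict SLA would target. The latency values, the cold start frequency, and the helper names `percentile` and `mixed_latencies` are all hypothetical and chosen for illustration; they are not measurements or code from the paper.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def mixed_latencies(n, cold_every, warm_ms, cold_ms):
    """Build n request latencies where every cold_every-th request hits a cold start."""
    return [cold_ms if i % cold_every == 0 else warm_ms for i in range(n)]


# Hypothetical numbers, NOT results from the paper:
# 200 ms for a warm invocation, 5000 ms for a cold start, 1 cold start per 50 requests.
latencies = mixed_latencies(n=1000, cold_every=50, warm_ms=200.0, cold_ms=5000.0)

summary = {
    "mean": sum(latencies) / len(latencies),
    "p50": percentile(latencies, 50),
    "p95": percentile(latencies, 95),
    "p99": percentile(latencies, 99),
}
print(summary)
```

With these assumed inputs, only 2% of requests are cold, so the median and the 95th percentile stay at the warm latency of 200 ms, while the 99th percentile jumps to the 5000 ms cold start value and the mean rises to 296 ms. An SLA written against the median would look healthy here, whereas one written against the 99th percentile would be violated.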
