On the Cost of Model-Serving Frameworks: An Experimental Evaluation
Pasquale De Rosa, Yérom-David Bromberg, Pascal Felber, Djob Mvondo, Valerio Schiavoni

TL;DR
This paper evaluates five widely used ML model-serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) across four scenarios, and reports that TensorFlow Serving delivers better performance and lower latency than the others when serving deep learning models in production environments.
Contribution
It provides an empirical comparison of model-serving frameworks, highlighting the performance advantages of TensorFlow Serving and DL-specific frameworks over general-purpose ones.
Findings
TensorFlow Serving outperforms other frameworks in deep learning model serving.
DL-specific frameworks have significantly lower latency than general-purpose frameworks.
Performance varies depending on the serving scenario and framework used.
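As an illustration only (this is not the paper's benchmarking setup), latency differences like those listed above might be measured with a small timing harness. The Python sketch below assumes a callable that issues one inference request. The names `measure_latency` and `fake_predict` are hypothetical, and `fake_predict` is a local stand-in for a real call to a serving endpoint.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    predict: Callable[[], object],
    n_requests: int = 100,
    warmup: int = 10,
) -> Dict[str, float]:
    """Call `predict` repeatedly and summarize per-call latency in milliseconds."""
    # Warm-up calls are executed but not timed, so one-off startup costs
    # (lazy model loading, connection setup) do not skew the statistics.
    for _ in range(warmup):
        predict()

    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_predict() -> int:
    """Hypothetical stand-in for one inference request to a model server."""
    return sum(range(1000))


if __name__ == "__main__":
    for name, value in measure_latency(fake_predict, n_requests=50).items():
        print(f"{name}: {value:.4f}")
```

To compare frameworks, `fake_predict` would be replaced by one function per framework that sends the same input to that framework's endpoint. Reporting the median and a tail percentile alongside the mean makes occasional slow requests visible.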
Abstract
In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights, recommendations, or actions based on the input data. For this reason, serving strategies are nowadays crucial for deploying and managing models in production environments effectively. These strategies ensure that models are available, scalable, reliable, and performant for real-world applications, such as time series forecasting, image classification, natural language processing, and so on. In this paper, we evaluate the performance of five widely used model-serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) under four different scenarios (malware detection, cryptocoin price forecasting, image classification, and…
