On the Cost of Model-Serving Frameworks: An Experimental Evaluation

Pasquale De Rosa; Yérom-David Bromberg; Pascal Felber; Djob Mvondo; Valerio Schiavoni

arXiv:2411.10337·cs.LG·November 18, 2024

TL;DR

This paper evaluates five popular ML model-serving frameworks across four serving scenarios and finds that TensorFlow Serving delivers higher performance and lower latency than the other frameworks when serving deep learning models in production environments.

Contribution

The paper provides an empirical comparison of model-serving frameworks, highlighting the performance advantages of TensorFlow Serving and other DL-specific frameworks over general-purpose ones.

Findings

1. TensorFlow Serving outperforms other frameworks in deep learning model serving.
2. DL-specific frameworks have significantly lower latency than general-purpose frameworks.
3. Performance varies depending on the serving scenario and framework used.

Abstract

In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights, recommendations, or actions based on the input data. For this reason, serving strategies are nowadays crucial for deploying and managing models in production environments effectively. These strategies ensure that models are available, scalable, reliable, and performant for real-world applications, such as time series forecasting, image classification, natural language processing, and so on. In this paper, we evaluate the performance of five widely used model-serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) under four different scenarios (malware detection, cryptocoin price forecasting, image classification, and…
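The abstract shown here does not describe the measurement harness, so the following is only a minimal sketch, assuming latency is compared by timing individual inference requests and summarizing them as mean, median, and 99th-percentile values. The names `measure_latency` and `dummy_predict` are hypothetical. `dummy_predict` is a toy linear scorer that stands in for a real call to TensorFlow Serving, TorchServe, MLServer, MLflow, or BentoML, and it does not reproduce any of those frameworks' APIs.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency(predict: Callable[[object], object],
                    inputs: Sequence[object],
                    warmup: int = 5) -> dict:
    """Time each call to `predict` and summarize per-request latency in ms."""
    # Warm-up calls are not recorded, so one-off costs such as lazy model
    # loading or caching do not skew the measured distribution.
    for x in inputs[:warmup]:
        predict(x)

    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 99th percentile: the smallest sample that is greater than
    # or equal to at least 99% of all samples.
    p99_index = max(0, math.ceil(0.99 * len(samples)) - 1)
    return {
        "count": len(samples),
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


def dummy_predict(x):
    # Hypothetical stand-in model: a fixed linear scorer over a 3-feature
    # input. A real benchmark would send an HTTP or gRPC request to the
    # framework under test here.
    weights = (0.5, -0.25, 1.0)
    return sum(w * v for w, v in zip(weights, x))


if __name__ == "__main__":
    data = [(i, i + 1, i + 2) for i in range(200)]
    print(measure_latency(dummy_predict, data))
```

Reporting a tail percentile alongside the mean matters for serving comparisons, because a framework with a good average can still produce occasional slow responses that end-users notice.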