An Empirical Evaluation of Modern MLOps Frameworks
Jon Marcos-Mercadé, Unai Lopez-Novoa, Mikel Egaña Aranguren

TL;DR
This paper empirically compares four MLOps frameworks (MLflow, Metaflow, Apache Airflow, and Kubeflow Pipelines) across six criteria to help developers select suitable tools for different ML lifecycle tasks.
Contribution
It provides a systematic evaluation of MLOps tools based on practical criteria and offers insights into their suitability for various ML scenarios.
Findings
MLflow and Kubeflow excel in ease of installation.
Metaflow offers high configuration flexibility.
Airflow provides strong interoperability.
Abstract
Given the increasing adoption of AI solutions in professional environments, developers need to be able to make informed decisions about the current tool landscape. This work empirically evaluates four MLOps (Machine Learning Operations) tools that facilitate the management of the ML model lifecycle: MLflow, Metaflow, Apache Airflow, and Kubeflow Pipelines. The tools are assessed on six criteria (Ease of installation, Configuration flexibility, Interoperability, Code instrumentation complexity, Result interpretability, and Documentation) when implementing two common ML scenarios: a digit classifier with MNIST and a sentiment classifier with IMDB and BERT. The evaluation concludes with weighted results that lead to practical conclusions on which tools are best suited to different scenarios.
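The abstract mentions weighted results but does not state the weights or the scoring scale. As a rough illustration only, the sketch below shows one common way such an aggregation could work: a weighted mean over the six criteria. The criterion names come from the abstract; the function names, the tool names `tool_a` and `tool_b`, the weights, and the scores are all hypothetical placeholders and are not taken from the paper.

```python
# Minimal sketch of a weighted-criteria comparison.
# Assumption: each tool receives a numeric score per criterion, and each
# criterion has a non-negative weight. The paper's actual weights and scale
# are not given in the abstract, so every number below is a placeholder.

CRITERIA = [
    "Ease of installation",
    "Configuration flexibility",
    "Interoperability",
    "Code instrumentation complexity",
    "Result interpretability",
    "Documentation",
]


def weighted_score(scores, weights):
    """Return the weighted mean of per-criterion scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank_tools(tool_scores, weights):
    """Return (tool, weighted score) pairs sorted from best to worst."""
    ranked = [(tool, weighted_score(s, weights)) for tool, s in tool_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: equal weights and made-up scores for two unnamed tools.
example_weights = {c: 1.0 for c in CRITERIA}
example_scores = {
    "tool_a": {c: 3 for c in CRITERIA},
    "tool_b": {c: 4 for c in CRITERIA},
}
example_ranking = rank_tools(example_scores, example_weights)
```

Changing the weights to reflect a team's priorities (for example, weighting Interoperability more heavily) can change the ranking, which is why a weighted comparison can recommend different tools for different scenarios.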
Taxonomy
Topics: Machine Learning and Data Classification · Scientific Computing and Data Management · Software Engineering Research
