Management of Machine Learning Lifecycle Artifacts: A Survey
Marius Schlegel, Kai-Uwe Sattler

TL;DR
This survey reviews over 60 systems supporting the management of machine learning artifacts throughout their lifecycle, highlighting their functionalities, scope, and the challenges in comparing and integrating these tools.
Contribution
It provides an overview of ML artifact management systems and, based on a systematic literature review, derives criteria for assessing them.
Findings
Identified key functionalities of ML artifact management systems
Developed assessment criteria for comparing systems
Highlighted gaps and challenges in current tools
Abstract
The explorative and iterative nature of developing and operating machine learning (ML) applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. To enable comparability, reproducibility, and traceability of these artifacts across ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often unclear what precise functional scope such systems offer, which makes comparing candidates and estimating synergy effects between them quite challenging. In this paper, we aim to give an overview of systems and platforms that support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and…
Taxonomy
Topics: Data Quality and Management · Big Data and Business Intelligence · Scientific Computing and Data Management
