Management of Machine Learning Lifecycle Artifacts: A Survey

Marius Schlegel; Kai-Uwe Sattler

arXiv:2210.11831 · cs.DB · October 24, 2022 · 5 cites

Open Access

TL;DR

This survey reviews over 60 systems supporting the management of machine learning artifacts throughout their lifecycle, highlighting their functionalities, scope, and the challenges in comparing and integrating these tools.

Contribution

It provides a comprehensive overview and assessment criteria for ML artifact management systems based on a systematic literature review.

Findings

01. Identified key functionalities of ML artifact management systems

02. Developed assessment criteria for comparing systems

03. Highlighted gaps and challenges in current tools

Abstract

The explorative and iterative nature of developing and operating machine learning (ML) applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often not obvious what precise functional scope such systems offer so that the comparison and the estimation of synergy effects between candidates are quite challenging. In this paper, we aim to give an overview of systems and platforms which support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and…

Taxonomy

Topics: Data Quality and Management · Big Data and Business Intelligence · Scientific Computing and Data Management