A Large-Scale Study of Model Integration in ML-Enabled Software Systems

Yorick Sens; Henriette Knopp; Sven Peldszus; Thorsten Berger

arXiv:2408.06226 · cs.SE · February 25, 2025 · 2 cites

Open Access

TL;DR

This large-scale study investigates the characteristics, reuse practices, and integration patterns of nearly 3,000 open-source ML-enabled software systems to improve understanding and engineering practices for such systems.

Contribution

It provides the first comprehensive empirical analysis of real-world ML-enabled systems, revealing common practices and architectural patterns.

Findings

01 Traditional code dominates ML-enabled systems

02 ML model reuse often involves code duplication or pre-trained models

03 Identified various ML integration patterns

Abstract

The rise of machine learning (ML) and its integration into software systems has drastically changed development practices. While software engineering traditionally focused on manually created code artifacts with dedicated processes and architectures, ML-enabled systems require additional data-science methods and tools to create ML artifacts -- especially ML models and training data. However, integrating models into systems, and managing the many different artifacts involved, is far from trivial. ML-enabled systems can easily have multiple ML models that interact with each other and with traditional code in intricate ways. Unfortunately, while challenges and practices of building ML-enabled systems have been studied, little is known about the characteristics of real-world ML-enabled systems beyond isolated examples. Improving engineering processes and architectures for ML-enabled systems…

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Service-Oriented Architecture and Web Services · Business Process Modeling and Analysis