A Large-Scale Study of Model Integration in ML-Enabled Software Systems
Yorick Sens, Henriette Knopp, Sven Peldszus, Thorsten Berger

TL;DR
This large-scale study investigates the characteristics, reuse practices, and integration patterns of nearly 3,000 open-source ML-enabled software systems to improve understanding and engineering practices for such systems.
Contribution
The study provides the first comprehensive empirical analysis of real-world ML-enabled systems, revealing common practices and architectural patterns.
Findings
Traditional code dominates ML-enabled systems
ML model reuse often involves code duplication or pre-trained models
Various ML integration patterns are identified
Abstract
The rise of machine learning (ML) and its integration into software systems has drastically changed development practices. While software engineering traditionally focused on manually created code artifacts with dedicated processes and architectures, ML-enabled systems require additional data-science methods and tools to create ML artifacts -- especially ML models and training data. However, integrating models into systems, and managing the many different artifacts involved, is far from trivial. ML-enabled systems can easily have multiple ML models that interact with each other and with traditional code in intricate ways. Unfortunately, while challenges and practices of building ML-enabled systems have been studied, little is known about the characteristics of real-world ML-enabled systems beyond isolated examples. Improving engineering processes and architectures for ML-enabled systems…
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Service-Oriented Architecture and Web Services · Business Process Modeling and Analysis
