Continuous Integration Practices in Machine Learning Projects: The Practitioners' Perspective
João Helis Bernardo, Daniel Alencar da Costa, Filipe Roseiro Cogo, Sérgio Queiróz de Medeiros, Uirá Kulesza

TL;DR
This study explores the unique challenges of applying Continuous Integration (CI) in Machine Learning (ML) projects through a survey of practitioners, revealing key differences and barriers and proposing tailored CI practices for ML.
Contribution
It provides qualitative insights into ML-specific CI challenges and introduces tailored practices to improve CI effectiveness in ML projects.
Findings
ML projects have longer build durations and lower test coverage than non-ML projects.
Practitioners identify key differences like test complexity and infrastructure needs.
Barriers include data dependencies and non-determinism affecting testing.
Abstract
Continuous Integration (CI) is a cornerstone of modern software development. However, while CI is widely adopted in traditional software projects, applying CI practices to Machine Learning (ML) projects reveals distinctive characteristics. For example, our previous work revealed that ML projects often experience longer build durations and lower test coverage rates compared to their non-ML counterparts. Building on these quantitative findings, this study surveys 155 practitioners from 47 ML projects to investigate, from a qualitative perspective, the underlying reasons for these distinctive characteristics. Practitioners highlighted eight key differences, including test complexity, infrastructure requirements, and build duration and stability. Common challenges mentioned by practitioners include higher project complexity, model training demands, extensive data handling, increased…
