Comparative analysis of real bugs in open-source Machine Learning projects -- A Registered Report
Tuan Dung Lai, Anj Simmons, Scott Barnett, Jean-Guy Schneider, Rajesh Vasa

TL;DR
This study compares the resolution time and fix size of bugs in open-source ML projects versus traditional software, revealing domain-specific differences in issue reporting and resolution processes.
Contribution
It provides an empirical analysis of real bug reports in open-source ML projects, highlighting differences from traditional software issues and informing maintenance practices.
Findings
ML issues tend to require longer resolution times than non-ML issues.
Certain categories of ML bugs have larger fix sizes.
Differences in reporting needs impact issue resolution processes.
Abstract
Background: Machine Learning (ML) systems rely on data to make predictions, and compared to traditional software systems they have many additional components, such as the data processing pipeline, the serving pipeline, and model training. Existing research on software maintenance has studied the issue-reporting needs and resolution processes for different types of issues, such as performance and security issues. However, ML systems have specific classes of faults, and reporting ML issues requires domain-specific information. Because ML and traditional software systems have different characteristics, we do not know to what extent their reporting needs differ, or to what extent these differences impact the issue resolution process.
Objective: Our objective is to investigate whether there is a discrepancy in the distribution of resolution time between ML and non-ML…
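The objective above compares resolution-time distributions between ML and non-ML issues. A minimal sketch of one way such a comparison could be computed is shown below. It assumes each issue is available as a pair of ISO 8601 open/close timestamps (the format GitHub reports, e.g. `2021-01-01T00:00:00Z`) and summarizes the difference with a Mann-Whitney U statistic and the Vargha-Delaney A12 effect size. The helper names (`resolution_days`, `mann_whitney_u`, `vargha_delaney_a12`), the choice of test, and the sample timestamps are hypothetical illustrations, not the authors' protocol or data.

```python
from datetime import datetime
from statistics import median

# Assumed timestamp format (UTC, ISO 8601), e.g. "2021-01-01T00:00:00Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def resolution_days(opened: str, closed: str) -> float:
    """Resolution time of one issue in days (closed minus opened)."""
    delta = datetime.strptime(closed, TS_FORMAT) - datetime.strptime(opened, TS_FORMAT)
    return delta.total_seconds() / 86400


def mann_whitney_u(xs: list[float], ys: list[float]) -> float:
    """U statistic for xs vs ys: each pair with x > y counts 1, ties count 0.5.

    O(len(xs) * len(ys)); fine for a sketch, not for large issue trackers.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def vargha_delaney_a12(xs: list[float], ys: list[float]) -> float:
    """Probability that a random x exceeds a random y (ties split evenly)."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))


# Hypothetical (opened, closed) pairs for illustration only; NOT study data.
ml_issues = [
    ("2021-01-01T00:00:00Z", "2021-01-11T00:00:00Z"),  # 10 days
    ("2021-02-01T00:00:00Z", "2021-02-04T12:00:00Z"),  # 3.5 days
    ("2021-03-01T00:00:00Z", "2021-03-21T00:00:00Z"),  # 20 days
]
non_ml_issues = [
    ("2021-01-01T00:00:00Z", "2021-01-03T00:00:00Z"),  # 2 days
    ("2021-01-05T00:00:00Z", "2021-01-06T00:00:00Z"),  # 1 day
    ("2021-02-01T00:00:00Z", "2021-02-11T00:00:00Z"),  # 10 days
]

ml_days = [resolution_days(o, c) for o, c in ml_issues]
non_ml_days = [resolution_days(o, c) for o, c in non_ml_issues]

print("median ML days:", median(ml_days))          # → median ML days: 10.0
print("median non-ML days:", median(non_ml_days))  # → median non-ML days: 2.0
print("A12 (ML slower):", round(vargha_delaney_a12(ml_days, non_ml_days), 3))  # → 0.833
```

An A12 above 0.5 means a randomly chosen ML issue tends to take longer to resolve than a randomly chosen non-ML one. The sketch only computes the statistic; a real analysis would also need a significance test (for example `scipy.stats.mannwhitneyu`, which returns a p-value) and a careful definition of what counts as "resolved".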
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
