Bug Characterization in Machine Learning-based Systems
Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming (Jack) Jiang

TL;DR
This study analyzes bug characteristics in ML-based systems and finds that ML components are more error-prone and costlier to fix than non-ML parts, which underlines the need for focused reliability efforts.
Contribution
It provides a comprehensive empirical analysis of ML bugs, highlighting their root causes, complexity, and differences from non-ML bugs in real-world systems.
Findings
Nearly half of issues in ML systems are ML bugs.
ML bugs are more complex and costly to fix.
ML components are more error-prone than non-ML components.
Abstract
The rapid growth of Machine Learning (ML) applications in different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components that operate based on ML. Understanding the bug characteristics and maintenance challenges of ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigate…
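The repository selection described in the abstract (keep repositories that use TensorFlow, Keras, or PyTorch, then take those with the most closed issues) can be illustrated with a minimal sketch. The `Repo` record, its field names, and the `select_top_repos` helper are hypothetical and introduced only for illustration; the abstract does not detail the intermediate filtering steps, so they are not modeled here.

```python
from dataclasses import dataclass

# The three frameworks named in the abstract, in lowercase for matching.
FRAMEWORKS = frozenset({"tensorflow", "keras", "pytorch"})


@dataclass
class Repo:
    """Hypothetical record for one GitHub repository (not from the paper)."""
    name: str
    frameworks: set        # lowercase library names the repository uses
    closed_issues: int     # number of closed issues


def select_top_repos(repos, top_n=300):
    """Keep repos using at least one target framework, ranked by closed issues."""
    candidates = [r for r in repos if r.frameworks & FRAMEWORKS]
    # Sort in descending order of closed issues and keep the first top_n.
    candidates.sort(key=lambda r: r.closed_issues, reverse=True)
    return candidates[:top_n]
```

With `top_n=300` this mirrors the final selection step only; any additional filters applied by the authors would sit between the framework check and the ranking.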
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
