Bug Characterization in Machine Learning-based Systems

Mohammad Mehdi Morovati; Amin Nikanjam; Florian Tambon; Foutse Khomh; Zhen Ming (Jack) Jiang

arXiv:2307.14512 · cs.SE · July 28, 2023 · 1 citation

TL;DR

This study analyzes bug characteristics in ML-based systems, revealing that ML components are more error-prone and costly to fix than non-ML parts, emphasizing the need for focused reliability efforts.

Contribution

It provides a comprehensive empirical analysis of ML bugs, highlighting their root causes, complexity, and differences from non-ML bugs in real-world systems.

Findings

01. Nearly half of issues in ML systems are ML bugs.

02. ML bugs are more complex and costly to fix.

03. ML components are more error-prone than non-ML components.

Abstract

The rapid growth in the application of Machine Learning (ML) across different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components that operate based on ML. Understanding bug characteristics and maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint. We extract 447,948 GitHub repositories that use one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we select the top 300 repositories with the highest number of closed issues. We manually investigate…

Code & Models

Repositories

ml-bugs-2022/replication-package
pytorch (Official)

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research