Revisiting reopened bugs in open source software systems
Ankur Tagra, Haoxiang Zhang, Gopi Krishnan Rajbahadur, Ahmed E. Hassan

TL;DR
This study revisits reopened bug prediction in open source projects using modern machine learning techniques across 47 projects, revealing limited predictive success and identifying key reasons for bug reopenings.
Contribution
It introduces an updated prediction pipeline and large-scale analysis, providing new insights into the causes of bug reopenings in open source software.
Findings
Only 34% of projects achieve acceptable prediction performance (AUC >= 0.7).
In projects with acceptable prediction performance, the large majority of reopened bugs (94%) are due to patch issues.
Four main reasons for bug reopening: technical, documentation, human, and others.
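To make the AUC >= 0.7 criterion above concrete, the following is a minimal sketch of how the share of projects with acceptable performance could be computed. It is not the authors' pipeline: the function names `roc_auc` and `share_acceptable`, the project names, and the toy labels and scores are all hypothetical, and the AUC is computed with a plain pairwise comparison rather than any particular library.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random reopened bug (label 1) receives a
    higher score than a random non-reopened bug (label 0); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def share_acceptable(
    per_project: Dict[str, Tuple[Sequence[int], Sequence[float]]],
    threshold: float = 0.7,
) -> float:
    """Fraction of projects whose AUC meets the threshold (0.7 in the summary above)."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_project.values()]
    return sum(auc >= threshold for auc in aucs) / len(aucs)


# Hypothetical toy data: (true labels, predicted reopen scores) per project.
toy_projects = {
    "project_a": ([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.4]),  # every reopened bug outranks every other bug
    "project_b": ([1, 0, 1, 0], [0.3, 0.6, 0.7, 0.5]),  # only half of the pairs are ordered correctly
}

if __name__ == "__main__":
    for name, (labels, scores) in toy_projects.items():
        print(name, roc_auc(labels, scores))
    print("share with AUC >= 0.7:", share_acceptable(toy_projects))
```

The pairwise definition is quadratic in the number of bugs, which is fine for illustration; a rank-based implementation would be preferable on real issue-tracker data.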
Abstract
Reopened bugs can degrade the overall quality of a software system since they require unnecessary rework by developers. Reopened bugs also erode end users' trust in the quality of the software. Thus, predicting bugs that might be reopened could be extremely helpful for software developers to avoid rework. Prior studies on reopened bug prediction focus only on three open source projects (i.e., Apache, Eclipse, and OpenOffice) to generate insights. We observe that one out of the three projects (i.e., Apache) has a data leak issue: the "reopened" bug status was included in the training data used to predict reopened bugs. In addition, prior studies used an outdated prediction model pipeline (i.e., with old techniques for constructing a prediction model) to predict reopened bugs. Therefore, we revisit the reopened bugs study on a large scale dataset consisting of…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
