The Repeat Offenders: Characterizing and Predicting Extremely Bug-Prone Source Methods
Ethan Friesen, Sasha Morton-Salmon, Md Nahidul Islam Opu, Shahidul Islam, Shaiful Chowdhury

TL;DR
This study investigates extremely bug-prone methods in software, analyzing their characteristics, prevalence, and predictability, to improve bug prediction models and reduce software faults.
Contribution
It introduces the concept of ExtremelyBuggy methods, analyzes their features, and assesses their predictability at the time of their creation, highlighting their impact on software quality.
Findings
ExtremelyBuggy methods are few but cause most bugs.
They are difficult to predict at their inception.
Manual analysis reveals recurring harmful characteristics.
Abstract
Bug prediction has long been considered the "prince" of empirical software engineering research, and accordingly, a substantial body of work has focused on predicting bugs to enable early preventive actions. However, most existing studies operate at the class or file level, which practitioners have found to be of limited practical value. As a result, method-level bug prediction has gained increasing attention in recent years. Despite this shift, current method-level prediction models typically treat all buggy methods as equally fault-prone, regardless of whether a method has been associated with a bug once or repeatedly. We argue that methods involved in bugs multiple times (hereafter referred to as ExtremelyBuggy methods) are more harmful than methods that are buggy only once. In this study, we investigate the prevalence of ExtremelyBuggy methods, analyze their code quality metrics, and…
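To make the distinction in the abstract concrete, the following is a minimal sketch of how methods might be sorted into three groups by how often they appear in bug-fix changes. It is an illustration only, not the authors' pipeline: the function name `label_methods`, the label strings `Buggy` and `NotBuggy`, the input format, and the cutoff of two bug involvements are all assumptions chosen to match the "once versus repeatedly" wording above.

```python
from collections import Counter
from typing import Dict, Iterable


def label_methods(
    bug_fix_touches: Iterable[str],
    all_methods: Iterable[str],
    threshold: int = 2,  # assumed cutoff: "multiple times" read as two or more
) -> Dict[str, str]:
    """Label each method by how many bug-fix changes touched it.

    bug_fix_touches holds one entry per (bug fix, method) pair, so a method
    fixed in three separate bug fixes appears three times.
    """
    counts = Counter(bug_fix_touches)
    labels = {}
    for method in all_methods:
        n = counts[method]  # Counter returns 0 for methods never touched
        if n >= threshold:
            labels[method] = "ExtremelyBuggy"
        elif n >= 1:
            labels[method] = "Buggy"
        else:
            labels[method] = "NotBuggy"
    return labels


# Hypothetical data: method names are invented for illustration.
touches = ["Parser.parse", "Parser.parse", "Parser.parse", "Cache.get"]
methods = ["Parser.parse", "Cache.get", "Util.join"]
labels = label_methods(touches, methods)
```

Here `Parser.parse` would fall into the ExtremelyBuggy group, `Cache.get` would be buggy only once, and `Util.join` would be clean. A conventional method-level model would merge the first two groups into a single "buggy" class, which is the simplification the abstract argues against.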
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
