TL;DR
This paper introduces Globug, a framework that enhances fault localization by leveraging global data from open-source projects, improving existing IR-based techniques like BugLocator without adding runtime overhead.
Contribution
Globug is the first IR-based fault localization (IRFL) framework to incorporate models pre-trained on global data, significantly improving localization accuracy over methods that use only local data.
Findings
Global data improves BugLocator's MRR by 6.6%
Global data improves BugLocator's MAP by 4.8%
Word Embedding with global data did not further improve results
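For readers unfamiliar with the two metrics above, the following is a minimal sketch of how MRR (Mean Reciprocal Rank) and MAP (Mean Average Precision) are conventionally computed for fault localization. It uses the standard textbook definitions; the summary does not say exactly how the paper computes them, and the file names and rankings below are made up purely for illustration.

```python
def reciprocal_rank(ranked_files, buggy_files):
    """1 / rank of the first truly buggy file in the ranking, or 0 if none appears."""
    for rank, name in enumerate(ranked_files, start=1):
        if name in buggy_files:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_files, buggy_files):
    """Mean of precision@k taken at each rank k where a buggy file appears."""
    if not buggy_files:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked_files, start=1):
        if name in buggy_files:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy_files)


def mean_reciprocal_rank(queries):
    """MRR over (ranking, ground-truth set) pairs, one pair per bug report."""
    return sum(reciprocal_rank(r, b) for r, b in queries) / len(queries)


def mean_average_precision(queries):
    """MAP over (ranking, ground-truth set) pairs, one pair per bug report."""
    return sum(average_precision(r, b) for r, b in queries) / len(queries)


# Hypothetical data: two bug reports, each with a ranked file list
# and the set of files that were actually fixed.
example_queries = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
]

example_mrr = mean_reciprocal_rank(example_queries)
example_map = mean_average_precision(example_queries)
```

MRR only rewards the position of the first correct file, while MAP accounts for every buggy file in the ranking, which is why the two metrics can improve by different amounts.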
Abstract
Fault Localization (FL) is an important first step in software debugging and is mostly manual in current practice. Many methods have been proposed over the years to automate the FL process, including information retrieval (IR)-based techniques. These methods localize the fault based on the similarity between the bug report and the source code. Newer variations of IR-based FL (IRFL) techniques also look into the history of bug reports and leverage it during localization. However, all existing IRFL techniques limit themselves to the current project's data (local data). In this study, we introduce Globug, an IRFL framework consisting of methods that use models pre-trained on global data (extracted from open-source benchmark projects). In Globug, we investigate two heuristics: a) the effect of global data on a state-of-the-art IRFL technique, namely BugLocator, and…
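The core IRFL idea described in the abstract, ranking source files by their textual similarity to a bug report, can be sketched as follows. This is a plain TF-IDF and cosine-similarity illustration using only the Python standard library. It is not BugLocator's or Globug's actual scoring, and the bug report and file contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    # Smoothed IDF so that no term receives a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in counts_per_doc]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_files(bug_report, files):
    """Return file names sorted by similarity to the bug report, best first."""
    names = list(files)
    # Assumption: the bug report is included in the corpus used for IDF.
    vectors = tfidf_vectors([bug_report] + [files[n] for n in names])
    query, file_vectors = vectors[0], vectors[1:]
    scores = {n: cosine(query, vec) for n, vec in zip(names, file_vectors)}
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Hypothetical project contents and bug report.
example_files = {
    "Cache.java": "class Cache evict entry memory size",
    "Parser.java": "class Parser parse token stream syntax error",
    "Logger.java": "class Logger write log message file",
}
example_report = "Parser throws syntax error on empty token stream"
example_ranking = rank_files(example_report, example_files)
```

In this sketch every statistic comes from the project at hand, which corresponds to what the abstract calls local data. Globug's premise is that models pre-trained on data from other open-source projects can supplement this step.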
