Towards Explorative IRBL: Combining Semantic Retrieval with LLM-driven Iterative Code Exploration
Moumita Asad, Rafed Muhammad Yasir, Sam Malek

TL;DR
This paper introduces GenLoc, a novel method combining semantic retrieval and LLM-driven code exploration to improve bug localization accuracy across multiple datasets.
Contribution
It presents GenLoc, a new approach that effectively integrates semantic retrieval with iterative LLM-based code analysis for bug localization.
Findings
GenLoc outperforms traditional IRBL and deep learning methods.
It localizes bugs that other techniques fail to detect.
GenLoc performs well on Java and Python datasets.
Abstract
Information Retrieval-based Bug Localization (IRBL) aims to identify buggy source files for a given bug report. Traditional and deep learning-based IRBL techniques often suffer from vocabulary mismatch and dependence on project-specific metadata. In contrast, recent Large Language Model (LLM)-based approaches struggle to provide appropriate context to the model: they either restrict analysis to a fixed set of candidate files, overwhelm the model with repository-wide information, or rely on explicit bug report cues to guide context collection. To address these issues, we propose GenLoc, a technique that combines semantic retrieval with LLM-driven code-exploration functions to iteratively analyze the code base and identify buggy files. We evaluate GenLoc on three complementary benchmarks, including large-scale and recent Java datasets as well as the Python-based SWE-bench Lite dataset.…
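To make the retrieve-then-explore idea concrete, here is a minimal sketch. It is not GenLoc's implementation: a bag-of-words cosine score stands in for semantic (embedding-based) retrieval, and a scripted `follow_references` function stands in for the LLM that decides which exploration function to call next. All names (`retrieve`, `localize`, `follow_references`, the action dictionaries) and the toy files are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

def tokenize(text: str) -> List[str]:
    # Split camelCase identifiers, then keep lowercase alphabetic tokens.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(report: str, files: Dict[str, str], k: int) -> List[str]:
    # Assumption: term-overlap similarity as a stand-in for embedding similarity.
    query = Counter(tokenize(report))
    ranked = sorted(files, key=lambda p: cosine(query, Counter(tokenize(files[p]))), reverse=True)
    return ranked[:k]

Policy = Callable[[str, Dict[str, str], List[str]], dict]

def localize(report: str, files: Dict[str, str], policy: Policy,
             k: int = 1, max_steps: int = 5) -> List[str]:
    # Seed the context with retrieved files, then let the policy explore iteratively.
    context = {p: files[p] for p in retrieve(report, files, k)}
    for _ in range(max_steps):
        step = policy(report, context, list(files))
        if step["action"] == "answer":
            return step["files"]
        if step["action"] == "read" and step["path"] in files:
            context[step["path"]] = files[step["path"]]
    return list(context)  # step budget exhausted: fall back to everything seen so far

def follow_references(report: str, context: Dict[str, str], paths: List[str]) -> dict:
    # Scripted stand-in for the LLM: read any file whose name is mentioned in the context.
    for path in paths:
        stem = path.rsplit(".", 1)[0]
        if path not in context and any(stem in src for src in context.values()):
            return {"action": "read", "path": path}
    return {"action": "answer", "files": list(context)}

files = {
    "DateParser.java": "class DateParser { Date parseDate(String s) { return FormatUtil.normalize(s); } }",
    "FormatUtil.java": "class FormatUtil { static String normalize(String s) { return s.trim(); } }",
    "Logger.java": "class Logger { void log(String message) { } }",
}
report = "parseDate throws NullPointerException when the date string is null"
result = localize(report, files, follow_references, k=1)
print(result)
```

In this toy run, retrieval seeds the context with the file that shares the most terms with the report, and the exploration loop then pulls in a file that the report never mentions but the seeded file references. Swapping `follow_references` for a function that calls an actual LLM would keep the loop unchanged, which is the main reason the policy is passed in as a callable.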
