Towards Explorative IRBL: Combining Semantic Retrieval with LLM-driven Iterative Code Exploration

Moumita Asad; Rafed Muhammad Yasir; Sam Malek

arXiv:2508.00253·cs.SE·April 23, 2026


TL;DR

This paper introduces GenLoc, a bug localization technique that pairs semantic retrieval with LLM-driven iterative code exploration, evaluated on Java benchmarks and the Python-based SWE-bench Lite dataset.

Contribution

GenLoc integrates semantic retrieval with iterative, LLM-driven code exploration to identify the buggy source files for a given bug report.

Findings

01. GenLoc outperforms traditional IRBL and deep learning methods.

02. It localizes bugs that other techniques fail to detect.

03. GenLoc performs well on Java and Python datasets.

Abstract

Information Retrieval-based Bug Localization (IRBL) aims to identify buggy source files for a given bug report. Traditional and deep learning-based IRBL techniques often suffer from vocabulary mismatch and dependence on project-specific metadata. In contrast, recent Large Language Model (LLM)-based approaches struggle to provide appropriate context to the model: they either restrict analysis to a fixed set of candidate files, overwhelm the model with repository-wide information, or rely on explicit bug report cues to guide context collection. To address these issues, we propose GenLoc, a technique that combines semantic retrieval with LLM-driven code-exploration functions to iteratively analyze the code base and identify buggy files. We evaluate GenLoc on three complementary benchmarks, including large-scale and recent Java datasets as well as the Python-based SWE-bench Lite dataset.…
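The abstract gives no implementation details, so the following is only a rough sketch of the general idea it describes: retrieve a few candidate files, then let a model call exploration functions step by step before naming the buggy files. Everything here is an assumption made for illustration. Bag-of-words cosine similarity stands in for semantic retrieval, `stub_policy` is a hard-coded stand-in for the LLM, and the `read` and `search` tools, the function names, and the sample files are hypothetical and not taken from GenLoc.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Split identifiers such as parseDate into lowercase word tokens.
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*", text)]

def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(report, files, k):
    # Assumption: lexical similarity replaces a real semantic embedding model.
    query = Counter(tokenize(report))
    ranked = sorted(files,
                    key=lambda p: cosine(query, Counter(tokenize(files[p]))),
                    reverse=True)
    return ranked[:k]

def explore(report, files, policy, k=2, max_steps=5):
    # Hypothetical exploration functions the model may call.
    tools = {
        "read": lambda path: files.get(path, ""),
        "search": lambda word: sorted(p for p, src in files.items() if word in src),
    }
    candidates = retrieve(report, files, k)
    history = [("retrieve", report, candidates)]
    for _ in range(max_steps):
        action, arg = policy(report, history)
        if action == "answer":
            return arg
        history.append((action, arg, tools[action](arg)))
    return candidates  # fall back to retrieval if the step budget runs out

def stub_policy(report, history):
    # Hard-coded stand-in for an LLM: search for one symbol, then answer.
    if len(history) == 1:
        return ("search", "parseDate")
    return ("answer", history[-1][2])

FILES = {
    "DateUtil.java": "class DateUtil { Date parseDate(String s) { return null; } }",
    "Report.java": "class Report { void render() { DateUtil.parseDate(raw); } }",
    "Logger.java": "class Logger { void log(String message) {} }",
}
REPORT = "parseDate returns null for valid date strings"

if __name__ == "__main__":
    print(explore(REPORT, FILES, stub_policy))
```

In a real system the policy would be an LLM choosing among the exploration functions at each step. The step budget and the fallback to the retrieved candidates are design choices of this sketch only.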
