Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization
Genevieve Caumartin, Glaucia Melo

TL;DR
This paper introduces an LLM-powered agent that reformulates bug reports before retrieval to improve file-level bug localization, with reported gains of 35% in first-file retrieval ranking and up to 22% over SWE-agent.
Contribution
It demonstrates how lightweight query reformulation with LLMs enhances bug localization effectiveness at scale, a novel application in this domain.
Findings
35% better first-file retrieval ranking
Up to +22% improvement over SWE-agent
Effective use of non-fine-tuned LLMs for query reformulation
Abstract
Bug localization remains a critical yet time-consuming challenge in large-scale software repositories. Traditional information retrieval-based bug localization (IRBL) methods rely on unchanged bug descriptions, which often contain noisy information, leading to poor retrieval accuracy. Recent advances in large language models (LLMs) have improved bug localization through query reformulation, yet the effect on agent performance remains unexplored. In this study, we investigate how an LLM-powered agent can improve file-level bug localization via lightweight query reformulation and summarization. We first employ an open-source, non-fine-tuned LLM to extract key information from bug reports, such as identifiers and code snippets, and reformulate queries pre-retrieval. Our agent then orchestrates BM25 retrieval using these preprocessed queries, automating the localization workflow at scale. Using…
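The reformulate-then-retrieve flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `reformulate` function is a hypothetical regex-based stand-in for the LLM extraction step, the `BM25` class is a plain Okapi BM25 scorer with commonly used default parameters (`k1=1.5`, `b=0.75`), and the file names and bug report are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the LLM step: keep only tokens that look like
# file names, camelCase/PascalCase names, or snake_case names.
IDENTIFIER_RE = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*\.(?:py|java|js)"   # file names
    r"|[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]+)+"    # camelCase / PascalCase
    r"|[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+"  # snake_case
)


def reformulate(bug_report):
    """Return a query made of identifier-like tokens, or the raw report if none."""
    seen = []
    for match in IDENTIFIER_RE.findall(bug_report):
        if match not in seen:  # de-duplicate while preserving order
            seen.append(match)
    return " ".join(seen) if seen else bug_report


def tokenize(text):
    """Split camelCase and snake_case names into lowercase word tokens."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


class BM25:
    """Plain Okapi BM25 over a {file_name: file_text} mapping."""

    def __init__(self, files, k1=1.5, b=0.75):
        self.names = list(files)
        self.tf = [Counter(tokenize(files[n])) for n in self.names]
        self.lengths = [sum(tf.values()) for tf in self.tf]
        self.avgdl = (sum(self.lengths) / len(self.lengths)) or 1.0
        df = Counter()
        for tf in self.tf:
            df.update(tf.keys())
        n = len(self.names)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}
        self.k1, self.b = k1, b

    def rank(self, query):
        """Return file names sorted by descending BM25 score for the query."""
        terms = tokenize(query)
        scored = []
        for name, tf, dl in zip(self.names, self.tf, self.lengths):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f:
                    norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    score += self.idf[term] * f * (self.k1 + 1) / norm
            scored.append((score, name))
        return [name for _, name in sorted(scored, key=lambda pair: -pair[0])]


# Invented toy repository and bug report, for illustration only.
files = {
    "config/loader.py": "class ConfigLoader:\n    def parse_config(self, path):\n        raise KeyError(path)",
    "ui/save_button.py": "class SaveButton:\n    def on_click(self):\n        self.app.save()",
    "utils/strings.py": "def to_snake_case(name):\n    return name.lower()",
}
report = ("Hi, the app crashes sometimes when I click save. "
          "I think parse_config in ConfigLoader throws a KeyError. Thanks!")

query = reformulate(report)
print(query)
print(BM25(files).rank(query))
```

In this toy setup, the reformulated query drops conversational words such as "click" and "save" that would otherwise match unrelated files, which is the kind of noise reduction the abstract attributes to pre-retrieval reformulation. A real pipeline would replace `reformulate` with a call to an LLM and index the full repository.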
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling
