Enhancing IR-based Fault Localization using Large Language Models
Shuai Shao, Tingting Yu

TL;DR
This paper improves IR-based fault localization by using large language models to categorize bug reports, tailor query strategies, reformulate queries interactively, and apply learning-to-rank models, resulting in significantly better bug localization accuracy.
Contribution
It introduces a novel approach combining large language models, tailored query strategies, interactive reformulation, and learning-to-rank to enhance IR-based fault localization.
Findings
Achieved an MRR of 0.6770 and MAP of 0.5118 on 46 projects.
Outperformed seven state-of-the-art IRFL techniques.
Demonstrated significant improvements in bug localization accuracy.
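For readers unfamiliar with the two metrics reported above, the sketch below shows how MRR and MAP are commonly computed for file-level fault localization. It is an illustration, not the authors' evaluation script: the function names and the toy file lists are hypothetical, and average precision is assumed to be normalized by the total number of buggy files for each report.

```python
from typing import List, Set, Tuple

def reciprocal_rank(ranked: List[str], buggy: Set[str]) -> float:
    """1 / position of the first buggy file in the ranking (0 if none appears)."""
    for position, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / position
    return 0.0

def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a buggy file appears.

    Assumption: normalized by the total number of buggy files.
    """
    hits = 0
    precision_sum = 0.0
    for position, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy) if buggy else 0.0

def mrr_and_map(results: List[Tuple[List[str], Set[str]]]) -> Tuple[float, float]:
    """Average both metrics over all bug reports (one ranking per report)."""
    n = len(results)
    mrr = sum(reciprocal_rank(r, b) for r, b in results) / n
    mean_ap = sum(average_precision(r, b) for r, b in results) / n
    return mrr, mean_ap

# Toy data (hypothetical): two bug reports, each with a ranked file list
# and the set of files that were actually fixed.
toy_results = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
]
mrr, mean_ap = mrr_and_map(toy_results)
```

An MRR of 0.6770 therefore means that, averaged over bug reports, the reciprocal of the first buggy file's rank is about 0.68, so the first buggy file typically appears at or near the top of the list.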
Abstract
Information Retrieval-based Fault Localization (IRFL) techniques aim to identify source files containing the root causes of reported failures. While existing techniques excel in ranking source files, challenges persist in bug report analysis and query construction, leading to potential information loss. Leveraging large language models like GPT-4, this paper enhances IRFL by categorizing bug reports based on programming entities, stack traces, and natural language text. Tailored query strategies, the initial step in our approach (LLmiRQ), are applied to each category. To address inaccuracies in queries, we introduce a user-driven, conversational query reformulation approach, termed LLmiRQ+. Additionally, to further enhance query utilization, we implement a learning-to-rank model that leverages key features such as class name match score and call graph score. This approach…
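The three bug report categories named in the abstract can be illustrated with a small heuristic classifier. This is only a sketch: the paper uses a large language model (GPT-4) for categorization, whereas the code below swaps in plain regular expressions. The pattern definitions, the category labels, and the precedence (stack trace first, then programming entity) are all assumptions made for illustration.

```python
import re

# Assumption: a Java-style stack frame looks like "at pkg.Class.method(File.java:42)".
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$]+\(.*\)", re.MULTILINE)
# Assumption: CamelCase tokens (e.g. FileParser) hint at class names.
CAMEL_CASE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")
# Assumption: an identifier followed by "()" hints at a method name.
METHOD_REF = re.compile(r"\b[A-Za-z_]\w*\(\)")

def categorize(report: str) -> str:
    """Return a hypothetical category label for a bug report's text."""
    if STACK_FRAME.search(report):
        return "stack_trace"
    if CAMEL_CASE.search(report) or METHOD_REF.search(report):
        return "programming_entity"
    return "natural_language"

with_trace = "Crash on startup\n    at org.example.Foo.bar(Foo.java:42)"
with_entity = "NullPointerException thrown by FileParser when input is empty"
plain_text = "The app crashes when I click the save button"

labels = [categorize(r) for r in (with_trace, with_entity, plain_text)]
```

A category label like this could then select which parts of the report (stack frames, identifiers, or free text) feed the retrieval query, which is the role the abstract assigns to the tailored query strategies.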
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications
