Large Language Models for Fault Localization: An Empirical Study
YingJian Xiao, RongQun Hu, WeiWei Gong, HongWei Li, AnQuan Jie

TL;DR
This empirical study evaluates how large language models perform on code fault localization, comparing several models and prompting strategies to identify effective approaches and their trade-offs.
Contribution
It provides a comprehensive evaluation of open-source and closed-source LLMs for statement-level fault localization, highlighting the impact of different prompting strategies.
Findings
Bug report context improves model performance
Few-shot learning offers some benefits but with diminishing returns
Chain-of-thought reasoning's effectiveness depends on model capabilities
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, particularly in automated program repair. However, the effectiveness of such repairs is highly dependent on the performance of upstream fault localization, for which comprehensive evaluations are currently lacking. This paper presents a systematic empirical study of LLMs on the statement-level code fault localization task. We evaluate representative open-source models (Qwen2.5-coder-32b-instruct, DeepSeek-V3) and closed-source models (GPT-4.1 mini, Gemini-2.5-flash) to assess their fault localization capabilities on the HumanEval-Java and Defects4J datasets. The study investigates the impact of different prompting strategies (standard prompts, few-shot examples, and chain-of-reasoning) on model performance, with a focus on analysis across accuracy, time efficiency, and economic cost…
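To make the three prompting strategies named in the abstract concrete, here is a minimal sketch of how such prompts might be assembled for statement-level fault localization. It is an illustration under stated assumptions, not the authors' prompt templates: the function names (`number_lines`, `build_prompt`), the instruction wording, and the "Faulty line: <number>" answer format are all hypothetical. Bug report context is modeled as an optional field that can be combined with any strategy.

```python
from typing import Optional, Sequence, Tuple

# Strategy labels follow the abstract; everything else here is an assumption.
STRATEGIES = ("standard", "few-shot", "chain-of-reasoning")


def number_lines(code: str) -> str:
    """Prefix each source line with its 1-based line number."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(code.splitlines(), start=1)
    )


def build_prompt(
    code: str,
    strategy: str = "standard",
    bug_report: Optional[str] = None,
    examples: Sequence[Tuple[str, int]] = (),
) -> str:
    """Assemble a fault-localization prompt (hypothetical wording).

    examples: (code, faulty_line) pairs, used only by the few-shot strategy.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    parts = ["Identify the line number of the faulty statement in the code below."]

    # Few-shot: show solved examples before the target code.
    if strategy == "few-shot":
        for example_code, faulty_line in examples:
            parts.append("Example code:\n" + number_lines(example_code))
            parts.append(f"Faulty line: {faulty_line}")

    # Optional bug report context, independent of the strategy.
    if bug_report:
        parts.append("Bug report:\n" + bug_report)

    parts.append("Code:\n" + number_lines(code))

    # Chain-of-reasoning: ask for intermediate reasoning before the answer.
    if strategy == "chain-of-reasoning":
        parts.append(
            "Reason step by step about the code's behavior, then give the answer."
        )

    parts.append("Answer with: Faulty line: <number>")
    return "\n\n".join(parts)
```

Numbering the lines in the prompt gives the model an unambiguous way to name a statement, and a fixed answer format keeps the response easy to parse when scoring accuracy. Both choices are conveniences of this sketch rather than details taken from the study.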
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
