Large Language Models for Fault Localization: An Empirical Study

YingJian Xiao; RongQun Hu; WeiWei Gong; HongWei Li; AnQuan Jie

arXiv:2510.20521·cs.SE·October 24, 2025

TL;DR

This empirical study evaluates how large language models perform in fault localization tasks for code, analyzing various models and prompting strategies to identify effective approaches and trade-offs.

Contribution

It provides a comprehensive evaluation of open-source and closed-source LLMs for statement-level fault localization, highlighting the impact of different prompting strategies.

Findings

01. Bug report context improves model performance

02. Few-shot learning offers some benefits but with diminishing returns

03. Chain-of-thought reasoning's effectiveness depends on model capabilities

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, particularly in automated program repair. However, the effectiveness of such repairs depends heavily on the performance of upstream fault localization, for which comprehensive evaluations are currently lacking. This paper presents a systematic empirical study of LLMs on the statement-level code fault localization task. We evaluate representative open-source models (Qwen2.5-coder-32b-instruct, DeepSeek-V3) and closed-source models (GPT-4.1 mini, Gemini-2.5-flash) to assess their fault localization capabilities on the HumanEval-Java and Defects4J datasets. The study investigates the impact of different prompting strategies (standard prompts, few-shot examples, and chain-of-thought reasoning) on model performance, with analysis across accuracy, time efficiency, and economic cost…
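To make the three prompting strategies concrete, here is a minimal Python sketch of how prompts for statement-level fault localization might be assembled. It is an illustration only: the instruction wording, the function names `number_lines` and `build_prompt`, and the strategy labels are assumptions, not the prompts or code used in the study. The optional `bug_report` argument reflects the finding that bug report context helps.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; the study's actual prompts are not shown here.
BASE_INSTRUCTION = (
    "Identify the line number of the statement most likely to contain the fault."
)


def number_lines(code: str) -> str:
    """Prefix each source line with its 1-based line number."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(code.splitlines(), start=1)
    )


def build_prompt(
    code: str,
    strategy: str = "standard",
    examples: Sequence[Tuple[str, int]] = (),
    bug_report: Optional[str] = None,
) -> str:
    """Assemble a prompt for one of three assumed strategies:
    'standard', 'few-shot', or 'cot' (chain-of-thought)."""
    if strategy not in {"standard", "few-shot", "cot"}:
        raise ValueError(f"unknown strategy: {strategy}")

    parts = [BASE_INSTRUCTION]

    # Few-shot: show solved (code, faulty line) pairs before the target.
    if strategy == "few-shot":
        for ex_code, ex_line in examples:
            parts.append(
                f"Example code:\n{number_lines(ex_code)}\nFaulty line: {ex_line}"
            )

    # Optional bug report context.
    if bug_report:
        parts.append(f"Bug report:\n{bug_report}")

    parts.append(f"Code:\n{number_lines(code)}")

    # Chain-of-thought: request reasoning before the final answer.
    if strategy == "cot":
        parts.append(
            "Reason step by step about what the code does, "
            "then finish with 'Faulty line: <number>'."
        )
    else:
        parts.append("Faulty line:")

    return "\n\n".join(parts)
```

For example, `build_prompt(src, strategy="few-shot", examples=[(ex_src, 2)])` prepends one solved example to the numbered target code. Varying the length of `examples` is one way to probe the diminishing returns reported for few-shot prompting.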

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability