Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization

Md Nakhla Rafi; Dong Jae Kim; Tse-Hsun Chen; Shaowei Wang

arXiv:2412.18750 · cs.SE · September 30, 2025

TL;DR

This paper empirically studies how the order of input code affects large language models' performance in software fault localization. It reveals a significant order bias and evaluates strategies that mitigate it to improve accuracy.

Contribution

It identifies the impact of input order bias in LLM-based fault localization and evaluates strategies to reduce this bias, enhancing model reliability in software engineering tasks.

Findings

01

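Order bias significantly reduces LLM FL accuracy. Reversing the code order drops Top-1 accuracy from 57% to 20% on Java projects and from 38% to about 3% on Python projects.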

02

Breaking inputs into smaller contexts mitigates order bias.

03

Ordering by DepGraph ranking outperforms simple methods.
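Finding 02 says that splitting the input into smaller contexts reduces order bias. The sketch below shows one way to split an ordered method list into fixed-size segments, query each segment separately, and merge the results into a single ranking. Several parts are assumptions for illustration and do not come from the paper: the `score_segment` callback (a stand-in for one LLM query per segment that returns a suspiciousness score for every method it saw), the fixed segment size, and the merge-by-score step.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback type: one LLM query per segment, returning a score per method.
ScoreFn = Callable[[List[str]], Dict[str, float]]

def split_into_contexts(methods: Sequence[str], size: int) -> List[List[str]]:
    """Split an ordered list of method names into consecutive segments of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(methods[i:i + size]) for i in range(0, len(methods), size)]

def rank_by_segments(methods: Sequence[str], size: int, score_segment: ScoreFn) -> List[str]:
    """Score each segment independently, then merge all scores into one global ranking.

    The merge step (sort by score) is an illustrative assumption, not the paper's procedure.
    """
    scores: Dict[str, float] = {}
    for segment in split_into_contexts(methods, size):
        scores.update(score_segment(segment))
    # Highest score first; sorted() stays stable with reverse=True, so ties keep input order.
    return sorted(scores, key=lambda m: scores[m], reverse=True)

# Usage with a fake scorer that only flags "m3" as suspicious.
methods = ["m1", "m2", "m3", "m4", "m5"]
fake_scorer: ScoreFn = lambda seg: {m: (1.0 if m == "m3" else 0.0) for m in seg}
print(split_into_contexts(methods, 2))             # [['m1', 'm2'], ['m3', 'm4'], ['m5']]
print(rank_by_segments(methods, 2, fake_scorer))   # ['m3', 'm1', 'm2', 'm4', 'm5']
```

A smaller segment means each query sees fewer methods, so a faulty method's position inside any single prompt matters less. This is the intuition behind Finding 02.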

Abstract

Large Language Models (LLMs) show great promise in software engineering tasks like Fault Localization (FL) and Automatic Program Repair (APR). This study investigates the impact of input order and context size on LLM performance in FL, a crucial step for many downstream software engineering tasks. Using two benchmarks that consist of both Java and Python projects, we test different orderings of methods, measured by Kendall Tau distance, including "perfect" (where ground truths come first) and "worst" (where ground truths come last). Our results indicate a significant order bias. When we reverse the code order, Top-1 FL accuracy drops from 57% to 20% in Java projects and from 38% to approximately 3% in Python projects. Breaking down inputs into smaller contexts helps reduce this bias, narrowing the performance gap in FL from 22% to 6% and then to just 1% on both benchmarks. We…
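The abstract compares method orderings by Kendall Tau distance and reports Top-1 accuracy. The sketch below computes the normalized Kendall Tau distance between two orderings, which is the fraction of item pairs whose relative order differs (0.0 means identical and 1.0 means fully reversed). It also builds "perfect" and "worst" orderings and checks for a Top-1 hit. Several details are assumptions for illustration: the helper names, the choice to keep non-faulty methods in their original relative order, and the definition of "worst" as the exact reverse of "perfect".

```python
from itertools import combinations
from typing import List, Sequence, Set

def kendall_tau_distance(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Normalized Kendall tau distance between two orderings of the same unique items."""
    if (len(set(order_a)) != len(order_a) or len(order_a) != len(order_b)
            or set(order_a) != set(order_b)):
        raise ValueError("orderings must contain the same unique items")
    n = len(order_a)
    if n < 2:
        return 0.0
    pos_b = {item: i for i, item in enumerate(order_b)}
    # combinations() yields (x, y) with x before y in order_a;
    # the pair is discordant if order_b puts y before x.
    discordant = sum(1 for x, y in combinations(order_a, 2) if pos_b[x] > pos_b[y])
    return discordant / (n * (n - 1) / 2)

def perfect_order(methods: Sequence[str], faulty: Set[str]) -> List[str]:
    """Ground-truth faulty methods first; each group keeps its original relative order (assumption)."""
    return [m for m in methods if m in faulty] + [m for m in methods if m not in faulty]

def worst_order(methods: Sequence[str], faulty: Set[str]) -> List[str]:
    """Assumed to be the exact reverse of the perfect order, so faulty methods come last."""
    return perfect_order(methods, faulty)[::-1]

def top1_hit(ranked: Sequence[str], faulty: Set[str]) -> bool:
    """True if the first-ranked method is a ground-truth faulty method."""
    return bool(ranked) and ranked[0] in faulty

# Usage on a toy bug with four methods, one of which ("m3") is faulty.
methods = ["m1", "m2", "m3", "m4"]
faulty = {"m3"}
best = perfect_order(methods, faulty)    # ['m3', 'm1', 'm2', 'm4']
worst = worst_order(methods, faulty)     # ['m4', 'm2', 'm1', 'm3']
print(kendall_tau_distance(best, worst))                 # 1.0
print(kendall_tau_distance(best, ["m1", "m3", "m2", "m4"]))  # 1/6: one swapped pair
print(top1_hit(best, faulty), top1_hit(worst, faulty))   # True False
```

Intermediate orderings, such as random shuffles, fall between these extremes. This lets accuracy be plotted against how far the input order is from the perfect order.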

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research