Order Matters! An Empirical Study on Large Language Models' Input Order Bias in Software Fault Localization
Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, Shaowei Wang

TL;DR
This paper empirically studies how input order bias affects large language models' performance in software fault localization, revealing significant order bias and proposing methods to mitigate it for improved accuracy.
Contribution
It identifies the impact of input order bias in LLM-based fault localization and evaluates strategies to reduce this bias, enhancing model reliability in software engineering tasks.
Findings
Order bias significantly reduces FL accuracy in LLMs.
Breaking inputs into smaller contexts mitigates order bias.
Ordering inputs by DepGraph ranking outperforms simpler ordering strategies.
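The second finding says that splitting the input into smaller contexts mitigates order bias. This summary does not say exactly how the authors segment their inputs, so the following is only a minimal sketch under an assumed scheme: an ordered list of candidate methods is cut into consecutive windows of at most `context_size` items, and each window would then go to the LLM in its own prompt. The function name `split_into_contexts`, the parameter `context_size`, and the example method names are all hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def split_into_contexts(methods: List[T], context_size: int) -> List[List[T]]:
    """Split an ordered list of candidate methods into consecutive chunks of
    at most `context_size` items, preserving the original relative order.

    Hypothetical helper: each chunk stands for one smaller LLM prompt.
    """
    if context_size < 1:
        raise ValueError("context_size must be a positive integer")
    return [methods[i:i + context_size] for i in range(0, len(methods), context_size)]


if __name__ == "__main__":
    # Hypothetical method signatures covered by a failing test.
    candidates = [f"Foo.m{i}()" for i in range(7)]
    for chunk in split_into_contexts(candidates, 3):
        print(chunk)
```

Smaller windows mean a faulty method can sit at most `context_size - 1` positions from the start of its prompt. This is one intuitive reason shorter contexts could reduce sensitivity to where the fault appears, although the summary itself reports only the empirical effect.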
Abstract
Large Language Models (LLMs) show great promise in software engineering tasks such as Fault Localization (FL) and Automatic Program Repair (APR). This study investigates how input order and context size affect LLM performance in FL, a crucial step for many downstream software engineering tasks. We test different orderings of the input methods, characterized by their Kendall Tau distance from the ground truth, including a "perfect" order (ground truths first) and a "worst" order (ground truths last), on two benchmarks spanning Java and Python projects. Our results indicate a significant order bias: when we reverse the code order, Top-1 FL accuracy drops from 57% to 20% on Java projects and from 38% to approximately 3% on Python projects. Breaking inputs into smaller contexts helps reduce this bias, narrowing the FL performance gap from 22% to 6% and then to just 1% on both benchmarks. We…
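The abstract characterizes input orderings by Kendall Tau distance and reports Top-1 accuracy. As a rough illustration, and not the authors' code, the sketch below computes the normalized Kendall Tau distance, which is the fraction of item pairs whose relative order differs between two rankings: 0 means identical and 1 means fully reversed. It builds a "perfect" order with the faulty methods first and, as an assumption, takes the "worst" order to be the exact reverse of the perfect one, which also puts the ground truths last. It also shows a plain Top-1 accuracy measure. The names `kendall_tau_distance`, `perfect_and_worst`, and `top1_accuracy` are hypothetical.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Set, Tuple


def kendall_tau_distance(order_a: Sequence[Hashable], order_b: Sequence[Hashable]) -> float:
    """Normalized Kendall Tau distance between two permutations of the same items:
    the fraction of pairs ordered one way in `order_a` and the other way in `order_b`."""
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("orderings must be permutations of the same items")
    n = len(order_a)
    if n < 2:
        return 0.0
    pos_b = {item: i for i, item in enumerate(order_b)}
    # combinations() keeps order_a's order, so x precedes y in order_a;
    # the pair is discordant if y precedes x in order_b.
    discordant = sum(1 for x, y in combinations(order_a, 2) if pos_b[x] > pos_b[y])
    return discordant / (n * (n - 1) / 2)


def perfect_and_worst(methods: List[str], faulty: Set[str]) -> Tuple[List[str], List[str]]:
    """Perfect order: faulty methods first, others after (original order kept).
    Worst order (assumption): the exact reverse of the perfect order."""
    perfect = [m for m in methods if m in faulty] + [m for m in methods if m not in faulty]
    return perfect, list(reversed(perfect))


def top1_accuracy(rankings: List[List[str]], faulty_sets: List[Set[str]]) -> float:
    """Fraction of bugs whose top-ranked method is one of the faulty methods."""
    hits = sum(1 for ranked, faulty in zip(rankings, faulty_sets) if ranked and ranked[0] in faulty)
    return hits / len(rankings)


if __name__ == "__main__":
    methods = ["a", "b", "c", "d"]
    perfect, worst = perfect_and_worst(methods, {"c"})
    print(perfect, worst)                        # ['c', 'a', 'b', 'd'] ['d', 'b', 'a', 'c']
    print(kendall_tau_distance(perfect, worst))  # 1.0
```

Intermediate orderings fall between these two extremes. For example, swapping only the first two items of the perfect order gives a distance of 1/6 for four methods, so orderings can be graded by how far they drift from placing the fault first.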
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
