Impact of Large Language Models of Code on Fault Localization

Suhwan Ji; Sanghwa Lee; Changsup Lee; Hyeonseung Im; Yo-Sub Han

arXiv:2408.09657 · cs.SE · August 20, 2024

TL;DR

This paper introduces a sequence generation approach to fine-tuning large language models of code for fault localization. The approach works even on code that contains syntactic errors and outperforms existing learning-based techniques on benchmark data.

Contribution

The paper presents the first application of sequence generation fine-tuning to large language models of code (LLMCs) for fault localization, and demonstrates improved accuracy over state-of-the-art methods.

Findings

01

LLMCs achieved up to 72.3% top-1 accuracy in fault localization.

02

Fine-tuned LLMCs outperform existing learning-based FL techniques by up to 1.35 times.

03

The method works even when the code contains syntactic errors.
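
For readers unfamiliar with the metric, top-1 accuracy in fault localization is commonly taken to be the fraction of faulty programs for which the highest-ranked prediction is the actual faulty line. The sketch below follows that common definition; the paper's exact evaluation protocol is not given on this page, and the function name `top_k_accuracy` and the sample data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_fault_lines, k=1):
    """Fraction of programs whose true faulty line appears in the top-k predictions.

    ranked_predictions: one list per program, line numbers ordered from most
                        to least suspicious (hypothetical input format).
    true_fault_lines:   one ground-truth faulty line number per program.
    """
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_fault_lines)
        if truth in ranked[:k]
    )
    return hits / len(ranked_predictions)


# Illustrative data only: three faulty programs with ranked line predictions.
predictions = [[3, 1, 2], [5, 4], [2, 7]]
truths = [3, 4, 9]

top1 = top_k_accuracy(predictions, truths, k=1)  # only the first program is a hit
top2 = top_k_accuracy(predictions, truths, k=2)  # the first two programs are hits
```

Under this definition, a reported top-1 accuracy of 72.3% would mean the first prediction was correct for roughly 72 of every 100 faulty programs.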

Abstract

Identifying the point of error is imperative in software debugging. Traditional fault localization (FL) techniques rely on executing the program and using the code coverage matrix in tandem with test case results to calculate a suspiciousness score for each function or line. Recently, learning-based FL techniques have harnessed machine learning models to extract meaningful features from the code coverage matrix and improve FL performance. These techniques, however, require compilable source code, existing test cases, and specialized tools for generating the code coverage matrix for each programming language of interest. In this paper, we propose, for the first time, a simple but effective sequence generation approach for fine-tuning large language models of code (LLMCs) for FL tasks. LLMCs have recently received much attention for various software engineering problems. In line with…
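
The coverage-based scoring that the abstract attributes to traditional FL can be illustrated with a small sketch. The abstract does not name a specific formula, so the example below uses the Ochiai coefficient, a widely used spectrum-based choice, purely as an illustration; the function names and the toy coverage matrix are assumptions, not material from the paper.

```python
import math


def ochiai_scores(coverage, passed):
    """Compute an Ochiai suspiciousness score for each line.

    coverage[t][l] is truthy if test t executed line l.
    passed[t] is True if test t passed, False if it failed.
    """
    n_lines = len(coverage[0]) if coverage else 0
    total_failed = sum(1 for p in passed if not p)
    scores = []
    for line in range(n_lines):
        # ef / ep: failing / passing tests that executed this line.
        ef = sum(1 for t, row in enumerate(coverage) if row[line] and not passed[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[line] and passed[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom > 0 else 0.0)
    return scores


def rank_lines(scores):
    """Return line indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy example (assumed data): 3 tests over 4 lines, only test 1 fails.
coverage = [
    [1, 1, 0, 1],  # test 0, passes
    [1, 0, 1, 1],  # test 1, fails
    [1, 1, 0, 0],  # test 2, passes
]
passed = [True, False, True]

scores = ochiai_scores(coverage, passed)
ranking = rank_lines(scores)  # [2, 3, 0, 1]: line 2 is run only by the failing test
```

Note that this kind of scoring needs the program to compile and run under a test suite, which is the requirement the abstract identifies as a limitation of coverage-based techniques.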

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability