Large Language Models of Code Fail at Completing Code with Potential Bugs

Tuan Dinh; Jinman Zhao; Samson Tan; Renato Negrinho; Leonard Lausen; Sheng Zha; George Karypis

arXiv:2306.03438 · cs.LG · December 4, 2023 · 2 cites

TL;DR

This paper investigates how large language models of code perform when the code context contains potential bugs, revealing significant performance degradation and exploring mitigation strategies.

Contribution

It introduces the buggy-code completion problem, creates two datasets with synthetic and real bugs, and evaluates the impact on Code-LLMs' performance.

Findings

01

The presence of potential bugs can reduce Code-LLMs' test-case passing rates by more than 50%.

02

Performance drops significantly with even a single potential bug.

03

Post-hoc mitigation methods still leave a substantial performance gap.
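The findings above are stated in terms of how often completed programs pass their tests. As a rough illustration only (this is not the paper's evaluation harness), the sketch below measures a pass rate for completions of a clean prefix and of a prefix with a flipped comparison. The function `count_positive`, the two completions, and the helper names `passes` and `pass_rate` are all hypothetical. Note that `exec` runs arbitrary code, so a real harness would need sandboxing.

```python
def passes(program: str, test: str) -> bool:
    """Return True if `program` defines code under which `test` runs without error."""
    env = {}
    try:
        exec(program, env)  # unsafe outside a sandbox; acceptable for a toy sketch
        exec(test, env)
        return True
    except Exception:
        return False


def pass_rate(prefix: str, completions: list[str], test: str) -> float:
    """Fraction of completions that, appended to `prefix`, pass `test`."""
    results = [passes(prefix + c, test) for c in completions]
    return sum(results) / len(results)


# Hypothetical clean code context and a variant containing a potential bug.
CLEAN = (
    "def count_positive(xs):\n"
    "    n = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
)
BUGGY = CLEAN.replace("if x > 0:", "if x <= 0:")

# A completion that ignores the flipped condition, and one that adapts to it.
NAIVE = "            n += 1\n    return n\n"
ADAPTED = "            continue\n        n += 1\n    return n\n"

TEST = "assert count_positive([1, -2, 3]) == 2"

clean_rate = pass_rate(CLEAN, [NAIVE], TEST)
buggy_naive_rate = pass_rate(BUGGY, [NAIVE], TEST)
buggy_adapted_rate = pass_rate(BUGGY, [ADAPTED], TEST)
```

In this toy case the flipped comparison only becomes an actual bug when the completion continues as if the context were correct; a completion that adapts to the altered condition still passes.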

Abstract

Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation…
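The synthetic-bug construction mentioned in the abstract (semantics-altering operator changes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' dataset-building code: the `FLIP` table, the function `inject_potential_bug`, and the example prefix are hypothetical, and the prefix is assumed to consist of complete lines with no unclosed brackets or strings so that Python's `tokenize` module can process it.

```python
import io
import tokenize

# Hypothetical table of operator swaps; each swap changes program semantics.
FLIP = {
    "+": "-", "-": "+",
    "==": "!=", "!=": "==",
    "<": ">=", ">=": "<",
    ">": "<=", "<=": ">",
    "+=": "-=", "-=": "+=",
}


def inject_potential_bug(prefix: str, which: int = 0) -> str:
    """Flip the `which`-th flippable operator in a partial code context.

    Returns `prefix` unchanged if it has fewer than `which + 1` such operators.
    """
    ops = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(prefix).readline)
        if tok.type == tokenize.OP and tok.string in FLIP
    ]
    if which >= len(ops):
        return prefix
    tok = ops[which]
    row, col = tok.start  # 1-based row, 0-based column
    lines = prefix.split("\n")
    line = lines[row - 1]
    lines[row - 1] = line[:col] + FLIP[tok.string] + line[col + len(tok.string):]
    return "\n".join(lines)


# Hypothetical partial program used as the code context for completion.
PREFIX = (
    "def has_close_elements(numbers, threshold):\n"
    "    for i, a in enumerate(numbers):\n"
    "        for j, b in enumerate(numbers):\n"
    "            if i != j:\n"
)

buggy_prefix = inject_potential_bug(PREFIX)
```

Working on tokens rather than raw text keeps the swap from touching operators that appear inside string literals or comments.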

Code & Models

Repositories

amazon-science/buggy-code-completion
Official

Videos

Large Language Models of Code Fail at Completing Code with Potential Bugs · SlidesLive

Taxonomy

Topics

Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability