Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Sungmin Kang; Juyeon Yoon; Nargiz Askarbekkyzy; Shin Yoo

arXiv:2311.04532 · cs.SE · November 10, 2023 · 1 citation

TL;DR

This paper presents LIBRO, an approach that uses large language models to automatically generate bug-reproducing tests from natural language bug reports. LIBRO reproduces about one-third of the bugs in the Defects4J benchmark, and an evaluation of 15 LLMs suggests that open-source models also have substantial bug reproduction ability.

Contribution

The paper presents LIBRO, a technique that uses LLMs to reproduce bugs automatically from natural language reports, and evaluates it across 15 LLMs (11 of them open-source) on multiple benchmarks.

Findings

01

LIBRO reproduces about one-third of bugs in Defects4J.

02

Open-source LLMs like StarCoder achieve 70-90% of the performance of closed-source models.

03

Bug reproduction performance improves as LLM size increases.

Abstract

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompting LLMs to generate bug-reproducing tests, and via a post-processing pipeline to automatically identify promising generated tests, our proposed technique LIBRO could successfully reproduce about one-third of all bugs in the widely used Defects4J benchmark. Furthermore, our extensive evaluation on 15 LLMs, including 11 open-source LLMs, suggests that open-source LLMs also demonstrate substantial…
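
The two stages named in the abstract, prompting an LLM with a bug report and then post-processing the generated tests, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `Candidate` type, and the ranking heuristic (keep tests that fail on the buggy version, group them by failure output, prefer larger groups) are all assumptions made for the example, and the LLM call and test execution are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for one LLM-generated test."""
    test_code: str                 # generated test body
    failure_output: Optional[str]  # error seen on the buggy version, None if the test passed


def build_prompt(title: str, description: str) -> str:
    """Turn a natural language bug report into a test-generation prompt (assumed wording)."""
    return (
        f"# Bug report: {title}\n"
        f"{description.strip()}\n\n"
        "# Write a JUnit test that reproduces the bug above.\n"
        "public void test"
    )


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Keep tests that fail on the buggy version and rank them by agreement."""
    groups: dict[str, list[Candidate]] = defaultdict(list)
    for c in candidates:
        if c.failure_output is not None:  # a passing test cannot reproduce the bug
            groups[c.failure_output].append(c)
    # Assumed heuristic: many samples failing the same way is a promising sign,
    # so larger groups come first (ties keep first-seen order).
    ordered = sorted(groups.values(), key=len, reverse=True)
    # Return one representative per group: the shortest test.
    return [min(g, key=lambda c: len(c.test_code)) for g in ordered]
```

In this sketch a developer would inspect the returned tests in order, so the ranking only has to place a genuinely bug-reproducing test near the top rather than pick a single winner.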

Code & Models

Repositories

coinse/libro-journal-artifact
Official

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques