Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction
Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, Shin Yoo

TL;DR
This paper introduces LIBRO, an approach that uses large language models to automatically generate bug-reproducing tests from natural language bug reports. LIBRO reproduces about one-third of the bugs in the Defects4J benchmark, and the evaluation suggests that open-source LLMs come close to closed-source models on this task.
Contribution
The paper presents LIBRO, a technique that uses LLMs to reproduce bugs automatically from natural language reports, together with an evaluation across 15 LLMs (11 of them open-source) and multiple benchmarks.
Findings
LIBRO reproduces about one-third of bugs in Defects4J.
Open-source LLMs like StarCoder achieve 70-90% of the performance of closed-source models.
Bug reproduction performance improves as LLM size increases.
Abstract
Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompting LLMs to generate bug-reproducing tests, and via a post-processing pipeline to automatically identify promising generated tests, our proposed technique LIBRO could successfully reproduce about one-third of all bugs in the widely used Defects4J benchmark. Furthermore, our extensive evaluation on 15 LLMs, including 11 open-source LLMs, suggests that open-source LLMs also demonstrate substantial…
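The two stages named in the abstract, prompting an LLM with the bug report and post-processing the generated tests to find promising ones, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout, the `Candidate` record, and the ranking heuristic (keep tests that compile and fail on the buggy code, then prefer failure messages that many candidates agree on) are all hypothetical choices made for this example. The LLM call and the test execution are left out, and their results are assumed to be already recorded on each candidate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """One LLM-generated test plus its (hypothetical) execution result."""
    test_code: str
    compiled: bool
    failed_on_buggy: bool
    failure_message: str = ""


def build_prompt(title: str, body: str) -> str:
    """Format a bug report so the prompt ends where a test method would begin.

    The layout is an assumption for illustration only.
    """
    return (
        f"# {title}\n"
        f"## Description\n{body}\n"
        "## Reproduction\n"
        "Provide a self-contained example that reproduces this issue.\n"
        "public void test"
    )


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Keep tests that compile and fail on the buggy code, then rank by agreement.

    Candidates are grouped by failure message; groups with more members are
    ranked first, and one representative is returned per group.
    """
    groups: dict[str, list[Candidate]] = defaultdict(list)
    for c in candidates:
        if c.compiled and c.failed_on_buggy:
            groups[c.failure_message].append(c)
    # sorted() is stable, so equally sized groups keep their first-seen order.
    ordered = sorted(groups.values(), key=len, reverse=True)
    return [group[0] for group in ordered]
```

The design intuition behind the assumed heuristic is that a test can only reproduce a bug if it fails on the buggy program, and that several independently sampled tests failing in the same way is weak evidence that the failure reflects the reported behavior rather than a mistake in the generated test.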
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
