On the Evaluation of Large Language Models in Unit Test Generation

Lin Yang; Chen Yang; Shutao Gao; Weijing Wang; Bo Wang; Qihao Zhu; Xiao Chu; Jianyi Zhou; Guangtai Liang; Qianxiang Wang; Junjie Chen

arXiv:2406.18181·cs.SE·September 26, 2024·2 cites

Open Access

TL;DR

This paper empirically evaluates open-source large language models for automated Java unit test generation, comparing their performance with GPT-4 and traditional tools, and analyzing the impact of prompting strategies.

Contribution

First empirical study exploring open-source LLMs for unit test generation with diverse prompting strategies and comprehensive evaluation on Java projects.

Findings

01. Prompt factors significantly affect LLM performance
02. Open-source LLMs can rival GPT-4 in some scenarios
03. Limitations exist in current LLM-based test generation methods

Abstract

Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings unexplored. Particularly, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance in some tasks. Moreover, effective prompting is crucial for maximizing LLMs' capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely-used open-source LLMs with different structures and parameter sizes, and…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods