A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing

Ye Shang; Quanjun Zhang; Chunrong Fang; Siqi Gu; Jianyi Zhou; Zhenyu Chen

arXiv:2412.16620 · cs.SE · December 24, 2024

TL;DR

This large-scale empirical study evaluates the effectiveness of fine-tuning various large language models for unit testing tasks, demonstrating their superiority over existing methods and exploring factors influencing their performance.

Contribution

The study provides a comprehensive evaluation of 37 LLMs across three unit testing tasks and five benchmarks, offering practical guidelines and comparing fine-tuning with prompt engineering.

Findings

01. LLMs outperform state-of-the-art methods across tasks

02. Large decoder-only models achieve the best results

03. Prompt engineering shows significant potential in unit testing

Abstract

Unit testing plays a pivotal role in software development, improving software quality and reliability. However, generating effective test cases manually is time-consuming, prompting interest in unit testing research. Recently, Large Language Models (LLMs) have shown potential in various unit testing tasks, including test generation, assertion generation, and test evolution, but existing studies are limited in scope and lack a systematic evaluation of the effectiveness of LLMs. To bridge this gap, we present a large-scale empirical study on fine-tuning LLMs for unit testing. Our study involves three unit testing tasks, five benchmarks, eight evaluation metrics, and 37 popular LLMs across various architectures and sizes, consuming over 3,000 NVIDIA A100 GPU hours. We focus on three key research questions: (1) the performance of LLMs compared to state-of-the-art methods, (2) the impact…

Code & Models

Repositories

iSEngLab/LLM4UT_Empirical (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques