An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Max Schäfer; Sarah Nadi; Aryaz Eghbali; Frank Tip

arXiv:2302.06527 · cs.SE · December 12, 2023 · 53 cites

TL;DR

This paper empirically evaluates how effectively large language models generate unit tests for JavaScript functions without additional training, and finds that the generated tests achieve high coverage and differ substantially from existing tests.

Contribution

It introduces TestPilot, a tool that uses LLMs for automated unit test generation, demonstrating significant coverage improvements over existing techniques in a large-scale empirical study.

Findings

1. TestPilot achieves a median statement coverage of 70.2%.

2. 92.8% of the generated tests are less than 50% similar to existing tests.

3. Other LLMs give comparable results, suggesting that effectiveness is influenced by model size and training data.
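The similarity finding can be made concrete with a small sketch. This page does not say how similarity is computed, so the code below assumes a normalized edit-distance measure (1 minus Levenshtein distance divided by the longer length); the function names `editDistance`, `similarity`, and `maxSimilarity` are hypothetical and chosen only for illustration.

```javascript
// Assumption: similarity = 1 - (Levenshtein distance / longer string length).
// This is an illustrative measure, not necessarily the one used in the paper.

// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; two empty strings count as identical.
function similarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 1;
  return 1 - editDistance(a, b) / maxLen;
}

// Highest similarity between one generated test and any existing test.
function maxSimilarity(generated, existingTests) {
  return existingTests.reduce(
    (best, t) => Math.max(best, similarity(generated, t)),
    0
  );
}
```

Under this assumed measure, a generated test would count toward the 92.8% figure when `maxSimilarity(generated, existingTests)` is below 0.5.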

Abstract

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to this problem, utilizing additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation on the effectiveness of LLMs for automated unit test generation without additional training or manual effort, providing the LLM with the signature and implementation of the function under test, along with usage examples extracted from documentation. We also attempt to repair failed generated tests by re-prompting the model with the failing test and error message. We implement our approach in TestPilot, a test generation tool for JavaScript that automatically generates unit tests for all API functions in an npm package.…
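The prompting and repair steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TestPilot's actual implementation: the prompt layout, the function names (`buildPrompt`, `buildRepairPrompt`, `generateTest`), the `complete` and `runTest` callbacks, and the `maxRepairs` limit are all hypothetical. The calls are synchronous to keep the sketch short; a real model call would be asynchronous.

```javascript
// Hypothetical sketch: none of these names come from TestPilot itself.

// Assemble a prompt from the three inputs the abstract lists: usage examples
// from documentation, the function signature, and the function implementation.
function buildPrompt({ signature, implementation, usageExamples }) {
  const lines = [];
  for (const example of usageExamples) {
    lines.push("// Usage example:");
    for (const l of example.split("\n")) lines.push("// " + l);
  }
  lines.push("// Function under test: " + signature);
  for (const l of implementation.split("\n")) lines.push("// " + l);
  lines.push("// Unit test:");
  return lines.join("\n");
}

// Re-prompt with the failing test and its error message.
function buildRepairPrompt(basePrompt, failingTest, errorMessage) {
  return [
    basePrompt,
    failingTest,
    "// The test above fails with: " + errorMessage,
    "// Fixed test:",
  ].join("\n");
}

// complete(prompt) -> test source (stands in for the LLM).
// runTest(source) -> { passed: boolean, error?: string } (stands in for a test runner).
function generateTest(fnInfo, complete, runTest, maxRepairs = 1) {
  const basePrompt = buildPrompt(fnInfo);
  let test = complete(basePrompt);
  let result = runTest(test);
  let repairs = 0;
  while (!result.passed && repairs < maxRepairs) {
    repairs++;
    test = complete(buildRepairPrompt(basePrompt, test, result.error));
    result = runTest(test);
  }
  return { test, passed: result.passed, repairs };
}
```

Passing the model and the test runner in as callbacks keeps the loop independent of any particular LLM or test framework, which also makes it easy to exercise with stubs.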

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Natural Language Processing Techniques

Methods: None · Repair · Test