An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation
Max Schäfer, Sarah Nadi, Aryaz Eghbali, Frank Tip

TL;DR
This paper empirically evaluates the effectiveness of large language models in automatically generating unit tests for JavaScript functions without additional training, achieving high coverage and diversity in generated tests.
Contribution
It introduces TestPilot, a tool that uses LLMs for automated unit test generation, demonstrating significant coverage improvements over existing techniques in a large-scale empirical study.
Findings
Median statement coverage of 70.2% achieved by TestPilot
92.8% of generated tests are less than 50% similar to existing tests
Results are similar across different LLMs, though model size and training data influence effectiveness
Abstract
Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to this problem, utilizing additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation on the effectiveness of LLMs for automated unit test generation without additional training or manual effort, providing the LLM with the signature and implementation of the function under test, along with usage examples extracted from documentation. We also attempt to repair failed generated tests by re-prompting the model with the failing test and error message. We implement our approach in TestPilot, a test generation tool for JavaScript that automatically generates unit tests for all API functions in an npm package.…
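The prompt-then-repair loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TestPilot's actual implementation: the function names (`buildPrompt`, `buildRepairPrompt`, `generateTest`), the prompt wording, and the `complete` and `runTest` callbacks are all hypothetical. `complete` stands in for a call to an LLM and `runTest` for a test runner; both are injected and kept synchronous here so the sketch stays self-contained, whereas a real model call would be asynchronous.

```javascript
// Hypothetical sketch: none of these names or prompt formats come from TestPilot.

// Build the initial prompt from the pieces the abstract lists: usage examples
// from documentation, plus the function's signature and implementation.
function buildPrompt({ signature, implementation, usageExamples = [] }) {
  const lines = [];
  for (const example of usageExamples) {
    lines.push("// Usage example from documentation:");
    for (const line of example.split("\n")) lines.push("// " + line);
  }
  lines.push("// Function under test: " + signature);
  for (const line of implementation.split("\n")) lines.push("// " + line);
  lines.push("// Unit test for the function above:");
  return lines.join("\n");
}

// Build a follow-up prompt containing the failing test and its error message.
function buildRepairPrompt(failingTest, errorMessage) {
  const errorLines = String(errorMessage)
    .split("\n")
    .map((line) => "// " + line);
  return [
    failingTest,
    "// The test above fails with the following error:",
    ...errorLines,
    "// Fixed test:",
  ].join("\n");
}

// Generate a test, then re-prompt up to maxRepairs times while it keeps failing.
// complete(prompt) -> string         (assumed stand-in for an LLM completion call)
// runTest(test) -> { passed, error } (assumed stand-in for a test runner)
function generateTest(fn, complete, runTest, maxRepairs = 1) {
  let test = complete(buildPrompt(fn));
  let result = runTest(test);
  for (let attempt = 0; attempt < maxRepairs && !result.passed; attempt++) {
    test = complete(buildRepairPrompt(test, result.error));
    result = runTest(test);
  }
  return { test, passed: result.passed };
}
```

Passing the model and the runner in as callbacks keeps the loop independent of any particular LLM, which fits the paper's comparison across several models. The repair budget (`maxRepairs`) is an assumption of this sketch; the abstract does not say how many repair attempts are made.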
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Natural Language Processing Techniques
Methods: None · Repair · Test
