No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation
Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng

TL;DR
This paper empirically evaluates ChatGPT's ability to generate unit tests. The generated tests show promising quality but suffer from correctness issues, and the paper proposes ChatTESTER, an improved approach that yields more compilable and correct tests.
Contribution
It provides the first systematic evaluation of ChatGPT for unit test generation and introduces ChatTESTER, a method to improve test correctness and coverage.
Findings
ChatGPT generates tests with comparable coverage and readability to manual tests.
Generated tests still face correctness issues like compilation errors and failures.
ChatTESTER improves the number of compilable and correct tests significantly.
Abstract
Unit testing is essential in detecting bugs in functionally-discrete program units. Manually writing high-quality unit tests is time-consuming and laborious. Although traditional techniques can generate tests with reasonable coverage, they exhibit low readability and cannot be directly adopted by developers. Recent work has shown the large potential of large language models (LLMs) in unit test generation, which can generate more human-like and meaningful test code. ChatGPT, the latest LLM incorporating instruction tuning and reinforcement learning, has performed well in various domains. However, it remains unclear how effective ChatGPT is in unit test generation. In this work, we perform the first empirical study to evaluate ChatGPT's capability of unit test generation. Specifically, we conduct a quantitative analysis and a user study to systematically investigate the quality of its…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
