Large Language Models for Unit Test Generation: Achievements, Challenges, and Opportunities
Bei Chu, Yang Feng, Kui Liu, Zhaoqiang Guo, Yichi Zhang, Hange Shi, Zifan Nan, Baowen Xu

TL;DR
This paper reviews the use of large language models for automated unit test generation, highlighting current achievements, challenges such as weak fault detection and a lack of benchmarks, and future opportunities for autonomous and hybrid testing systems.
Contribution
It provides a systematic review of 115 studies, proposes a taxonomy based on the unit test generation lifecycle, and identifies key trends such as prompt engineering and iterative validation in LLM-based testing.
Findings
Prompt engineering is the dominant approach, used in 89% of the studies.
Iterative validation improves test robustness.
Challenges include weak fault detection and a lack of benchmarks.
Abstract
Automated unit test generation is critical for software quality but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. Large language models (LLMs) address this limitation by leveraging their extensive data-driven knowledge of code semantics and programming patterns. To analyze the state of the art in this domain, we conducted a systematic literature review of 115 publications published between May 2021 and August 2025. We propose a taxonomy based on the unit test generation lifecycle that divides the process into a generative phase for creating test artifacts and a quality assurance phase for refining them. Our analysis reveals that prompt engineering has emerged as the dominant utilization approach and accounts for 89% of the studies due to its flexibility. We find that iterative validation and repair loops have…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research
