Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values
Martín Rodríguez, Gustavo Rossi, and Alejandro Fernandez

TL;DR
This paper assesses the capability of Large Language Models to automatically generate unit tests focusing on equivalence partitions and boundary values, highlighting their potential and current limitations compared to manual testing.
Contribution
It introduces an optimized prompt design for LLMs to generate critical test cases and compares their performance with human programmers using both quantitative and qualitative analyses.
Findings
LLMs' effectiveness depends on prompt quality and implementation
Manual supervision remains essential for reliable test generation
LLMs show promise but need further refinement for autonomous testing
Abstract
The design and implementation of unit tests is a complex task that many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) to generate test cases automatically, comparing the generated tests with manually written ones. An optimized prompt was developed that integrates code and requirements and covers critical cases such as equivalence partitions and boundary values. The strengths and weaknesses of LLMs versus trained programmers were compared through quantitative metrics and manual qualitative analysis. The results show that the effectiveness of LLMs depends on well-designed prompts, robust implementation, and precise requirements. Although flexible and promising, LLMs still require human supervision. This work highlights the importance of manual qualitative analysis as an essential complement to automation in unit test evaluation.
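To make the two test-design techniques named in the abstract concrete, here is a minimal sketch in Python. The function `shipping_fee`, its requirements, and all values are hypothetical and chosen only for illustration; they are not taken from the paper or its benchmark. Equivalence partitioning picks one representative input per class of inputs that should behave the same, while boundary value analysis probes the edges between those classes.

```python
import unittest


def shipping_fee(weight_kg: float) -> float:
    """Hypothetical function under test (an assumption, not from the paper).

    Assumed requirements:
      - weight must satisfy 0 < weight_kg <= 30, otherwise ValueError
      - weights up to and including 5 kg cost 4.0
      - weights above 5 kg and up to 30 kg cost 9.0
    """
    if weight_kg <= 0 or weight_kg > 30:
        raise ValueError("weight out of range")
    return 4.0 if weight_kg <= 5 else 9.0


class ShippingFeeTests(unittest.TestCase):
    def test_equivalence_partitions(self):
        # One representative value from each valid partition.
        self.assertEqual(shipping_fee(2), 4.0)    # light parcel
        self.assertEqual(shipping_fee(15), 9.0)   # heavy parcel
        # One representative value from each invalid partition.
        for weight in (-3, 45):
            with self.assertRaises(ValueError):
                shipping_fee(weight)

    def test_boundary_values(self):
        # Values on and just beyond each partition edge.
        with self.assertRaises(ValueError):
            shipping_fee(0)                        # lower limit is exclusive
        self.assertEqual(shipping_fee(0.01), 4.0)  # just inside lower limit
        self.assertEqual(shipping_fee(5), 4.0)     # last light-parcel value
        self.assertEqual(shipping_fee(5.01), 9.0)  # first heavy-parcel value
        self.assertEqual(shipping_fee(30), 9.0)    # upper limit is inclusive
        with self.assertRaises(ValueError):
            shipping_fee(30.01)                    # just beyond upper limit
```

Saved as a module, the tests can be run with `python -m unittest`. Whether an LLM-generated suite contains checks of this kind for each partition and each edge is the sort of property the abstract describes evaluating.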
Taxonomy
Topics: Natural Language Processing Techniques
