LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework
Andrea Lops, Fedelucio Narducci, Azzurra Ragone, Michelantonio Trizio, Claudio Bartolini

TL;DR
AgoneTest is a standardized framework for evaluating LLM-generated Java unit tests. It provides comprehensive metrics and insights showing that, for the tests that compile, LLMs can match or surpass human-written tests in coverage and defect detection.
Contribution
It introduces a novel evaluation framework and dataset for assessing LLM-generated Java tests, enabling standardized comparison and analysis of different models and prompting strategies.
Findings
For the subset of tests that compile, LLM-generated tests can match or exceed human-written tests in coverage and defect detection.
Enhanced prompting strategies improve test quality.
AgoneTest provides a comprehensive assessment pipeline for LLM-based software testing.
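The findings compare suites on coverage and on defect detection, which is commonly approximated by mutation score. A minimal sketch of how these two ratios are usually computed follows. It assumes per-line coverage counts and per-mutant outcomes are already available (for example from a coverage tool and a mutation tool). The class `TestMetrics` and the enum `MutantStatus` are hypothetical names for illustration, not AgoneTest's API.

```java
import java.util.List;
import java.util.Locale;

public class TestMetrics {

    /** Hypothetical outcome of running the test suite against one mutant. */
    public enum MutantStatus { KILLED, SURVIVED, NO_COVERAGE }

    /** Line coverage: fraction of executable lines hit by at least one test. */
    public static double lineCoverage(int coveredLines, int totalLines) {
        if (totalLines <= 0 || coveredLines < 0 || coveredLines > totalLines) {
            throw new IllegalArgumentException("need 0 <= coveredLines <= totalLines, totalLines > 0");
        }
        return (double) coveredLines / totalLines;
    }

    /** Mutation score: killed mutants divided by all generated mutants. */
    public static double mutationScore(List<MutantStatus> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("no mutants");
        }
        long killed = results.stream().filter(s -> s == MutantStatus.KILLED).count();
        return (double) killed / results.size();
    }

    public static void main(String[] args) {
        List<MutantStatus> results = List.of(
                MutantStatus.KILLED, MutantStatus.KILLED,
                MutantStatus.SURVIVED, MutantStatus.NO_COVERAGE);
        System.out.println(String.format(Locale.ROOT, "line coverage  = %.2f", lineCoverage(45, 60)));
        System.out.println(String.format(Locale.ROOT, "mutation score = %.2f", mutationScore(results)));
    }
}
```

Running `main` prints a line coverage of 0.75 and a mutation score of 0.50. The denominator is a design choice: this sketch counts every generated mutant, while some tools exclude uncovered or equivalent mutants, which raises the reported score.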
Abstract
Unit testing is an essential but resource-intensive step in software development that ensures individual code units function correctly. This paper introduces AgoneTest, an automated evaluation framework for unit tests generated by Large Language Models (LLMs) in Java. AgoneTest does not aim to propose a novel test generation algorithm; rather, it supports researchers and developers in comparing different LLMs and prompting strategies through a standardized end-to-end evaluation pipeline under realistic conditions. We introduce the Classes2Test dataset, which maps Java classes under test to their corresponding test classes, and a framework that integrates advanced evaluation metrics, such as mutation score and test smells, for a comprehensive assessment. Experimental results show that, for the subset of tests that compile, LLM-generated tests can match or exceed human-written tests in terms of…
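The abstract does not describe how Classes2Test pairs each class under test with its test class. One common heuristic in Maven/Gradle projects is the naming convention `src/main/java/<pkg>/Foo.java` ↔ `src/test/java/<pkg>/FooTest.java`. The sketch below assumes that convention purely for illustration; `ClassToTestMapper` and its methods are hypothetical and are not the dataset's actual construction procedure.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ClassToTestMapper {

    private static final String MAIN_ROOT = "src/main/java/";
    private static final String TEST_ROOT = "src/test/java/";

    /** Assumed convention: src/main/java/<pkg>/Foo.java is tested by src/test/java/<pkg>/FooTest.java. */
    public static Optional<String> expectedTestPath(String mainPath) {
        if (!mainPath.startsWith(MAIN_ROOT) || !mainPath.endsWith(".java")) {
            return Optional.empty(); // not a production source file
        }
        String relative = mainPath.substring(MAIN_ROOT.length(), mainPath.length() - ".java".length());
        return Optional.of(TEST_ROOT + relative + "Test.java");
    }

    /** Pairs each class under test with its test class, keeping only pairs whose test file exists. */
    public static Map<String, String> mapClassesToTests(List<String> repoFiles) {
        Set<String> files = new HashSet<>(repoFiles);
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String path : repoFiles) {
            expectedTestPath(path)
                    .filter(files::contains)
                    .ifPresent(testPath -> pairs.put(path, testPath));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> repo = List.of(
                "src/main/java/com/example/Stack.java",
                "src/main/java/com/example/Queue.java",      // no matching test: skipped
                "src/test/java/com/example/StackTest.java");
        mapClassesToTests(repo).forEach((cut, test) -> System.out.println(cut + " -> " + test));
    }
}
```

Filtering out classes without an existing test keeps only pairs for which a human-written baseline is available, which is what a comparison between generated and human-written tests needs.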
Taxonomy
Topics: Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques · Software Engineering Research
