How well LLM-based test generation techniques perform with newer LLM versions?
Michael Konstantinou, Renzo Degiovanni, Mike Papadakis

TL;DR
This study evaluates the performance of newer LLMs in automated test generation, showing that plain LLM approaches outperform previous specialized techniques in effectiveness and can be optimized for efficiency.
Contribution
It demonstrates that modern LLMs alone can surpass state-of-the-art test generation methods and proposes an efficient, targeted approach to reduce query costs.
Findings
Plain LLMs outperform previous methods in coverage and mutation score
Targeting classes first reduces LLM query costs by 20%
Newer LLMs make specialized engineering components less necessary
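The findings above are stated in terms of coverage and mutation score. As a reminder of what those two metrics measure, here is a minimal sketch using their standard textbook definitions. The function names and the example numbers are hypothetical illustrations and are not taken from the paper or its tooling.

```python
from __future__ import annotations


def mutation_score(killed: int, total: int) -> float:
    """Fraction of generated mutants that the test suite detects (kills)."""
    if total <= 0:
        raise ValueError("total must be a positive number of mutants")
    if not 0 <= killed <= total:
        raise ValueError("killed must be between 0 and total")
    return killed / total


def line_coverage(executed: set[int], executable: set[int]) -> float:
    """Fraction of executable lines that the test suite runs at least once."""
    if not executable:
        raise ValueError("executable must contain at least one line")
    # Ignore executed line numbers that are not counted as executable.
    return len(executed & executable) / len(executable)


# Hypothetical numbers, for illustration only.
score = mutation_score(killed=30, total=40)
coverage = line_coverage(executed={1, 2, 3, 9}, executable={1, 2, 3, 4})
```

A higher mutation score indicates that the tests detect more injected faults, while coverage only shows which code the tests execute.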
Abstract
The rapid evolution of Large Language Models (LLMs) has strongly impacted software engineering, leading to a growing number of studies on automated unit test generation. However, the standalone use of LLMs without post-processing has proven insufficient, often producing tests that fail to compile or to achieve high coverage. Several techniques have been proposed to address these issues, reporting improvements in test compilation and coverage. While important, LLM-based test generation techniques have been evaluated against relatively weak baselines (by today's standards), i.e., old LLM versions and relatively weak prompts, which may exaggerate the performance contribution of the approaches. In other words, stronger (newer) LLMs may obviate any advantage these techniques bring. We investigate this issue by replicating four state-of-the-art LLM-based test generation tools, HITS, SymPrompt,…
Taxonomy
Topics: Software Testing and Debugging Techniques · Topic Modeling · Natural Language Processing Techniques
