How well LLM-based test generation techniques perform with newer LLM versions?

Michael Konstantinou; Renzo Degiovanni; Mike Papadakis

arXiv:2601.09695·cs.SE·January 15, 2026

Open Access

TL;DR

This study evaluates the performance of newer LLMs in automated test generation, showing that plain LLM approaches outperform previous specialized techniques in effectiveness and can be optimized for efficiency.

Contribution

The paper demonstrates that modern LLMs alone can surpass state-of-the-art test generation methods, and it proposes an efficient, targeted approach to reduce query costs.

Findings

01

Plain LLMs outperform previous methods in coverage and mutation score

02

Targeting classes first reduces LLM query costs by 20%

03

Newer LLMs make specialized engineering components less necessary
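The second finding can be made concrete with a small query-counting sketch. This is only an illustration of the general class-first idea, not the paper's procedure, which is not described on this page. The function `class_first_generation` and the two callbacks are hypothetical names, and the assumption is that one class-level query is issued first, with method-level queries issued only for methods the class-level tests leave uncovered.

```python
from typing import Callable, Dict, List, Set


def class_first_generation(
    classes: Dict[str, List[str]],
    generate_for_class: Callable[[str], Set[str]],
    generate_for_method: Callable[[str, str], None],
) -> int:
    """Return the number of LLM queries issued under a class-first strategy.

    Both callbacks are hypothetical stand-ins for LLM calls:
    generate_for_class returns the set of methods its tests covered,
    generate_for_method requests tests for a single method.
    """
    queries = 0
    for cls, methods in classes.items():
        # One query asks for tests for the whole class.
        covered = generate_for_class(cls)
        queries += 1
        # Follow-up queries only for methods still uncovered.
        for method in methods:
            if method not in covered:
                generate_for_method(cls, method)
                queries += 1
    return queries


# Toy data (invented for illustration, not taken from the paper).
toy_classes = {
    "Stack": ["push", "pop", "peek"],
    "Queue": ["enqueue", "dequeue"],
}
toy_coverage = {
    "Stack": {"push", "pop"},
    "Queue": {"enqueue", "dequeue"},
}

class_first_queries = class_first_generation(
    toy_classes,
    generate_for_class=lambda cls: toy_coverage[cls],
    generate_for_method=lambda cls, method: None,
)
# A method-by-method baseline would issue one query per method.
method_only_queries = sum(len(methods) for methods in toy_classes.values())
```

In this toy setup the class-first strategy issues three queries against five for the method-by-method baseline. The actual saving depends on how much the class-level tests already cover, so the toy ratio says nothing about the 20% figure reported above.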

Abstract

The rapid evolution of Large Language Models (LLMs) has strongly impacted software engineering, leading to a growing number of studies on automated unit test generation. However, the standalone use of LLMs without post-processing has proven insufficient, often producing tests that fail to compile or to achieve high coverage. Several techniques have been proposed to address these issues, reporting improvements in test compilation and coverage. Although these contributions are important, LLM-based test generation techniques have been evaluated against relatively weak baselines (by today's standards), i.e., old LLM versions and relatively weak prompts, which may exaggerate the performance contribution of the approaches. In other words, stronger (newer) LLMs may obviate any advantage these techniques bring. We investigate this issue by replicating four state-of-the-art LLM-based test generation tools, HITS, SymPrompt,…

Taxonomy

Topics: Software Testing and Debugging Techniques · Topic Modeling · Natural Language Processing Techniques