Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

Alfonso Amayuelas; Firas Laakom; Piotr Piękos; Wenyi Wang; Yifan Xu; Yuhui Wang; Jürgen Schmidhuber; William Wang

arXiv:2604.05159 · cs.SE · April 8, 2026

TL;DR

This paper introduces CovQValue, a curiosity-driven planning method for LLM-based code test generation that outperforms greedy strategies by balancing immediate coverage with future reachability, leading to higher branch coverage.

Contribution

The paper presents a Bayesian exploration approach to LLM test generation: it feeds an evolving coverage map back to the model and selects among candidate plans by LLM-estimated Q-values, improving exploration efficiency.

Findings

1. CovQValue achieves 51–77% higher branch coverage than greedy methods.
2. The approach outperforms existing strategies on TestGenEval Lite across three LLMs.
3. The method demonstrates effective exploration on the new RepoExploreBench benchmark.

Abstract

The use of LLMs for code generation has naturally extended to code testing and evaluation. As codebases grow in size and complexity, so does the need for automated test generation. Current approaches for LLM-based test generation rely on strategies that maximize immediate coverage gain, a greedy approach that plateaus on code where reaching deep branches requires setup steps that individually yield zero new coverage. Drawing on principles of Bayesian exploration, we treat the program's branch structure as an unknown environment, and an evolving coverage map as a proxy probabilistic posterior representing what the LLM has discovered so far. Our method, CovQValue, feeds the coverage map back to the LLM, generates diverse candidate plans in parallel, and selects the most informative plan by LLM-estimated Q-values, seeking actions that balance immediate branch discovery with future…
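The plateau described above, and the loop that avoids it, can be illustrated with a toy sketch. This is not the authors' implementation: the program under test, the plan names, the checkbox rendering of the coverage map, and the functions `propose_plan` and `estimate_q` are all hypothetical, and both LLM calls are replaced by stubs that simulate the toy program directly. The Q-value used here is a one-step lookahead (new branches now plus `gamma` times the best gain available one step later), which is one simple reading of "balance immediate branch discovery with future reachability"; setting `gamma=0` reduces it to the greedy baseline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical plan vocabulary and branch set for a toy program under test.
ALL_PLANS = ["shallow", "setup", "deep"]
KNOWN_BRANCHES = ["arg_check", "deep_a", "deep_b", "deep_c"]


def run_test(plan, initialized):
    """Toy program (hypothetical). Returns (branches hit, state after the test)."""
    if plan == "setup":
        return set(), True  # changes state but yields zero new coverage
    if plan == "deep" and initialized:
        return {"deep_a", "deep_b", "deep_c"}, initialized  # reachable only after setup
    if plan == "shallow":
        return {"arg_check"}, initialized
    return set(), initialized


def render_coverage_map(covered):
    """Text form of the coverage map that would be fed back into the next prompt."""
    return "\n".join(f"[{'x' if b in covered else ' '}] {b}" for b in KNOWN_BRANCHES)


def propose_plan(seed, coverage_text):
    """Stub for one LLM call; a real system would condition on coverage_text."""
    return ALL_PLANS[seed % len(ALL_PLANS)]


def estimate_q(plan, covered, initialized, gamma):
    """Stub for an LLM-estimated Q-value: new branches now + gamma * best gain one step later.
    It simulates the toy program instead of querying a model."""
    hits, next_init = run_test(plan, initialized)
    immediate = len(hits - covered)
    seen = covered | hits
    future = max(len(run_test(p, next_init)[0] - seen) for p in ALL_PLANS)
    return immediate + gamma * future


def explore(steps=3, k=3, gamma=0.9):
    covered, initialized = set(), False
    with ThreadPoolExecutor(max_workers=k) as pool:
        for _ in range(steps):
            text = render_coverage_map(covered)
            # Generate k candidate plans in parallel (stubbed LLM calls).
            plans = list(pool.map(lambda s: propose_plan(s, text), range(k)))
            # Keep the candidate with the highest estimated Q-value.
            best = max(plans, key=lambda p: estimate_q(p, covered, initialized, gamma))
            hits, initialized = run_test(best, initialized)
            covered |= hits
    return covered


print(len(explore(gamma=0.0)))  # greedy baseline: 1
print(len(explore(gamma=0.9)))  # one-step lookahead: 4
```

Under these assumptions the greedy run stalls after covering `arg_check`, because the setup plan never scores above zero on immediate gain, while the lookahead run spends its first step on setup and reaches all four branches within three steps.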
