Hallucination to Consensus: Multi-Agent LLMs for End-to-End JUnit Test Generation
Qinghua Xu, Guancheng Wang, Lionel Briand, Kui Liu

TL;DR
CANDOR is a multi-agent LLM framework that improves automated Java unit test generation by reducing hallucinations, enhancing oracle correctness, and outperforming existing methods in code coverage and mutation score.
Contribution
This work introduces CANDOR, a novel prompt engineering-based multi-agent LLM system that enhances test generation accuracy and efficiency without fine-tuning or external tools.
Findings
CANDOR achieves comparable code coverage to EvoSuite.
CANDOR significantly outperforms the state-of-the-art oracle generator TOGLL in oracle correctness.
Ablation studies highlight the importance of key agents in test quality.
Abstract
Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is labor-intensive, especially for strongly typed languages like Java, motivating the need for automated approaches. Traditional methods primarily rely on search-based or randomized algorithms to achieve high code coverage and produce regression oracles, which are derived from the program's current behavior rather than its intended functionality. Recent advances in LLMs have enabled oracle generation from natural language descriptions, aligning better with user requirements. However, existing LLM-based methods often require fine-tuning or rely on external tools such as EvoSuite for test prefix generation, making them costly or cumbersome to apply in practice. In this work, we propose CANDOR, a novel prompt engineering-based LLM framework for automated unit test generation in Java.…
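The distinction the abstract draws between regression oracles and oracles derived from intended functionality can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the paper or from CANDOR: the class `OracleSketch`, the deliberately buggy `abs` method, and both checker methods are assumptions made for illustration. Plain Java boolean checks stand in for JUnit assertions so that the example runs without any test library.

```java
// Hypothetical illustration only; none of these names come from the paper.
class OracleSketch {

    /** Documented intent: return the absolute value of x.
     *  Contains a deliberate bug: the input -1 is returned unchanged. */
    static int abs(int x) {
        if (x < -1) {        // bug: the condition should be x < 0
            return -x;
        }
        return x;
    }

    /** Regression oracle: the expected value was recorded from the
     *  current (buggy) implementation when the test was generated. */
    static boolean regressionOraclePasses(int input) {
        int recorded = -1;   // what abs(-1) returned at generation time
        return input == -1 && abs(input) == recorded;
    }

    /** Specification oracle: the expected value is derived from the
     *  natural-language description ("absolute value"), not from the code. */
    static boolean specificationOraclePasses(int input) {
        int expected = input < 0 ? -input : input;
        return abs(input) == expected;
    }

    public static void main(String[] args) {
        // The regression oracle encodes the bug, so the test passes.
        System.out.println(regressionOraclePasses(-1));      // prints "true"
        // The specification oracle exposes the bug, so the test fails.
        System.out.println(specificationOraclePasses(-1));   // prints "false"
    }
}
```

In a JUnit test the same two checks would typically appear as `assertEquals(-1, abs(-1))` for the regression oracle and `assertEquals(1, abs(-1))` for the specification oracle. Only the second can reveal a fault that already exists in the code under test.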
