Synthesizing File-Level Data for Unit Test Generation with Chain-of-Thoughts via Self-Debugging
Ziyue Hua, Tianyu Chen, Yeyun Gong, Shuai Lu, Peng Cheng, Qinglin Zhu, Yibo He, Yingjie Fu, Wenpin Jiao, Wei Yang, Tao Xie

TL;DR
This paper introduces a novel self-debugging data distillation method to generate high-quality unit tests with faithful chain-of-thought explanations, significantly improving test generation effectiveness for large language models.
Contribution
It proposes a self-debugging guided test repair and CoT compression approach to create a large high-quality dataset for fine-tuning models for unit test generation.
Findings
Achieved a 36.17% test assertion pass rate.
Attained 43.90% branch coverage.
Reached an 88.66% mutation score.
Abstract
Automatic unit test (UT) generation is essential for software quality assurance, but existing approaches, including symbolic execution, search-based approaches, and recent LLM-based generators, struggle to produce human-quality tests with correct, meaningful assertions and reliable chain-of-thought (CoT) explanations. We identify a gap in UT training data: repository-mined tests lack developer CoTs, while LLM-distilled CoTs are often incorrect or incomplete. To address this issue, we propose a novel data-distillation approach that uses self-debugging to produce high-quality UT training examples paired with faithful CoTs. Our approach combines (1) guided test repair, a heuristic loop (with error-, failure-, and coverage-focused steps) that asks the model in use to diagnose and iteratively fix generated tests, and (2) CoT compression, which compacts original and debugging CoTs into concise…
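The guided test repair loop named in the abstract can be pictured roughly as follows. This is only a minimal sketch, not the authors' implementation: the `RunResult` structure, the `pick_step` and `guided_repair` helpers, the errors-then-failures-then-coverage ordering, the 0.8 coverage target, and the round limit are all assumptions made for illustration, and the stand-in `fake_run` and `fake_repair` functions replace a real test runner and a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RunResult:
    """Outcome of running one candidate test (hypothetical structure)."""
    error: Optional[str] = None    # e.g. a compile or import error message
    failure: Optional[str] = None  # e.g. a failed assertion message
    coverage: float = 0.0          # branch coverage in [0, 1]


def pick_step(result: RunResult, coverage_target: float) -> Optional[str]:
    """Choose the next repair focus.

    Assumed priority: errors first, then failures, then low coverage.
    """
    if result.error is not None:
        return "error"
    if result.failure is not None:
        return "failure"
    if result.coverage < coverage_target:
        return "coverage"
    return None  # nothing left to repair


def guided_repair(
    test: str,
    run: Callable[[str], RunResult],
    repair: Callable[[str, str, RunResult], str],
    coverage_target: float = 0.8,  # assumed threshold
    max_rounds: int = 5,           # assumed budget
) -> Tuple[str, List[str]]:
    """Run the test, pick a repair focus, ask `repair` for a fix, and repeat."""
    history: List[str] = []
    for _ in range(max_rounds):
        result = run(test)
        step = pick_step(result, coverage_target)
        if step is None:
            break
        # In a real pipeline, `repair` would prompt the model with the
        # diagnostics in `result`; here it is any callable.
        test = repair(test, step, result)
        history.append(step)
    return test, history


# Stand-ins for a test runner and a model call (purely illustrative).
def fake_run(test: str) -> RunResult:
    outcomes = {
        "v0": RunResult(error="ImportError"),
        "v1": RunResult(failure="assert 1 == 2"),
        "v2": RunResult(coverage=0.5),
        "v3": RunResult(coverage=0.9),
    }
    return outcomes[test]


def fake_repair(test: str, step: str, result: RunResult) -> str:
    # Pretend each repair yields the next version of the test.
    return "v" + str(int(test[1:]) + 1)


final_test, steps = guided_repair("v0", fake_run, fake_repair)
print(final_test, steps)
```

In this toy run, each version of the test triggers a different repair focus until the coverage target is met. A CoT compression stage would then condense the reasoning collected across these rounds, which the sketch does not attempt to model.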
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
