Learning to Generate Unit Tests for Automated Debugging
Archiki Prasad, Elias Stengel-Eskin, Justin Chih-Yao Chen, Zaid Khan, Mohit Bansal

TL;DR
This paper introduces UTGen and UTDebug, methods that use large language models to generate and validate unit tests, improving automated debugging and code-correctness assessment.
Contribution
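The paper presents two techniques. UTGen trains LLMs to generate error-revealing unit test inputs together with their expected outputs, based only on the task description. UTDebug makes LLM-based debugging robust to noisy model-generated tests by scaling output prediction with test-time compute and by validating or backtracking edits against multiple generated tests.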
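To make "error-revealing test with a correct output" concrete, here is a minimal sketch that scores a single generated unit test: it counts only if its input makes the faulty code diverge from a reference solution and its predicted output matches that reference. It assumes a gold solution is available at evaluation time (not at generation time); `run`, `is_useful_test`, and the toy `max` task are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Any, Callable, Tuple


def run(fn: Callable[..., Any], args: tuple) -> Tuple[bool, Any]:
    """Call fn(*args); return (True, result) or (False, exception name) instead of raising."""
    try:
        return True, fn(*args)
    except Exception as exc:
        return False, type(exc).__name__


def is_useful_test(test_input: tuple, predicted_output: Any,
                   faulty: Callable[..., Any], gold: Callable[..., Any]) -> bool:
    """True if the input exposes a bug in `faulty` AND the predicted output matches `gold`."""
    gold_ok, gold_out = run(gold, test_input)
    if not gold_ok:
        return False  # the input is invalid for the task itself
    faulty_ok, faulty_out = run(faulty, test_input)
    reveals_error = (not faulty_ok) or faulty_out != gold_out
    output_correct = predicted_output == gold_out
    return reveals_error and output_correct


# Hypothetical toy task: return the largest element of a list.
def gold_max(xs):
    return max(xs)


def faulty_max(xs):
    return xs[0]  # bug: only looks at the first element


print(is_useful_test(([3, 9, 4],), 9, faulty_max, gold_max))  # True
print(is_useful_test(([9, 3, 4],), 9, faulty_max, gold_max))  # False: faulty code passes this input
print(is_useful_test(([3, 9, 4],), 3, faulty_max, gold_max))  # False: expected output is wrong
```

The third call shows the trade-off the abstract describes: an output obtained by tracing the faulty code on an error-revealing input (3) is exactly the wrong label.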
Findings
UTGen outperforms baselines by 7.59% in error-revealing test generation.
Using UTGen with UTDebug improves pass@1 accuracy on HumanEvalFix and MBPP+ datasets.
UTGen enhances code correctness judgment, surpassing a state-of-the-art reward model by 4.43%.
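Using generated unit tests as a code-correctness judge can be sketched as best-of-N selection: run each sampled candidate against the generated tests and keep the one that passes the most. This is a minimal illustration under assumed names (`pass_count`, `best_of_n`), not the paper's exact scoring procedure.

```python
from typing import Any, Callable, Sequence, Tuple

UnitTest = Tuple[tuple, Any]  # (positional args, expected output)


def pass_count(candidate: Callable[..., Any], tests: Sequence[UnitTest]) -> int:
    """Count generated tests the candidate passes; a raised exception counts as a failure."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def best_of_n(candidates: Sequence[Callable[..., Any]], tests: Sequence[UnitTest]) -> int:
    """Index of the candidate passing the most generated tests (ties go to the earliest)."""
    scores = [pass_count(c, tests) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical sampled solutions to "return the largest element".
candidates = [lambda xs: xs[0], lambda xs: sorted(xs)[-1], lambda xs: min(xs)]
tests = [(([3, 9, 4],), 9), (([1],), 1), (([5, 2],), 5)]
print(best_of_n(candidates, tests))  # 1
```

The quality of this judge depends directly on the tests: an error-revealing input with a wrong expected output would penalize the correct candidate instead of the buggy ones.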
Abstract
Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to large language models (LLMs), motivating automated test generation. However, we uncover a trade-off between generating unit test inputs that reveal errors when given faulty code and correctly predicting the unit test output without access to the gold solution. To address this trade-off, we propose UTGen, which teaches LLMs to generate unit test inputs that reveal errors along with their correct expected outputs based on task descriptions. Since model-generated tests can provide noisy signals (e.g., from incorrectly predicted outputs), we propose UTDebug, which (i) scales UTGen via test-time compute to improve UT output prediction, and (ii) validates and backtracks edits based on multiple generated UTs to avoid overfitting, helping LLMs debug effectively. We show that UTGen…
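The two UTDebug ingredients named in the abstract can be sketched with plain Python stand-ins for the LLM calls: (i) majority voting over several sampled output predictions, one common way to spend test-time compute (self-consistency), and (ii) accepting a proposed edit only if it passes strictly more of the generated tests, otherwise backtracking to the previous code. `vote_output`, `debug_with_backtracking`, and `propose_fix` are hypothetical names, and the strict-improvement acceptance rule is an assumption rather than the paper's exact criterion.

```python
from collections import Counter
from typing import Any, Callable, Sequence, Tuple

UnitTest = Tuple[tuple, Any]  # (positional args, expected output)
Code = Callable[..., Any]


def vote_output(samples: Sequence[Any]) -> Any:
    """Majority vote over sampled output predictions (hashable values; ties go to the first seen)."""
    return Counter(samples).most_common(1)[0][0]


def pass_count(code: Code, tests: Sequence[UnitTest]) -> int:
    """Number of generated tests the code passes; exceptions count as failures."""
    passed = 0
    for args, expected in tests:
        try:
            if code(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def debug_with_backtracking(code: Code, tests: Sequence[UnitTest],
                            propose_fix: Callable[[Code, Sequence[UnitTest]], Code],
                            rounds: int = 3) -> Code:
    """Keep an edit only if it passes strictly more generated tests; otherwise backtrack."""
    best, best_score = code, pass_count(code, tests)
    for _ in range(rounds):
        if best_score == len(tests):
            break  # all generated tests pass; stop editing
        candidate = propose_fix(best, tests)  # hypothetical stand-in for an LLM repair call
        score = pass_count(candidate, tests)
        if score > best_score:
            best, best_score = candidate, score  # validated: accept the edit
        # otherwise the candidate is discarded and the next round starts from `best`
    return best


# Hypothetical sampled predictions for the expected output of input [3, 9, 4].
expected = vote_output([9, 9, 4, 9, 3])  # majority vote -> 9
tests = [(([3, 9, 4],), expected), (([1],), 1)]

# Hypothetical repair proposals: the first is a bad edit, the second a correct one.
proposals = iter([lambda xs: min(xs), lambda xs: max(xs)])
fixed = debug_with_backtracking(lambda xs: xs[0], tests, lambda code, t: next(proposals))
print(fixed([3, 9, 4]))  # 9
```

In this run the `min` edit passes no more tests than the original code, so it is rolled back; only the `max` edit, which passes both generated tests, is kept. Checking edits against several tests at once is what limits overfitting to any single, possibly mislabeled, test.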
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Real-time simulation and control systems · Software Testing and Debugging Techniques
