Measuring the Influence of Incorrect Code on Test Generation
Dong Huang, Jie M. Zhang, Mark Harman, Mingzhe Du, Heming Cui

TL;DR
This study empirically measures how the correctness of the code under test affects the quality of tests generated by large language models. Prompting with correct rather than incorrect code improves test accuracy, code coverage, and bug detection by 57%, 12%, and 24% respectively.
Contribution
It provides the first empirical measurement of how code correctness affects LLM-based test generation, covering 11 language models (5 open source, 6 closed source), 3 widely used benchmark data sets, and 41 repo-level real-world examples.
Findings
Test accuracy improves by 57% when LLMs are prompted with correct rather than incorrect code under test
Code coverage improves by 12% with correct code under test
Bug detection improves by 24% with correct code under test
Abstract
It is natural to suppose that a Large Language Model is more likely to generate correct test cases when prompted with correct code under test, compared to incorrect code under test. However, the size of this effect has never been previously measured, despite its obvious importance for both practicing software engineers and researchers. To answer the question, we conducted a comprehensive empirical study on 5 open source and 6 closed source language models, with 3 widely-used benchmark data sets together with 41 repo-level real-world examples from two different real-world data sets. Our results reveal that, when compared to incorrect code under test, LLMs prompted with correct code achieve improvements in test accuracy, code coverage, and bug detection of 57%, 12%, and 24% respectively. We further show that these scientific conclusions carry over from the three benchmark data sets to…
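The abstract excerpt reports improvements as percentages but does not say here whether they are relative changes or percentage-point differences. As a minimal sketch of the distinction, the Python below computes both for a single metric. The function names and the two pass rates are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, treated: float) -> float:
    """Relative change of `treated` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline * 100.0


def absolute_improvement(baseline: float, treated: float) -> float:
    """Difference in percentage points between two rates given in [0, 1]."""
    return (treated - baseline) * 100.0


# Hypothetical test-accuracy rates, NOT results from the paper:
# fraction of generated tests that are correct under each prompt condition.
incorrect_code_accuracy = 0.40  # prompted with incorrect code under test
correct_code_accuracy = 0.60    # prompted with correct code under test

rel = relative_improvement(incorrect_code_accuracy, correct_code_accuracy)
abs_pp = absolute_improvement(incorrect_code_accuracy, correct_code_accuracy)
print(f"relative: {rel:.1f}%  absolute: {abs_pp:.1f} percentage points")
```

The same pair of rates gives noticeably different figures under the two definitions, so it is worth checking the full paper for which one each reported number uses.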
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Reliability and Analysis Research · Real-time simulation and control systems
