Heterogeneous Prompting and Execution Feedback for SWE Issue Test Generation and Selection
Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel

TL;DR
This paper presents e-Otter++, a reproduction test generator that uses execution feedback to automatically generate tests for software engineering (SWE) issues, reaching an average fail-to-pass rate of 63% on TDD-Bench Verified.
Contribution
It introduces techniques for using execution feedback during test generation even though the code under test is missing or incorrect, advancing the state of the art in automated reproduction test generation for SWE issues.
Findings
e-Otter++ achieves an average fail-to-pass rate of 63% on TDD-Bench Verified.
The approach effectively handles missing or incorrect code in test generation.
Experimental results demonstrate substantial improvements over existing methods.
Abstract
A software engineering issue (SWE issue) is easier to resolve when accompanied by a reproduction test. Unfortunately, most issues do not come with functioning reproduction tests, so this paper explores how to generate them automatically. The primary challenge in this setting is that the code to be tested is either missing or wrong, as evidenced by the existence of the issue in the first place. This has held back test generation for this setting: without the correct code to execute, it is difficult to leverage execution feedback to generate good tests. This paper introduces novel techniques for leveraging execution feedback to get around this problem, implemented in a new reproduction test generator called e-Otter++. Experiments show that e-Otter++ represents a leap ahead in the state-of-the-art for this problem, generating tests with an average fail-to-pass rate of 63% on the TDD-Bench…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Reliability and Analysis Research · Simulation Techniques and Applications
