Otter: Generating Tests from Issues to Validate SWE Patches
Toufique Ahmed, Jatin Ganhotra, Rangeet Pan, Avraham Shinnar, Saurabh Sinha, Martin Hirzel

TL;DR
Otter is an LLM-based system that generates tests from software issues before the resolving code patch exists. It supports test-driven development and the validation of code patches, and the authors report that it outperforms existing approaches.
Contribution
The paper introduces TDD-Bench-Verified, a benchmark for generating tests from issues, and Otter, an LLM-based approach that uses rule-based analysis to check and repair model outputs together with a self-reflective action planner.
Findings
Otter outperforms existing systems in test generation from issues.
Otter enhances patch generation systems with better test validation.
The approach supports TDD and improves software robustness.
Abstract
While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. This paper focuses on the scenario where that code patch does not yet exist. Doing so supports two major use-cases. First, it supports TDD (test-driven development), the discipline of "test first, write code later" that has well-documented benefits for human software engineers. Second, it also validates SWE (software engineering) agents, which generate code patches for resolving issues. This paper introduces TDD-Bench-Verified, a benchmark for generating tests from issues, and Otter, an LLM-based solution for this task. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planner. Experiments show Otter…
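The requirement that a correct test "must validate the code patch that resolves the issue" can be illustrated with a small sketch. It assumes a fail-to-pass reading of that requirement: the test fails on the code before the patch and passes after it. That reading is an assumption made here for illustration, not a criterion quoted from the paper. The functions `slugify_buggy`, `slugify_patched`, `issue_test`, `passes`, and `is_fail_to_pass` are all hypothetical names and do not come from Otter or TDD-Bench-Verified.

```python
from typing import Callable

Impl = Callable[[str], str]


def slugify_buggy(title: str) -> str:
    # Hypothetical pre-patch code: surrounding whitespace is not trimmed,
    # so leading and trailing spaces turn into dashes.
    return title.lower().replace(" ", "-")


def slugify_patched(title: str) -> str:
    # Hypothetical post-patch code that resolves the issue.
    return title.strip().lower().replace(" ", "-")


def issue_test(slugify: Impl) -> None:
    # A test written from the issue description alone, before any patch exists.
    assert slugify("  Hello World ") == "hello-world"


def passes(test: Callable[[Impl], None], impl: Impl) -> bool:
    # Run the test against one implementation and report whether it passed.
    try:
        test(impl)
        return True
    except AssertionError:
        return False


def is_fail_to_pass(test: Callable[[Impl], None], old: Impl, new: Impl) -> bool:
    # Assumed validity criterion: fails on the old code, passes on the new code.
    return (not passes(test, old)) and passes(test, new)


if __name__ == "__main__":
    print(is_fail_to_pass(issue_test, slugify_buggy, slugify_patched))
```

Under this reading, a test that already passes on the old code says nothing about whether a patch resolved the issue, which is why both runs are checked.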
