Otter: Generating Tests from Issues to Validate SWE Patches

Toufique Ahmed; Jatin Ganhotra; Rangeet Pan; Avraham Shinnar; Saurabh Sinha; Martin Hirzel

arXiv:2502.05368·cs.SE·June 2, 2025

TL;DR

Otter is an LLM-based system that generates validation tests from software issues, supporting test-driven development and validation of code patches, with improved accuracy over existing methods.

Contribution

The paper introduces Otter, an LLM-based approach to generating tests from issues that adds rule-based analysis to check and repair LLM outputs and a self-reflective action planner. It also introduces TDD-Bench-Verified, a benchmark for this task.

Findings

01. Otter outperforms existing systems in test generation from issues.

02. Otter enhances patch generation systems with better test validation.

03. The approach supports TDD and improves software robustness.

Abstract

While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. This paper focuses on the scenario where that code patch does not yet exist. Doing so supports two major use-cases. First, it supports TDD (test-driven development), the discipline of "test first, write code later" that has well-documented benefits for human software engineers. Second, it also validates SWE (software engineering) agents, which generate code patches for resolving issues. This paper introduces TDD-Bench-Verified, a benchmark for generating tests from issues, and Otter, an LLM-based solution for this task. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planner. Experiments show Otter…

Code & Models

Repositories

IBM/TDD-Bench-Verified (official)

Videos

Otter: Generating Tests from Issues to Validate SWE Patches (SlidesLive)
