Reproduction Test Generation for Java SWE Issues
Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel

TL;DR
This paper introduces a benchmark and a solution for generating reproduction tests for Java software issues, addressing a gap in existing research focused mainly on Python.
Contribution
The paper presents TDD-Bench-Java, the first benchmark for Java reproduction test generation, and e-Otter++ for Java, which adapts a state-of-the-art Python reproduction test generator to Java and reports high performance.
Findings
TDD-Bench-Java includes 250 instances from open-source repositories.
e-Otter++ for Java achieves high performance on Java reproduction test generation.
Results on proprietary data demonstrate industry relevance.
Abstract
Given an issue on a software repository, a reproduction test confirms its presence in the code before it gets fixed and its absence after. Reproduction tests provide crucial execution-based feedback for diagnosis and validation during software development. Unfortunately, they are usually missing. Therefore, recent work has introduced both benchmarks and a thriving literature on solutions for reproduction test generation from issues. However, that work has focused on Python and neglected other languages such as Java, which is important for enterprise software. This paper introduces both a benchmark and a solution for Java repository-level reproduction test generation. The benchmark, TDD-Bench-Java, is the first to model this problem and comprises 250 instances sourced from popular open-source repositories. The solution, e-Otter++ for Java, adapts a state-of-the-art reproduction test…
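To make the abstract's definition concrete, here is a minimal Java sketch of the property a reproduction test must have: it fails on the code before the fix and passes on the code after it. The sketch is illustrative only and is not taken from TDD-Bench-Java or e-Otter++. The overflow bug, the class name, and all method names are hypothetical, and a real reproduction test would more likely be written in the repository's own test framework and run against the pre-fix and post-fix versions of the code.

```java
import java.util.function.IntBinaryOperator;

public class ReproductionTestSketch {

    // Hypothetical pre-fix code: the issue reports a wrong midpoint
    // for large inputs, because lo + hi overflows int.
    static int midpointBuggy(int lo, int hi) {
        return (lo + hi) / 2;
    }

    // Hypothetical post-fix code: avoids the overflow.
    static int midpointFixed(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // The reproduction test, written against the behavior described in
    // the (hypothetical) issue. Returns true when the test passes.
    static boolean reproductionTest(IntBinaryOperator midpoint) {
        int lo = Integer.MAX_VALUE - 2;
        int hi = Integer.MAX_VALUE;
        return midpoint.applyAsInt(lo, hi) == Integer.MAX_VALUE - 1;
    }

    // A test reproduces the issue if it fails before the fix
    // and passes after it.
    static boolean isFailToPass(IntBinaryOperator before, IntBinaryOperator after) {
        return !reproductionTest(before) && reproductionTest(after);
    }

    public static void main(String[] args) {
        boolean failsBefore = !reproductionTest(ReproductionTestSketch::midpointBuggy);
        boolean passesAfter = reproductionTest(ReproductionTestSketch::midpointFixed);
        System.out.println("fails before fix: " + failsBefore);
        System.out.println("passes after fix: " + passesAfter);
        System.out.println("reproduces issue: "
                + isFailToPass(ReproductionTestSketch::midpointBuggy,
                               ReproductionTestSketch::midpointFixed));
    }
}
```

The design point is that the test is judged against two versions of the code, not one: a test that passes on both versions, or fails on both, says nothing about the issue.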
