Agentic Bug Reproduction for Effective Automated Program Repair at Google
Runxiang Cheng, Michele Tufano, Jürgen Cito, José Cambronero, Pat Rondon, Renyao Wei, Aaron Sun, Satish Chandra

TL;DR
This paper presents a novel agent-based approach using a fine-tuned Large Language Model to generate bug reproduction tests from industry bug reports, significantly improving automated debugging and repair at Google.
Contribution
The paper introduces BRT Agent, an LLM-based method for generating bug reproduction tests (BRTs). BRT Agent outperforms an adapted version of LIBRO, a state-of-the-art BRT generation technique, and improves automated program repair in a large-scale industrial setting.
Findings
BRT Agent achieves a 28% plausible BRT generation rate, outperforming LIBRO's 10%.
Integrating the generated BRTs with automated program repair (APR) increases the number of bugs fixed by 30%.
The Ensemble Pass Rate (EPR) metric effectively selects promising fixes, achieving 70% accuracy with its top-1 ranking.
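The summary does not say how EPR is computed. The following is a minimal sketch that assumes EPR for a candidate fix is the fraction of generated BRTs that pass on the patched code, and that candidates are ranked by that score with the top-1 taken as the selected fix. The names `ensemble_pass_rate` and `rank_fixes` are hypothetical. The definition is an assumption, not the paper's exact formula.

```python
from typing import Dict, List, Sequence


def ensemble_pass_rate(brt_results: Sequence[bool]) -> float:
    """Assumed EPR: share of generated BRTs that pass on one candidate fix.

    Each entry is True if one generated BRT passed when run against the
    patched code (hypothetical definition for illustration).
    """
    if not brt_results:
        return 0.0
    return sum(brt_results) / len(brt_results)


def rank_fixes(results_by_fix: Dict[str, List[bool]]) -> List[str]:
    """Order candidate fixes by assumed EPR, highest first.

    Python's sort is stable even with reverse=True, so fixes with equal
    scores keep their input order.
    """
    return sorted(
        results_by_fix,
        key=lambda fix: ensemble_pass_rate(results_by_fix[fix]),
        reverse=True,
    )


candidates = {
    "fix_a": [True, False, False, True],    # 2 of 4 BRTs pass -> 0.5
    "fix_b": [True, True, True, False],     # 3 of 4 BRTs pass -> 0.75
    "fix_c": [False, False, False, False],  # no BRT passes -> 0.0
}
ranking = rank_fixes(candidates)
print(ranking)     # ['fix_b', 'fix_a', 'fix_c']
print(ranking[0])  # top-1 selection: fix_b
```

Scoring every candidate against the whole pool of generated tests, rather than against a single test, makes the selection less sensitive to any one generated BRT being wrong.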
Abstract
Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass when it has been resolved, are crucial for debugging, but they are rarely included in bug reports, both in open-source and in industrial settings. Thus, automatically generating BRTs from bug reports has the potential to accelerate the debugging process and reduce the time to repair. This paper investigates automated BRT generation within an industry setting, specifically at Google, focusing on the challenges of a large-scale, proprietary codebase and considering real-world industry bugs extracted from Google's internal issue tracker. We adapt and evaluate a state-of-the-art BRT generation technique, LIBRO, and present our agent-based approach, BRT Agent, which makes use of a fine-tuned Large Language Model (LLM)…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Web Data Mining and Analysis
