Evaluating Agent-based Program Repair at Google

Pat Rondon; Renyao Wei; José Cambronero; Jürgen Cito; Aaron Sun; Siddhant Sanyam; Michele Tufano; Satish Chandra

arXiv:2501.07531 · cs.SE · January 14, 2025

TL;DR

This paper evaluates the effectiveness of agent-based program repair in an industrial setting at Google, establishing a baseline performance on a new, diverse bug dataset from Google's issue tracking system.

Contribution

It introduces a new evaluation dataset from Google and demonstrates the performance of an agentic repair approach, Passerine, in an enterprise context.

Findings

01

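Passerine produces a plausible patch (one that passes the bug's tests) for 73% of machine-reported bugs.

02

Passerine produces a plausible patch for 25.6% of human-reported bugs.

03

For approximately 43% of machine-reported bugs, at least one generated patch is semantically equivalent to the ground-truth patch.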

Abstract

Agent-based program repair offers to automatically resolve complex bugs end-to-end by combining the planning, tool use, and code generation abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs from highly-rated GitHub Python projects. In addition, various agentic approaches such as SWE-Agent have been proposed to solve bugs in this benchmark. This paper explores the viability of using an agentic approach to address bugs in an enterprise context. To investigate this, we curate an evaluation set of 178 bugs drawn from Google's issue tracking system. This dataset spans both human-reported (78) and machine-reported bugs (100). To establish a repair performance baseline on this benchmark, we implement Passerine, an agent similar in spirit to SWE-Agent that can work within Google's development…
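To make the headline percentages concrete, the following is a minimal back-of-the-envelope sketch that converts the reported plausible-patch rates into approximate bug counts, using the subset sizes given in the abstract (100 machine-reported, 78 human-reported). The names `BugSubset`, `implied_count`, and `summarize` are hypothetical and introduced only for this illustration. The rounded counts and the combined rate are derived here from the reported figures and are not numbers stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugSubset:
    """One slice of the evaluation set (hypothetical helper type)."""
    name: str
    total_bugs: int        # subset size, from the abstract
    plausible_rate: float  # reported fraction of bugs with a plausible patch


def implied_count(subset: BugSubset) -> int:
    """Approximate number of bugs with a plausible patch.

    Assumption: the reported rate is a rounded value of count / total_bugs,
    so rounding the product recovers the nearest whole count.
    """
    return round(subset.total_bugs * subset.plausible_rate)


def summarize(subsets):
    """Return per-subset implied counts, the total set size, and the combined rate."""
    counts = {s.name: implied_count(s) for s in subsets}
    total = sum(s.total_bugs for s in subsets)
    overall = sum(counts.values()) / total
    return counts, total, overall


# Figures taken from the findings and abstract.
SUBSETS = [
    BugSubset("machine-reported", total_bugs=100, plausible_rate=0.73),
    BugSubset("human-reported", total_bugs=78, plausible_rate=0.256),
]

if __name__ == "__main__":
    counts, total, overall = summarize(SUBSETS)
    for name, count in counts.items():
        print(f"{name}: about {count} bugs with a plausible patch")
    print(f"combined (derived, not reported): {overall:.1%} of {total} bugs")
```

Under these assumptions, the two subset sizes sum to the 178 bugs mentioned in the abstract, and the large gap between the two rates is visible in the counts themselves.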

Taxonomy

Methods: Sparse Evolutionary Training