Beyond Accuracy: Behavioral Dynamics of Agentic Multi-Hunk Repair

Noor Nashid; Daniel Ding; Keheliya Gallaba; Ahmed E. Hassan; Ali Mesbah

arXiv:2511.11012 · cs.SE · November 17, 2025

TL;DR

This study systematically evaluates large language model-driven coding agents on multi-hunk bug repair, revealing their strengths, limitations, and the impact of context-aware tools like Maple on repair accuracy.

Contribution

First comprehensive analysis of LLM-driven agents on multi-hunk bug repair, introducing fine-grained metrics and the Maple tool for improved localization and repair performance.

Findings

01 Repair accuracy varies widely across agents, from 25.8% (Qwen Code) to 93.3% (Claude Code).

02 Higher bug dispersion reduces repair success.

03 Maple improves Gemini-cli's accuracy by 30%.

Abstract

Automated program repair has traditionally focused on single-hunk defects, overlooking multi-hunk bugs that are prevalent in real-world systems. Repairing these bugs requires coordinated edits across multiple, disjoint code regions, posing substantially greater challenges. We present the first systematic study of LLM-driven coding agents (Claude Code, Codex, Gemini-cli, and Qwen Code) on this task. We evaluate these agents on 372 multi-hunk bugs from the Hunk4J dataset, analyzing 1,488 repair trajectories using fine-grained metrics that capture localization, repair accuracy, regression behavior, and operational dynamics. Results reveal substantial variation: repair accuracy ranges from 25.8% (Qwen Code) to 93.3% (Claude Code) and consistently declines with increasing bug dispersion and complexity. High-performing agents demonstrate superior semantic consistency, achieving positive…
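
The trajectory count in the abstract is consistent with each of the four agents being run once on each of the 372 bugs (372 × 4 = 1,488). The following is a minimal sketch of that bookkeeping and of a per-agent repair-accuracy tally. It assumes one trajectory per agent-bug pair and uses a hypothetical `Trajectory` record with a boolean `repaired` field; the paper's actual data schema and metric definitions are not given in this excerpt, and the toy outcomes are illustrative, not results.

```python
from collections import defaultdict
from dataclasses import dataclass

AGENTS = ["Claude Code", "Codex", "Gemini-cli", "Qwen Code"]  # agents named in the abstract
NUM_BUGS = 372  # multi-hunk bugs from Hunk4J, per the abstract


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one repair attempt (not the paper's schema)."""
    agent: str
    bug_id: str
    repaired: bool


def repair_accuracy(trajectories):
    """Return {agent: fraction of that agent's trajectories marked repaired}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for t in trajectories:
        attempts[t.agent] += 1
        successes[t.agent] += int(t.repaired)
    return {agent: successes[agent] / attempts[agent] for agent in attempts}


# Assumption: one trajectory per (agent, bug) pair, which matches the stated total.
total_trajectories = len(AGENTS) * NUM_BUGS

# Toy, made-up outcomes purely to exercise the function (not results from the paper).
toy = [
    Trajectory("agent-a", "bug-1", True),
    Trajectory("agent-a", "bug-2", False),
    Trajectory("agent-b", "bug-1", True),
    Trajectory("agent-b", "bug-2", True),
]
toy_accuracy = repair_accuracy(toy)
```

Under this reading, repair accuracy is simply the share of an agent's trajectories that end in a successful repair; the study's finer-grained metrics (localization, regression behavior, operational dynamics) would need additional fields that this sketch does not model.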

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Security and Verification in Computing