AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits
Yunbo Lyu, Jieke Shi, Hong Jin Kang, Ratnadira Widyasari, Junda He, Yuqing Niu, Chengran Yang, Junkai Chen, Zhou Yang, Julia Lawall, David Lo

TL;DR
AgentSZZ introduces an LLM-driven agent framework with interactive reasoning and domain tools to improve bug-inducing commit identification, outperforming existing methods especially in complex scenarios.
Contribution
The paper presents an agent-based approach that combines tools and domain knowledge with iterative reasoning to improve the identification of bug-inducing commits in software repositories.
Findings
AgentSZZ achieves up to 27.2% higher F1-score than prior LLM-based approaches.
Recall improves by up to 300% in challenging cross-file and ghost commit scenarios.
Structured compression reduces token usage by over 30% without significant accuracy loss.
Abstract
The SZZ algorithm is the dominant technique for identifying bug-inducing commits and underpins many software engineering tasks, such as defect prediction and vulnerability analysis. Despite numerous variants, including recent LLM-based approaches, performance remains limited on developer-annotated datasets (e.g., recall of 0.552 on the Linux kernel). A key limitation is the reliance on git blame, which traces line-level changes within the same file, failing in common scenarios such as ghost and cross-file cases, making nearly one-quarter of bug-inducing commits inherently untraceable. Moreover, current approaches follow fixed pipelines that restrict iterative reasoning and exploration, unlike developers who investigate bugs through an interactive, multi-tool process. To address these challenges, we propose AgentSZZ, an agent-based framework that leverages LLM-driven agents to explore…
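To make the git blame limitation concrete, here is a minimal Python sketch of the first step of a classic blame-based SZZ pipeline: collecting the lines a fix commit deletes, which are the only lines git blame can then trace. This is not AgentSZZ's implementation. The function names `blame_candidates` and `untraceable_files` are hypothetical, the input is assumed to be `git diff` output with `diff --git` headers, and reading a "ghost" case as a fix that only adds lines is an assumption drawn from common usage in the SZZ literature rather than from the abstract.

```python
import re

# Matches a unified-diff hunk header and captures the old-side start line.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def blame_candidates(diff_text):
    """Map each file in a git diff to the old-side line numbers it deletes.

    These are the lines a blame-based SZZ variant would hand to git blame.
    Assumes `git diff` output in which every file starts with `diff --git`.
    """
    candidates = {}
    current_file = None
    old_line = None  # None while outside a hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current_file, old_line = None, None
            continue
        # File headers only appear before the first hunk of a file.
        if old_line is None and line.startswith(("--- ", "+++ ")):
            path = line[4:].strip()
            if path != "/dev/null" and current_file is None:
                current_file = path[2:] if path[:2] in ("a/", "b/") else path
                candidates.setdefault(current_file, [])
            continue
        header = HUNK_HEADER.match(line)
        if header:
            old_line = int(header.group(1))
            continue
        if old_line is None or current_file is None:
            continue
        if line.startswith("-"):
            # Deleted line: it existed before the fix, so blame can trace it.
            candidates[current_file].append(old_line)
            old_line += 1
        elif line.startswith(("+", "\\")):
            # Added lines and "\ No newline" markers have no old-side number.
            continue
        else:
            old_line += 1  # context line
    return candidates


def untraceable_files(diff_text):
    """Files the fix touches without deleting anything (add-only changes)."""
    return [f for f, lines in blame_candidates(diff_text).items() if not lines]
```

A file that appears in `untraceable_files` gives git blame nothing to start from. The cross-file case is a separate gap: the candidates are limited to files that appear in the fix diff, so a bug-inducing change in any other file is never considered.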
