Safer Builders, Risky Maintainers: A Comparative Study of Breaking Changes in Human vs Agentic PRs
K M Ferdous, Dipayan Banik, Kowshik Chowdhury, Shazibul Islam Shamim

TL;DR
This study compares the frequency and context of breaking changes in AI-generated versus human-authored pull requests in Python open-source projects, finding that AI-generated PRs carry lower overall risk but higher risk during maintenance tasks.
Contribution
It provides a comprehensive analysis of breaking change patterns in AI versus human PRs, introducing a tool for AST-based detection and highlighting the 'Confidence Trap' phenomenon.
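The summary does not describe how the detection tool works internally, so the following is only a minimal sketch of what AST-based breaking-change detection could look like, not the authors' implementation. It compares two versions of a Python module and flags public top-level functions that were removed or whose positional parameters changed incompatibly. The names `public_signatures` and `find_breaking_changes` are hypothetical, and the rules are deliberately narrow (classes, methods, and keyword-only parameters are ignored).

```python
import ast


def public_signatures(source: str) -> dict[str, tuple[list[str], int]]:
    """Map each public top-level function to (positional params, required count).

    Assumption: a leading underscore marks a function as private.
    """
    sigs = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue
        params = [a.arg for a in node.args.posonlyargs + node.args.args]
        # Defaults always bind to the trailing positional parameters.
        required = len(params) - len(node.args.defaults)
        sigs[node.name] = (params, required)
    return sigs


def find_breaking_changes(old_source: str, new_source: str) -> list[str]:
    """Return a list of likely breaking changes from old_source to new_source."""
    old = public_signatures(old_source)
    new = public_signatures(new_source)
    findings = []
    for name, (old_params, old_required) in old.items():
        if name not in new:
            findings.append(f"removed function: {name}")
            continue
        new_params, new_required = new[name]
        # Existing callers break if old parameters were renamed, reordered,
        # or dropped, or if more arguments are now mandatory.
        renamed_or_dropped = new_params[: len(old_params)] != old_params
        if renamed_or_dropped or new_required > old_required:
            findings.append(f"changed signature: {name}")
    return findings


if __name__ == "__main__":
    before = "def load(path, mode='r'):\n    pass\n\ndef save(path):\n    pass\n"
    after = "def load(path, encoding, mode='r'):\n    pass\n"
    for finding in find_breaking_changes(before, after):
        print(finding)
```

Working on the parsed tree rather than the textual diff means that formatting-only edits and changes to function bodies are not reported, while appending an optional parameter is treated as backward compatible.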
Findings
AI agents introduce fewer breaking changes overall than humans (3.45% vs. 7.40%).
AI agents show a higher risk of breaking changes during maintenance tasks such as refactoring and chores.
Highly confident AI PRs can still cause breaking changes, indicating a need for stricter review.
Abstract
AI coding agents are increasingly integrated into modern software engineering workflows, actively collaborating with human developers to create pull requests (PRs) in open-source repositories. Although coding agents improve developer productivity, they often generate code with more bugs and security issues than human-authored code. While human-authored PRs often break backward compatibility, leading to breaking changes, the potential for agentic PRs to introduce breaking changes remains underexplored. The goal of this paper is to help developers and researchers evaluate the reliability of AI-generated PRs by examining the frequency and task contexts in which AI agents introduce breaking changes. We conduct a comparative analysis of 7,191 agent-generated PRs and 1,402 human-authored PRs from Python repositories in the AIDev dataset. We develop a tool that analyzes code changes in commits…
