To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study
Shota Sawada, Tatsuya Shirai, Yutaro Kashiwa, Ken'ichi Yamaguchi, Hiroshi Iwata, Hajimu Iida

TL;DR
This empirical study compares the maintenance patterns of AI-generated code with those of human-written code, finding that AI-generated files receive less frequent maintenance and are modified mainly to extend features.
Contribution
It provides the first large-scale empirical analysis of maintenance behaviors for AI-generated code versus human-authored code.
Findings
AI-generated files receive less frequent maintenance than human code.
Modifications to AI code mainly involve feature extensions.
Humans perform the majority of maintenance on AI-generated code.
Abstract
LLM-based autonomous coding agents have reshaped software development. While these agents excel at code generation, open questions persist about the long-term maintainability of AI-generated code. This study empirically investigates the maintenance extent, human involvement, and modification types of AI-generated files versus human-authored code. Using the AIDev dataset of AI-generated pull requests and GitHub, we analyzed over 1,000 files and approximately 3,200 changes from 100 popular repositories. Our findings show that: (i) AI-generated files receive less frequent maintenance than human-authored code, with updates affecting only a small fraction of file size; (ii) the most frequent modifications to AI code are feature extensions, whereas human updates focus on bug fixes; and (iii) human developers perform the large majority of this maintenance.
