How do Agents Refactor: An Empirical Study
Lukas Ottenhof, Daniel Penner, Abram Hindle, Thibaud Lutellier

TL;DR
This study empirically analyzes how software development agents refactor Java code, finding that their refactorings consist mainly of annotation changes and may introduce more code smells than those of human developers.
Contribution
First empirical comparison of agentic versus developer refactoring in Java, highlighting differences in refactoring types and impact on code quality.
Findings
Agent refactorings mainly involve annotation changes.
Cursor refactoring increases code smells significantly.
Developers perform more diverse structural refactorings.
Abstract
Software development agents such as Claude Code, GitHub Copilot, Cursor Agent, Devin, and OpenAI Codex are being increasingly integrated into developer workflows. While prior work has evaluated agent capabilities for code completion and task automation, there is little work investigating how these agents perform Java refactoring in practice, the types of changes they make, and their impact on code quality. In this study, we present the first analysis of agentic refactoring pull requests in Java, comparing them to developer refactorings across 86 projects per group. Using RefactoringMiner and DesigniteJava 3.0, we identify refactoring types and detect code smells before and after refactoring commits. Our results show that agent refactorings are dominated by annotation changes (the 5 most common refactoring types done by agents are annotation related), in contrast to the diverse…
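The abstract describes two measurements: which refactoring types occur most often, and how the code smell count changes across a refactoring commit. The sketch below illustrates that bookkeeping only. It assumes the detected refactoring types are already available as plain strings and the smell counts as integers; the class `RefactoringTally`, its methods, and the sample data are hypothetical and are not taken from the paper, RefactoringMiner, or DesigniteJava.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tallies refactoring type labels and computes a smell delta.
// It does not call RefactoringMiner or DesigniteJava; inputs are assumed to be
// exported from such tools beforehand.
public class RefactoringTally {

    /** Counts how often each refactoring type label appears. */
    public static Map<String, Integer> countByType(List<String> detectedTypes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String type : detectedTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns up to k most frequent types; ties are broken alphabetically. */
    public static List<Map.Entry<String, Integer>> topTypes(List<String> detectedTypes, int k) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countByType(detectedTypes).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    /** Positive result: the commit added smells; negative: it removed some. */
    public static int smellDelta(int smellsBefore, int smellsAfter) {
        return smellsAfter - smellsBefore;
    }

    public static void main(String[] args) {
        // Made-up sample labels, for illustration only (not study data).
        List<String> sample = List.of(
                "Add Method Annotation", "Rename Method",
                "Add Method Annotation", "Add Parameter Annotation",
                "Extract Method", "Add Parameter Annotation",
                "Add Method Annotation");

        for (Map.Entry<String, Integer> e : topTypes(sample, 5)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
        System.out.println("Smell delta: " + smellDelta(10, 13));
    }
}
```

Comparing the ranked lists for the agent and developer groups, and the sign of the smell delta per commit, mirrors the kind of comparison the abstract describes, under the assumptions stated above.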
Taxonomy
Topics: Software Engineering Research · Advanced Software Engineering Methodologies · Model-Driven Software Engineering Techniques
