Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
Mohamed Almukhtar, Anwar Ghammam, Hua Ming

TL;DR
This empirical study evaluates AI-generated Python refactoring pull requests, measuring their effect on code quality, security, and maintainability. It finds both benefits and challenges in real-world development workflows.
Contribution
The paper provides the first comprehensive empirical analysis of AI-driven refactoring PRs, including a taxonomy of change operations and their effects on quality and security.
Findings
Agentic commits improve a quality attribute in 22.5% of changes
24.17% of modified files introduce new Pylint issues
73.5% of PRs are merged, even though some of them introduce new issues
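
To make a figure such as "share of modified files that introduce new Pylint issues" concrete, here is a minimal sketch of one way a before/after comparison could be computed. It is not the authors' pipeline. The function names, and the choice to compare issues by their `symbol` field (as found in Pylint's JSON output), are assumptions made for illustration.

```python
from collections import Counter


def introduced_issues(before, after):
    """Return the issue symbols that occur more often after a change than before.

    `before` and `after` are lists of dicts, each with a "symbol" key.
    (Assumption: records shaped like Pylint's JSON output; how the paper
    matches issues across versions is not stated in this summary.)
    """
    before_counts = Counter(issue["symbol"] for issue in before)
    after_counts = Counter(issue["symbol"] for issue in after)
    # Counter subtraction keeps only strictly positive differences.
    return dict(after_counts - before_counts)


def share_of_files_with_new_issues(per_file):
    """Fraction of files with at least one newly introduced issue.

    `per_file` maps a file path to a (before_issues, after_issues) pair.
    """
    if not per_file:
        return 0.0
    flagged = sum(
        1 for before, after in per_file.values()
        if introduced_issues(before, after)
    )
    return flagged / len(per_file)
```

Comparing by symbol counts rather than by line number keeps the comparison stable when a refactoring moves code around, at the cost of missing cases where one issue is fixed and another of the same kind is introduced in the same file.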
Abstract
As AI agents increasingly contribute to code development and maintenance, there is still limited empirical evidence on the quality and risk characteristics of their changes in real-world projects, particularly for refactoring-oriented contributions. It remains unclear how agent-authored refactoring edits affect maintainability, code quality, and security once merged into GitHub repositories. To address this gap, we conduct an empirical study of Python refactoring pull requests (PRs) from the AIDev dataset. We analyze agentic refactoring PRs using PyQu, an ML-based quality assessment tool for Python, to quantify changes across five quality attributes, and we complement PyQu with domain-independent static analysis (Pylint and Bandit) to measure code quality and security issues before and after each change. Our results show that, on average, agentic commits improve a quality attribute in…
