When is Generated Code Difficult to Comprehend? Assessing AI Agent Python Code Proficiency in the Wild
Nanthit Temkulkiat, Chaiyong Ragkhitwetsagul, Morakot Choetkiertikul, Ruksit Rojpaisarnkit, Raula Gaikovina Kula

TL;DR
This study assesses the proficiency level of Python code generated by AI agents. Most of it is basic, but code produced for some tasks requires advanced skills to review and maintain.
Contribution
The paper applies static analysis (the pycefr tool) to quantify the proficiency of AI-generated code, highlighting the skill levels developers need to review such code effectively.
Findings
Over 90% of the constructs in AI-generated code fall into the basic proficiency levels (A1, A2).
The proficiency of AI-generated code is similar to that of human-written code in pull requests.
High-proficiency AI code mainly appears in feature addition and bug fixing tasks.
Abstract
The rapid adoption of AI coding agents is fundamentally shifting software developers' roles from code authors to code reviewers. While developers spend a significant portion of their time reading and comprehending code, the linguistic proficiency and complexity of the Python code generated by these agents remain largely unexplored. This study investigates the code proficiency of AI agents to determine the skill level required for developers to maintain their code. Leveraging the AIDev dataset, we mined 591 pull requests containing 5,027 Python files generated by three distinct AI agents and employed pycefr, a static analysis tool that maps Python constructs to six proficiency levels, ranging from A1 (Basic) to C2 (Mastery), to analyze the code. Our results reveal that: AI agents predominantly generate Basic-level code, with over 90% of constructs falling into the A1 and A2 categories,…
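To make the idea of mapping Python constructs to proficiency levels concrete, here is a minimal sketch built on the standard-library `ast` module. It is not pycefr and does not reproduce its rules: the `LEVEL_BY_NODE` table, the function names, and the sample snippet are all hypothetical and chosen only for illustration.

```python
import ast
from collections import Counter

# Hypothetical construct-to-level table, for illustration only.
# pycefr's actual mapping covers far more constructs and may assign
# different levels to the ones listed here.
LEVEL_BY_NODE = {
    ast.Assign: "A1",
    ast.If: "A1",
    ast.For: "A2",
    ast.FunctionDef: "A2",
    ast.ListComp: "B1",
    ast.Lambda: "B1",
    ast.ClassDef: "B1",
    ast.GeneratorExp: "C1",
    ast.Yield: "C1",
}


def level_counts(source: str) -> Counter:
    """Count how many mapped constructs in `source` fall into each level."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        level = LEVEL_BY_NODE.get(type(node))
        if level is not None:
            counts[level] += 1
    return counts


def basic_share(counts: Counter) -> float:
    """Fraction of mapped constructs at the basic levels (A1 and A2)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["A1"] + counts["A2"]) / total


# Hypothetical sample input: three assignments, one loop, one branch,
# and one list comprehension.
SAMPLE = """
total = 0
for n in [1, 2, 3]:
    if n > 1:
        total = total + n
squares = [n * n for n in range(3)]
"""

sample_counts = level_counts(SAMPLE)
sample_basic_share = basic_share(sample_counts)
```

Under this toy table, the "share of constructs at A1 and A2" is simply the ratio computed by `basic_share`; the study's reported figures come from pycefr's own mapping applied to the mined pull requests, not from a table like this one.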
