From Human to Machine Refactoring: Assessing GPT-4's Impact on Python Class Quality and Readability
Alessandro Midolo, Emiliano Tramontana, Massimiliano Di Penta

TL;DR
This study empirically evaluates GPT-4o's ability to perform automated refactoring on Python classes. It finds that the refactorings preserve behavior and improve code quality but tend to reduce readability, which informs the design of future LLM-based refactoring tools.
Contribution
It provides a comprehensive empirical assessment of GPT-4o for class-level refactoring, exploring its effects on correctness, quality, and readability across a benchmark dataset.
Findings
GPT-4o produces behavior-preserving refactorings
Refactorings reduce code smells and improve quality metrics
Readability tends to decrease after refactoring
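As a rough illustration of what "behavior-preserving" means when correctness is verified through unit tests, the sketch below runs one test suite against a class before and after refactoring. The `OriginalCart` and `RefactoredCart` classes, the tests, and the `passes_all_tests` helper are hypothetical and are not taken from the paper or from ClassEval; this is a minimal sketch of the idea, not the authors' evaluation pipeline.

```python
import unittest


class OriginalCart:
    """Hypothetical pre-refactoring class (illustrative only)."""

    def __init__(self):
        self.items = []

    def add(self, name, price, qty):
        self.items.append((name, price, qty))

    def total(self):
        t = 0
        for item in self.items:
            t = t + item[1] * item[2]
        return t


class RefactoredCart:
    """Hypothetical refactored version: same public API, clearer internals."""

    def __init__(self):
        self.items = []

    def add(self, name, price, qty):
        self.items.append((name, price, qty))

    @staticmethod
    def _line_total(item):
        # Extracted helper method: price times quantity for one line.
        _name, price, qty = item
        return price * qty

    def total(self):
        return sum(self._line_total(item) for item in self.items)


def make_test_case(cart_cls):
    """Build the same test case bound to a given implementation."""

    class CartTest(unittest.TestCase):
        def test_empty_total(self):
            self.assertEqual(cart_cls().total(), 0)

        def test_total_sums_lines(self):
            cart = cart_cls()
            cart.add("pen", 2, 3)
            cart.add("book", 10, 1)
            self.assertEqual(cart.total(), 16)

    return CartTest


def passes_all_tests(cart_cls):
    """Return True if every test passes for the given implementation."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        make_test_case(cart_cls)
    )
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun > 0 and result.wasSuccessful()


if __name__ == "__main__":
    # A refactoring counts as behavior-preserving here only if the
    # refactored class passes the same tests as the original.
    print(passes_all_tests(OriginalCart), passes_all_tests(RefactoredCart))
```

Passing tests only show that behavior is preserved for the cases the suite covers, so a check like this is as strong as the tests behind it.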
Abstract
Refactoring is a software engineering practice that aims to improve code quality without altering program behavior. Although automated refactoring tools have been extensively studied, their practical applicability remains limited. Recent advances in Large Language Models (LLMs) have introduced new opportunities for automated code refactoring. The evaluation of such an LLM-driven approach, however, leaves unanswered questions about its effects on code quality. In this paper, we present a comprehensive empirical study on LLM-driven refactoring using GPT-4o, applied to 100 Python classes from the ClassEval benchmark. Unlike prior work, our study explores a wide range of class-level refactorings inspired by Fowler's catalog and evaluates their effects from three complementary perspectives: (i) behavioral correctness, verified through unit tests; (ii) code quality, assessed via Pylint,…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Scientific Computing and Data Management
