From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models
Norman Peitek, Julia Hess, Sven Apel

TL;DR
This study systematically evaluates GPT5.1's ability to improve the readability of Java code through iterative refactoring. It reveals convergence patterns that hold across different code variants, providing empirical insight into LLM-assisted code improvement.
Contribution
It presents a large-scale experiment analyzing the iterative refactoring process of LLMs, highlighting convergence behavior and robustness in code readability enhancement.
Findings
Iterative refactoring shows initial restructuring followed by stabilization.
Convergence patterns are consistent across code variants.
Explicit prompts influence refactoring dynamics slightly.
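The restructuring-then-stabilization pattern can be illustrated with a minimal sketch of an iterative refactoring loop. It is not the authors' pipeline: `toy_refactor` is a hypothetical stand-in for the LLM call, and the text-similarity measure from Python's `difflib` is an assumed proxy for "how much changed", not the change categorization used in the study. The default of five passes mirrors the five iterations described in the abstract.

```python
import difflib


def change_ratio(before: str, after: str) -> float:
    """Share of the text that differs between two versions (0.0 means identical)."""
    return 1.0 - difflib.SequenceMatcher(None, before, after).ratio()


def iterate_refactoring(snippet, refactor, iterations=5):
    """Apply `refactor` repeatedly and record how much each pass changes the code."""
    versions = [snippet]
    changes = []
    for _ in range(iterations):
        refactored = refactor(versions[-1])
        changes.append(change_ratio(versions[-1], refactored))
        versions.append(refactored)
    return versions, changes


def toy_refactor(code: str) -> str:
    """Hypothetical stand-in for an LLM call: renames one variable, then has nothing left to do."""
    return (
        code.replace("int x", "int count")
        .replace("x++", "count++")
        .replace("return x", "return count")
    )


if __name__ == "__main__":
    versions, changes = iterate_refactoring("int x = 0; x++; return x;", toy_refactor)
    for i, change in enumerate(changes, start=1):
        print(f"iteration {i}: change ratio {change:.2f}")
```

With this toy refactoring, only the first pass alters the snippet and every later pass reports a change ratio of zero. A real model would be expected to converge less abruptly, so a per-iteration series like `changes` is what one would inspect for stabilization.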
Abstract
Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable behavior. In this article, we systematically study the capabilities of LLMs for code refactoring with a specific focus on improving code readability. We conducted a large-scale experiment using GPT5.1 with 230 Java snippets, each systematically varied and refactored regarding code readability across five iterations under three different prompting strategies. We categorized fine-grained code changes during the refactoring into implementation, syntactic, and comment-level transformations. Subsequently, we investigated the functional correctness and tested the robustness of the results with novel snippets. Our results reveal three main insights: First, iterative code refactoring exhibits an…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Topic Modeling
