Automated Extract Method Refactoring with Open-Source LLMs: A Comparative Study

Sivajeet Chand; Melih Kilic; Roland Würsching; Sushant Kumar Pandey; Alexander Pretschner

arXiv:2510.26480·cs.SE·October 31, 2025

TL;DR

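This study evaluates open-source large language models for automating Extract Method refactoring in Python. It finds that Recursive Criticism and Improvement (RCI) prompting yields higher test pass rates and refactoring quality than one-shot prompting, and that developers accept over 70% of RCI-generated refactorings.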
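For readers unfamiliar with the refactoring itself, here is a minimal before-and-after sketch of Extract Method in Python. The functions `summarize_before`, `summarize_after`, and `_total_with_tax` are hypothetical and written for illustration; they are not taken from the paper or its dataset.

```python
# Hypothetical example; none of these names come from the paper.

def summarize_before(prices, tax_rate):
    # Validation, computation, and formatting all live in one function.
    if not prices:
        raise ValueError("prices must not be empty")
    subtotal = sum(prices)
    total = subtotal + subtotal * tax_rate
    return f"subtotal={subtotal:.2f} total={total:.2f}"


def _total_with_tax(subtotal, tax_rate):
    # Extracted method: the tax computation now has a name of its own.
    return subtotal + subtotal * tax_rate


def summarize_after(prices, tax_rate):
    # Same observable behavior; the computation is delegated to the helper.
    if not prices:
        raise ValueError("prices must not be empty")
    subtotal = sum(prices)
    total = _total_with_tax(subtotal, tax_rate)
    return f"subtotal={subtotal:.2f} total={total:.2f}"
```

A refactoring like this should preserve behavior, which is why running the existing tests against the refactored code is a natural correctness check.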

Contribution

The paper systematically compares open-source LLMs under one-shot and Recursive Criticism and Improvement (RCI) prompting for automated refactoring, highlights the effectiveness of RCI, and establishes a benchmark for future research.

Findings

01. RCI prompting outperforms one-shot prompting in test pass rates.

02. Deepseek-Coder-RCI and Qwen2.5-Coder-RCI achieve high test pass percentages.

03. Over 70% developer acceptance for RCI-generated refactorings.

Abstract

Automating the Extract Method refactoring (EMR) remains challenging and largely manual despite its importance in improving code readability and maintainability. Recent advances in open-source, resource-efficient Large Language Models (LLMs) offer promising new approaches for automating such high-level tasks. In this work, we critically evaluate five state-of-the-art open-source LLMs, spanning 3B to 8B parameter sizes, on the EMR task for Python code. We systematically assess functional correctness and code quality using automated metrics and investigate the impact of prompting strategies by comparing one-shot prompting to a Recursive Criticism and Improvement (RCI) approach. RCI-based prompting consistently outperforms one-shot prompting in test pass rates and refactoring quality. The best-performing models, Deepseek-Coder-RCI and Qwen2.5-Coder-RCI, achieve test pass percentage (TPP)…
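To make the contrast between one-shot and RCI prompting concrete, the following is a minimal sketch of a generate, critique, and improve loop. It rests on several assumptions: the `llm` callable, the prompt wording, and the default number of rounds are all hypothetical and not the paper's actual setup. The `pass_percentage` helper likewise assumes that TPP means passing tests divided by total tests, which the abstract does not spell out.

```python
from typing import Callable


def rci_refactor(code: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Generate a refactoring, then alternate critique and improvement.

    With rounds=0 this reduces to a single one-shot style request.
    Prompts and round count are illustrative assumptions.
    """
    candidate = llm(f"Apply Extract Method refactoring to this Python code:\n{code}")
    for _ in range(rounds):
        # Ask the model to criticise its own previous output.
        critique = llm(f"Review this refactoring and list its problems:\n{candidate}")
        # Ask the model to revise the candidate in light of that critique.
        candidate = llm(
            "Improve the refactoring using the critique.\n"
            f"Critique:\n{critique}\nCode:\n{candidate}"
        )
    return candidate


def pass_percentage(passed: int, total: int) -> float:
    # Assumed definition: share of passing tests, expressed as a percentage.
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total
```

Passing the model in as a plain callable keeps the loop independent of any particular inference library, so it can be exercised with a stub before being wired to a real model.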
