Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models

Zejun Zhang; Zhenchang Xing; Xiaoxue Ren; Qinghua Lu; Xiwei Xu

arXiv:2406.03660 · cs.SE · June 7, 2024

TL;DR

This paper introduces RIdiom, a hybrid knowledge-driven approach combining rules and large language models to accurately detect and refactor Python code into idiomatic forms, outperforming existing methods.
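To make "idiomatic forms" concrete, here is a hand-written before/after pair that uses list comprehension purely as an illustrative idiom. It is not output from RIdiom, and the function names are hypothetical. Both functions return the same list. The second one expresses the loop-and-append pattern as a single comprehension.

```python
# Illustrative only: a non-idiomatic loop and the equivalent list comprehension,
# the kind of rewrite an idiom-refactoring tool aims to produce.

def squares_of_evens_loop(numbers):
    # Non-idiomatic: build the list with an explicit for loop and append().
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_idiomatic(numbers):
    # Pythonic idiom: the same logic as one list comprehension.
    return [n * n for n in numbers if n % 2 == 0]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(squares_of_evens_loop(data))       # [4, 16, 36]
    print(squares_of_evens_idiomatic(data))  # [4, 16, 36]
```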

Contribution

It presents a novel hybrid framework integrating rule-based and LLM techniques for Pythonic idiom refactoring, addressing limitations of previous approaches.

Findings

01. RIdiom achieves over 90% accuracy and F1-score on nine established idioms.

02. The approach outperforms Prompt-LLM on all evaluated metrics.

03. It maintains high precision while significantly improving recall and F1-score.
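For reference, F1 is the harmonic mean of precision P and recall R. This is the standard definition, not a formula taken from the paper. When P is held fixed and positive, F1 strictly increases with R. That is why keeping precision high while raising recall also raises F1:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\frac{\partial F_1}{\partial R} = \frac{2P(P+R) - 2PR}{(P+R)^2} = \frac{2P^2}{(P+R)^2} > 0 \quad (P > 0)
```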

Abstract

Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find them challenging to use. Neither a rule-based approach nor an LLM-only approach is sufficient to overcome three persistent challenges of code idiomatization: code miss, wrong detection, and wrong refactoring. Motivated by the determinism of rules and the adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts that instruct LLMs to complete tasks, but also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. The ARIs are Python code generated by prompting LLMs. We first construct a knowledge module with three elements (ASTscenario, ASTcomponent, and Condition) and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that,…
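The abstract describes ARIs as Python code generated from three knowledge elements (ASTscenario, ASTcomponent, Condition), but the paper's actual ARI code is not reproduced here. What follows is a hypothetical, hand-written sketch of what such a rule might look like for the list-comprehension idiom. It uses only the standard-library `ast` module, and `ast.unparse` requires Python 3.9+. The names `find_list_comprehension_candidates`, `_append_call`, and `Candidate`, as well as the mapping of scenario/component/condition onto code, are assumptions for illustration. The sketch also skips semantic checks a real tool would need, such as whether the appended expression reads the list being built.

```python
import ast
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one detected list-comprehension opportunity."""
    lineno: int      # line of the `name = []` assignment
    target: str      # name of the list being built
    refactored: str  # suggested replacement statement


def _append_call(stmt):
    """Return (list_name, appended_expr) if stmt is `name.append(expr)`, else None."""
    if (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Attribute)
        and stmt.value.func.attr == "append"
        and isinstance(stmt.value.func.value, ast.Name)
        and len(stmt.value.args) == 1
        and not stmt.value.keywords
    ):
        return stmt.value.func.value.id, stmt.value.args[0]
    return None


def find_list_comprehension_candidates(source):
    """Hypothetical ARI-style rule: detect `x = []` followed by a for/append loop
    and propose an equivalent list comprehension (not all semantic checks are done)."""
    candidates = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        # Scenario (assumed): an empty-list assignment directly followed by a plain for loop.
        for prev, loop in zip(body, body[1:]):
            if not (
                isinstance(prev, ast.Assign)
                and len(prev.targets) == 1
                and isinstance(prev.targets[0], ast.Name)
                and isinstance(prev.value, ast.List)
                and not prev.value.elts
                and isinstance(loop, ast.For)
                and not loop.orelse
                and len(loop.body) == 1
            ):
                continue
            name = prev.targets[0].id
            stmt, filters = loop.body[0], []
            # Component (assumed): an optional `if` without else wrapping the append call.
            if isinstance(stmt, ast.If) and not stmt.orelse and len(stmt.body) == 1:
                filters.append(stmt.test)
                stmt = stmt.body[0]
            found = _append_call(stmt)
            # Condition (assumed): the loop appends to the list that was just created.
            if found is None or found[0] != name:
                continue
            comp = ast.ListComp(
                elt=found[1],
                generators=[
                    ast.comprehension(
                        target=loop.target, iter=loop.iter, ifs=filters, is_async=0
                    )
                ],
            )
            new = ast.Assign(targets=[ast.Name(id=name, ctx=ast.Store())], value=comp)
            # ast.unparse reads lineno on the new statement, so fill in locations first.
            refactored = ast.unparse(ast.fix_missing_locations(new))
            candidates.append(Candidate(prev.lineno, name, refactored))
    return candidates


if __name__ == "__main__":
    example = (
        "result = []\n"
        "for n in numbers:\n"
        "    if n % 2 == 0:\n"
        "        result.append(n * n)\n"
    )
    for cand in find_list_comprehension_candidates(example):
        print(cand.lineno, cand.refactored)
```

Separating the structural match (scenario), the extracted parts (component), and the applicability check (condition) keeps each rule deterministic and easy to test. That is the property the abstract attributes to rules, in contrast to free-form LLM output.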

Code & Models

Repositories

idiomaticrefactoring/idiomatizationllm (official)
