CODEMENV: Benchmarking Large Language Models on Code Migration

Keyuan Cheng; Xudong Shen; Yihao Yang; Tengyue Wang; Yang Cao; Muhammad Asif Ali; Hanbin Wang; Lijie Hu; Di Wang

arXiv:2506.00894 · cs.SE · June 3, 2025



TL;DR

This paper introduces CODEMENV, a benchmark for evaluating large language models' ability to perform code migration tasks across different environments, revealing their strengths and limitations in this domain.

Contribution

The work presents a new benchmark dataset, CODEMENV, specifically designed to assess LLMs' capabilities in code migration scenarios across Python and Java.

Findings

1. GPT-4O achieves the highest pass@1 rate of 43.84%.
2. LLMs are better at migrating newer function versions.
3. LLMs sometimes make logical errors unrelated to migration tasks.
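The headline results are reported as pass@1. One common way to compute pass@k is the unbiased estimator introduced with the HumanEval evaluation (Chen et al., 2021): for a task with n sampled solutions of which c pass, pass@k = 1 − C(n−c, k) / C(n, k), averaged over tasks. Whether CODEMENV draws more than one sample per task is not stated on this page, so the sketch below is an assumption, and the `pass_at_k` and `benchmark_pass_at_k` names and the example results are hypothetical. With k = 1 the estimator reduces to the fraction c/n of correct samples.

```python
from __future__ import annotations

from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # Probability that a random size-k subset contains no correct sample, inverted.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per benchmark task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: four tasks, one greedy sample each (n = 1).
greedy = [(1, 1), (1, 0), (1, 0), (1, 1)]
print(f"pass@1 = {benchmark_pass_at_k(greedy, k=1):.2%}")
```

Using `math.comb` keeps the computation exact in integers until the final division; for very large n, the product form used in the HumanEval reference code is numerically gentler.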

Abstract

Large language models (LLMs) have shown remarkable capabilities across various software engineering tasks; however, their effectiveness in code migration, adapting code to run in different environments, remains insufficiently studied. In this work, we introduce CODEMENV: Code Migration Across Environment, a new benchmark specifically designed to assess LLMs' abilities in code migration scenarios. CODEMENV consists of 922 examples spanning 19 Python and Java packages, and covers three core tasks: (1) identifying functions incompatible with specific versions, (2) detecting changes in function definitions, and (3) adapting code to target environments. Experimental evaluation with seven LLMs on CODEMENV yields an average pass@1 rate of 26.50%, with GPT-4O achieving the highest score at 43.84%. Key findings include: (i) LLMs tend to be more proficient with newer function versions, which aids…
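Task (1), identifying functions that are incompatible with a specific version, can be illustrated with a small version-lookup sketch. This is not CODEMENV's data format or evaluation code: the `ApiLifecycle` record, the `LIFECYCLE` table, and `incompatible_apis` are hypothetical. The three table entries do reflect real CPython changes (`math.isqrt` added in 3.8, `str.removeprefix` added in 3.9, the `collections.Mapping` alias removed in 3.10).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiLifecycle:
    """Hypothetical record of when an API appeared and disappeared."""
    added: Optional[tuple[int, ...]] = None    # first version providing the API
    removed: Optional[tuple[int, ...]] = None  # first version lacking the API
    replacement: Optional[str] = None          # suggested migration target


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.10' into (3, 10) so versions compare numerically, not lexically."""
    return tuple(int(part) for part in text.split("."))


def fmt(version: tuple[int, ...]) -> str:
    return ".".join(map(str, version))


# Hypothetical lookup table; the entries reflect real CPython changes.
LIFECYCLE = {
    "math.isqrt": ApiLifecycle(added=(3, 8)),
    "str.removeprefix": ApiLifecycle(added=(3, 9)),
    "collections.Mapping": ApiLifecycle(
        removed=(3, 10), replacement="collections.abc.Mapping"
    ),
}


def incompatible_apis(used: list[str], target: str) -> dict[str, str]:
    """Report which APIs in `used` are unavailable in the target version."""
    target_v = parse_version(target)
    problems: dict[str, str] = {}
    for name in used:
        life = LIFECYCLE.get(name)
        if life is None:
            continue  # APIs missing from the table are assumed compatible
        if life.added is not None and target_v < life.added:
            problems[name] = f"added in {fmt(life.added)}"
        elif life.removed is not None and target_v >= life.removed:
            hint = f"; use {life.replacement}" if life.replacement else ""
            problems[name] = f"removed in {fmt(life.removed)}{hint}"
    return problems


code_apis = ["math.isqrt", "str.removeprefix", "collections.Mapping"]
print(incompatible_apis(code_apis, "3.7"))
print(incompatible_apis(code_apis, "3.11"))
```

Comparing versions as integer tuples matters here: as strings, "3.10" sorts before "3.9", which would misclassify the `collections.Mapping` removal. Migrating to an older target flags newly added APIs, while migrating to a newer target flags removed ones along with a replacement hint, which loosely mirrors tasks (1) and (3).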


Code & Models

Repositories

xdshen-ai/benchmark-of-code-migration (official)


Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification