MigrationBench: Repository-Level Code Migration Benchmark from Java 8
Linbo Liu, Xinle Liu, Qiang Zhou, Lin Chen, Yihan Liu, Hoan Nguyen, Behrooz Omidvar-Tehrani, Xi Shen, Jun Huan, Omer Tripp, Anoop Deoras

TL;DR
MigrationBench is a comprehensive dataset and evaluation framework for assessing large language models on repository-level code migration from Java 8 to newer LTS versions, facilitating research in automated code modernization.
Contribution
The paper introduces MigrationBench, a new benchmark dataset and evaluation framework for Java code migration tasks, together with SD-Feedback, a novel approach for LLMs.
Findings
SD-Feedback achieves over 62% success rate with Claude-3.5-Sonnet-v2.
Benchmark includes 5,102 repositories for comprehensive evaluation.
Provides a versatile resource for research in code migration.
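To make the headline number concrete, here is a minimal sketch of how a repository-level success rate could be tallied. It assumes each repository yields a single pass/fail outcome. The names MigrationResult and success_rate are hypothetical, and this summary does not say which checks the benchmark uses to decide that a migration succeeded.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MigrationResult:
    """Outcome of one migration attempt (hypothetical record, not the paper's schema)."""
    repo: str        # illustrative repository identifier
    succeeded: bool  # True if the migrated repository passed the evaluation checks


def success_rate(results: Sequence[MigrationResult]) -> float:
    """Return the fraction of repositories whose migration succeeded."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.succeeded for r in results) / len(results)


if __name__ == "__main__":
    # Made-up outcomes for illustration only; these are not benchmark data.
    demo = [
        MigrationResult("org/repo-a", True),
        MigrationResult("org/repo-b", False),
        MigrationResult("org/repo-c", True),
    ]
    print(f"success rate: {success_rate(demo):.2%}")
```

Under this reading, a rate above 62% would mean that more than 62 of every 100 evaluated repositories were counted as successfully migrated.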
Abstract
With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on code generation and issue-resolution tasks. In contrast, we introduce a new coding benchmark, MigrationBench, with a distinct focus: code migration. MigrationBench aims to serve as a comprehensive benchmark for migration from Java 8 to the latest long-term support (LTS) versions (Java 17, 21), including a full dataset with 5,102 repositories and its subset, selected. Selected is a representative subset curated for complexity and difficulty, offering a versatile resource to support research in the field of code…
Taxonomy
Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Scientific Computing and Data Management
