JMigBench: A Benchmark for Evaluating LLMs on Source Code Migration (Java 8 to Java 11)

Nishil Amin; Zhiwei Fei; Xiang Li; Justyna Petke; He Ye

arXiv:2602.09930·cs.SE·February 11, 2026

TL;DR

This paper introduces JMigBench, a benchmark dataset for evaluating large language models on migrating Java source code from Java 8 to Java 11, and uses it to show what Mistral Codestral can and cannot migrate.

Contribution

The paper provides a curated dataset covering eight categories of deprecated APIs, together with an evaluation framework that assesses LLM-generated migrations for similarity to a reference and for migration correctness.

Findings

01. Mistral Codestral handles trivial one-to-one API substitutions with moderate success.

02. The model's output is identical to the reference migration in 11.11% of cases.

03. The model struggles with more complex migrations such as CORBA or JAX-WS.
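
To make "trivial one-to-one API substitution" concrete, here is a minimal sketch of one such migration. The example is an assumption chosen for illustration and is not taken from the JMigBench dataset; the class and method names are hypothetical. It uses the `Integer(String)` constructor, which has been deprecated since Java 9, and its static-factory replacement.

```java
public class TrivialMigration {
    // Java 8 style (constructor deprecated since Java 9), shown as a comment:
    //     Integer boxed = new Integer(text);
    //
    // Java 11 style: the static factory replaces the constructor call
    // one-to-one, with no change to the surrounding logic.
    static Integer parseBoxed(String text) {
        return Integer.valueOf(text);
    }

    public static void main(String[] args) {
        // Hypothetical usage: parse a string and use the boxed value.
        System.out.println(parseBoxed("42") + 1); // prints 43
    }
}
```

Migrations of removed modules such as CORBA or JAX-WS are harder because no single replacement call exists within the JDK.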

Abstract

We build a benchmark to evaluate large language models (LLMs) for source code migration tasks, specifically upgrading functions from Java 8 to Java 11. We first collected a dataset of function pairs from open-source repositories, but limitations in data quality led us to construct a refined dataset covering eight categories of deprecated APIs. Using this dataset, the Mistral Codestral model was evaluated with CodeBLEU and keyword-based metrics to measure lexical and semantic similarity as well as migration correctness. Results show that the evaluated model (Mistral Codestral) can handle trivial one-to-one API substitutions with moderate success, achieving identical migrations in 11.11% of the cases, but it struggles with more complex migrations such as CORBA or JAX-WS. These findings suggest Mistral Codestral can partially reduce developer effort by automating repetitive migration tasks…
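
The abstract does not define its keyword-based metrics, so the following is only a sketch of how such a check might work, under the assumption that a migration counts as correct when deprecated identifiers are gone and their replacements are present. The class name `KeywordCheck`, the method `score`, and the keyword lists are all hypothetical.

```java
import java.util.List;

public class KeywordCheck {
    /**
     * Returns the fraction of keyword checks that pass (an assumed scoring rule):
     * each deprecated keyword must be absent from the migrated code,
     * and each replacement keyword must be present.
     */
    static double score(String migrated, List<String> deprecated, List<String> replacements) {
        int total = deprecated.size() + replacements.size();
        if (total == 0) {
            return 1.0; // nothing to check
        }
        int passed = 0;
        for (String keyword : deprecated) {
            if (!migrated.contains(keyword)) passed++;
        }
        for (String keyword : replacements) {
            if (migrated.contains(keyword)) passed++;
        }
        return (double) passed / total;
    }

    public static void main(String[] args) {
        // Illustrative keyword lists, not the benchmark's own.
        List<String> deprecated = List.of("new Integer(");
        List<String> replacements = List.of("Integer.valueOf(");

        String migrated = "Integer n = Integer.valueOf(text);";
        String unmigrated = "Integer n = new Integer(text);";

        System.out.println(score(migrated, deprecated, replacements));   // prints 1.0
        System.out.println(score(unmigrated, deprecated, replacements)); // prints 0.0
    }
}
```

A plain substring check like this is cheap but purely lexical, which is presumably why the paper pairs keyword-based metrics with CodeBLEU rather than relying on either alone.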

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Logic, programming, and type systems