Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair

Xinyue Liang; Jingxuan Zhang; Lin Li; Jun Zhang; Junhao Chen

arXiv:2605.07403 · cs.SE · May 11, 2026

TL;DR

This paper presents a multi-stage LLM training framework with error repair techniques to improve Java-to-Cangjie code translation, addressing low-resource challenges and enhancing semantic and structural accuracy.

Contribution

It introduces a novel multi-stage training and error repair approach that effectively translates Java to Cangjie with limited parallel data, improving semantic and structural correctness.

Findings

01 Improves functional equivalence by 6.06% over state-of-the-art methods.

02 Each training stage positively impacts translation performance.

03 Combines compiler feedback and error repair for better code correctness.
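
The third finding pairs compiler feedback with error repair. This summary does not describe the paper's actual procedure, so the following is only a generic sketch of how such a loop is commonly structured: compile a candidate translation, pass any diagnostics to a repair step, and repeat within a fixed budget. Every name here (`repair_loop`, `compile_fn`, `repair_fn`, `max_rounds`) is hypothetical, and the brace-counting "compiler" and brace-appending "repairer" are toy stand-ins for a real Cangjie compiler and an LLM-based repair model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, not taken from the paper.
CompileFn = Callable[[str], List[str]]      # returns diagnostics; empty list means success
RepairFn = Callable[[str, List[str]], str]  # returns a revised candidate given diagnostics


def repair_loop(
    candidate: str,
    compile_fn: CompileFn,
    repair_fn: RepairFn,
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Compile, then repair on failure, for at most `max_rounds` repair attempts.

    Returns (final_code, compiled_ok, repairs_applied).
    """
    for round_idx in range(max_rounds + 1):
        diagnostics = compile_fn(candidate)
        if not diagnostics:
            return candidate, True, round_idx
        if round_idx == max_rounds:
            break  # budget spent; give up with the last candidate
        candidate = repair_fn(candidate, diagnostics)
    return candidate, False, max_rounds


def toy_compile(code: str) -> List[str]:
    """Toy stand-in for a compiler: only checks that braces are balanced."""
    missing = code.count("{") - code.count("}")
    return [f"expected {missing} more closing brace(s)"] if missing > 0 else []


def toy_repair(code: str, diagnostics: List[str]) -> str:
    """Toy stand-in for a repair model: appends one closing brace per call."""
    return code + "\n}"


if __name__ == "__main__":
    broken = "main() {\n  if (x) {\n    y()"
    fixed, ok, repairs = repair_loop(broken, toy_compile, toy_repair)
    print(ok, repairs)
```

In a real pipeline, `compile_fn` would invoke the target-language compiler and parse its error output, and `repair_fn` would prompt a model with the failing code plus the diagnostics. The fixed round budget keeps the loop from running indefinitely when repairs do not converge.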

Abstract

With the rapid evolution of emerging programming language ecosystems, the demand for code translation to low-resource languages continues to grow. As Cangjie emerges as a new programming language, its ecosystem and development toolchains are rapidly expanding. Automated translation from popular programming languages to Cangjie is therefore valuable for practical development. However, constrained by both insufficient Cangjie knowledge and scarce parallel code corpora, general Large Language Models (LLMs) are prone to syntactic errors and to semantic and structural misalignment in code translation. Existing approaches typically rely on fine-tuning with large-scale parallel data, but they cannot reliably improve compilability or semantic consistency for low-resource languages such as Cangjie. To tackle these challenges, we propose a multi-stage training framework for LLMs that employs the…
