Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization
Yuhan Wu, Huan Zhang, Wei Cheng, Chen Shen, Jingyue Yang, Wei Hu

TL;DR
This paper introduces CTO, a method that enhances code translation by combining syntax-guided feedback with semantic-aware preference optimization, leading to improved translation accuracy across multiple programming languages.
Contribution
The paper derives semantic rewards directly from the source code using contrastive learning, and unifies this semantic feedback with syntactic feedback in a multi-objective optimization framework.
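To make the contrastive-learning idea concrete, here is a minimal sketch of an InfoNCE-style loss over paired embeddings. It is an illustration under stated assumptions, not the paper's training objective: the function names (`cosine`, `info_nce`), the temperature value, and the use of in-batch negatives are all hypothetical choices, and the embeddings are assumed to be non-zero vectors produced by some encoder that is not shown.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    source_embs: List[Sequence[float]],
    target_embs: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Mean InfoNCE loss over a batch.

    source_embs[i] and target_embs[i] are assumed to embed functionally
    equivalent programs (a positive pair); every other target in the
    batch serves as a negative for source i.
    """
    losses = []
    for i, src in enumerate(source_embs):
        logits = [cosine(src, tgt) / temperature for tgt in target_embs]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the true translation.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

A low loss means each source embedding sits closest to its own translation, which is the property a semantic reward model of this kind would need in order to score functional equivalence.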
Findings
CTO significantly outperforms existing baselines in code translation tasks.
The approach improves semantic correctness and syntactic accuracy across C++, Java, and Python.
Contrastive learning effectively assesses functional equivalence between source and translated code.
Abstract
LLMs have shown immense potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. While preference-based learning offers a promising alignment strategy, it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO to improve code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, this robust semantic signal is seamlessly unified with compiler-based syntactic feedback within the direct preference…
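One way to picture how a syntactic signal and a semantic signal could be combined into preference data is sketched below. This is a simplified illustration, not the CTO formulation: the weighted-sum reward, the weights `w_syntax` and `w_semantic`, and the helper names are assumptions, the syntax check covers only Python targets via the built-in `compile()`, and `semantic_score` is a hypothetical stand-in for a trained cross-lingual semantic model.

```python
from typing import Callable, List, Optional, Tuple


def python_syntax_ok(code: str) -> bool:
    """Return True if `code` parses as Python (syntax check only, nothing is run)."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def build_preference_pair(
    source: str,
    candidates: List[str],
    semantic_score: Callable[[str, str], float],  # hypothetical scorer in [0, 1]
    w_syntax: float = 0.5,    # assumed weight, not from the paper
    w_semantic: float = 0.5,  # assumed weight, not from the paper
) -> Optional[Tuple[str, str]]:
    """Pick a (chosen, rejected) pair of candidate translations.

    Each candidate gets a scalar reward that mixes the syntax check with
    the semantic score. Returns None when no useful contrast exists.
    """
    def reward(candidate: str) -> float:
        syntax = 1.0 if python_syntax_ok(candidate) else 0.0
        return w_syntax * syntax + w_semantic * semantic_score(source, candidate)

    scored = sorted(((reward(c), c) for c in candidates), key=lambda p: p[0], reverse=True)
    if len(scored) < 2 or scored[0][0] == scored[-1][0]:
        return None
    return scored[0][1], scored[-1][1]
```

The resulting pairs have the (chosen, rejected) shape that preference-optimization methods consume; how the paper actually balances the two objectives is not specified in the text above.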
