Bootstrapping Code Translation with Weighted Multilanguage Exploration
Yuhan Wu, Huan Zhang, Wei Cheng, Chen Shen, Jingyue Yang, Wei Hu

TL;DR
BootTrans is a novel bootstrapping approach that enhances multilingual code translation by leveraging test suites as universal verification tools and dynamically balancing training across language pairs.
Contribution
The paper introduces BootTrans, a method combining test suite adaptation, dual-pool data expansion, and language-aware weighting to improve multilingual code translation.
Findings
BootTrans yields significant performance improvements over baseline LLMs across code translation benchmarks.
Test suites serve effectively as universal verification oracles.
Ablation studies validate the bootstrapping and weighting components.
Abstract
Code translation across multiple programming languages is essential yet challenging due to two vital obstacles: scarcity of parallel data paired with executable test oracles, and optimization imbalance when handling diverse language pairs. We propose BootTrans, a bootstrapping method that resolves both obstacles. Its key idea is to leverage the functional invariance and cross-lingual portability of test suites, adapting abundant pivot-language unit tests to serve as universal verification oracles for multilingual reinforcement learning (RL) training. Our method introduces a dual-pool architecture with seed and exploration pools to progressively expand training data via execution-guided experience collection. Furthermore, we design a language-aware weighting mechanism that dynamically prioritizes harder translation directions based on relative performance across sibling languages,…
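The abstract does not give the weighting formula, so the following is only a minimal sketch of how a language-aware weight could favor harder translation directions relative to their sibling languages. The function name `language_weights`, the `temperature` parameter, and the softmax-over-gap form are all assumptions made for illustration, not the authors' definition.

```python
import math


def language_weights(pass_rates, temperature=0.5):
    """Hypothetical language-aware weighting (illustrative assumption only).

    pass_rates maps each target language to its current test pass rate
    in [0, 1]. Languages that perform below the mean of their siblings
    receive weights above 1; easier languages receive weights below 1.
    """
    if not pass_rates:
        return {}
    mean_rate = sum(pass_rates.values()) / len(pass_rates)
    # Positive score = harder than the sibling average.
    scores = {
        lang: (mean_rate - rate) / temperature
        for lang, rate in pass_rates.items()
    }
    # Numerically stable softmax over the scores.
    max_score = max(scores.values())
    exp_scores = {lang: math.exp(s - max_score) for lang, s in scores.items()}
    total = sum(exp_scores.values())
    # Rescale so that the weights average to 1 across languages.
    n = len(pass_rates)
    return {lang: n * e / total for lang, e in exp_scores.items()}


# Example: pass rates measured by executing the adapted test suites.
weights = language_weights({"java": 0.8, "cpp": 0.5, "rust": 0.2})
```

In this sketch the weights would scale each direction's training signal, so the direction with the lowest pass rate (here `rust`) contributes most. A lower `temperature` sharpens the emphasis on hard directions, and equal pass rates give every language a weight of 1.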
