ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training
Shreya Saxena, Siva Prasad, Zishan Ahmad, Vishal Vaddina

TL;DR
ACT is a framework that improves code translation by generating synthetic data and adaptively fine-tuning open-source language models, offering a secure, scalable, and high-performance alternative to proprietary solutions.
Contribution
The paper introduces ACT, an automated pipeline that enhances open-source LLMs for code translation through synthetic data generation and adaptive training management.
Findings
Significant performance improvements in code translation accuracy.
Enhanced data diversity and functional correctness via synthetic data.
Increased developer productivity in industry-scale migration projects.
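One way to read the link between synthetic data and functional correctness is as an execution-based filter: a synthetic translation pair is kept only if the source and the translated program behave the same on a set of test inputs. The sketch below illustrates that general idea and is not the ACT pipeline itself. `CandidatePair`, `is_functionally_equivalent`, and `filter_pairs` are hypothetical names, and plain Python callables stand in for programs in two different languages.

```python
from typing import Callable, Iterable, List, NamedTuple


class CandidatePair(NamedTuple):
    """Hypothetical record for one synthetic translation candidate."""
    name: str
    source_fn: Callable[[int], int]      # stand-in for the source program's behavior
    translated_fn: Callable[[int], int]  # stand-in for the translated program's behavior


def is_functionally_equivalent(pair: CandidatePair, test_inputs: Iterable[int]) -> bool:
    """Return True only if both programs agree on every test input."""
    for x in test_inputs:
        try:
            if pair.source_fn(x) != pair.translated_fn(x):
                return False
        except Exception:
            # Assumption: a crash on either side counts as a failed translation.
            return False
    return True


def filter_pairs(pairs: Iterable[CandidatePair], test_inputs: Iterable[int]) -> List[CandidatePair]:
    """Keep only the candidates that pass the equivalence check."""
    inputs = list(test_inputs)  # materialize once so every pair sees the same inputs
    return [p for p in pairs if is_functionally_equivalent(p, inputs)]


candidates = [
    CandidatePair("square", lambda x: x * x, lambda x: x ** 2),
    CandidatePair("double_wrong", lambda x: 2 * x, lambda x: x + 2),  # incorrect translation
]
kept = filter_pairs(candidates, range(-3, 4))
print([p.name for p in kept])  # prints ['square']
```

In a real setting the two callables would be replaced by compiling and running the source and target programs in sandboxes. Agreement on a finite test set is evidence of equivalence, not a proof of it.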
Abstract
Code translation is a crucial process in software development and migration projects: it enables interoperability between programming languages and improves software adaptability and, in turn, longevity. Traditional automated translation methods rely heavily on handcrafted transformation rules, which often lack flexibility and scalability. Advanced language models are a promising alternative, but they are often available only through proprietary, API-based implementations that raise concerns over data security and dependence on external providers. In this paper, we present Auto-Train for Code Translation (ACT), an innovative framework that aims to improve code translation capabilities by enabling in-house fine-tuning of open-source Large Language Models (LLMs). ACT's automated pipeline significantly boosts the performance of these models, narrowing the gap between open-source accessibility and the high…
Taxonomy
Topics: Natural Language Processing Techniques
