ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training

Shreya Saxena; Siva Prasad; Zishan Ahmad; Vishal Vaddina

arXiv:2507.16478 · cs.AI · July 23, 2025

Open Access

TL;DR

ACT is a framework that improves code translation by generating synthetic data and adaptively fine-tuning open-source language models, offering a secure, scalable, and high-performance alternative to proprietary solutions.

Contribution

The paper introduces ACT, an automated pipeline that enhances open-source LLMs for code translation through synthetic data generation and adaptive training management.

Findings

01. Significant performance improvements in code translation accuracy.

02. Enhanced data diversity and functional correctness via synthetic data.

03. Increased developer productivity in industry-scale migration projects.

Abstract

Code translation is a crucial process in software development and migration projects, enabling interoperability between different programming languages and enhancing software adaptability and thus longevity. Traditional automated translation methods rely heavily on handcrafted transformation rules, which often lack flexibility and scalability. Meanwhile, advanced language models present promising alternatives but are often limited by proprietary, API-based implementations that raise concerns over data security and reliance. In this paper, we present Auto-Train for Code Translation (ACT), an innovative framework that aims to improve code translation capabilities by enabling in-house finetuning of open-source Large Language Models (LLMs). ACT's automated pipeline significantly boosts the performance of these models, narrowing the gap between open-source accessibility and the high…

Taxonomy

Topics: Natural Language Processing Techniques