Scalable, Validated Code Translation of Entire Projects using Large Language Models
Hanliang Zhang, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening

TL;DR
This paper presents a scalable, validated approach for translating entire large codebases using large language models by partitioning code, applying feature mapping, and ensuring semantic correctness, significantly improving translation success rates.
Contribution
The authors introduce a modular translation method with feature mapping and type-compatibility checks, enabling reliable translation of large codebases with LLMs, surpassing previous scalability limitations.
Findings
Successfully translated up to 6,600 lines of Go code to Rust.
On average, 73% of functions were validated for I/O equivalence.
Outperformed existing methods in translation success rate.
Abstract
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined…
Taxonomy
Topics: Natural Language Processing Techniques · Model-Driven Software Engineering Techniques · Topic Modeling
