Unsupervised Translation of Programming Languages
Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

TL;DR
This paper introduces an unsupervised neural transcompiler that translates code between programming languages using only monolingual data, outperforming rule-based systems and requiring no language expertise.
Contribution
It presents a novel unsupervised neural approach for transcompilation that leverages monolingual source code, enabling accurate translation without parallel datasets or language-specific knowledge.
Findings
Outperforms rule-based commercial transcompilers
Achieves high accuracy in translating C++, Java, and Python
Requires only monolingual source code for training
Abstract
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language's conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their…
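To make the rule-based baseline described in the abstract concrete, here is a minimal sketch of one handcrafted rewrite rule applied to an abstract syntax tree. It is an illustration only, not code from the paper: the class `XrangeToRange` and the function `port_snippet` are hypothetical names, and the example rewrites Python into Python with the standard `ast` module, whereas a real transcompiler would translate between different languages and need many such rules.

```python
import ast


class XrangeToRange(ast.NodeTransformer):
    """One handcrafted rule (hypothetical): rewrite calls to `xrange` as `range`."""

    def visit_Call(self, node):
        # Visit children first so nested calls are rewritten too.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "xrange":
            new_func = ast.Name(id="range", ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


def port_snippet(source: str) -> str:
    """Parse the source, apply the rule to the tree, and emit source again."""
    tree = ast.parse(source)
    tree = XrangeToRange().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


if __name__ == "__main__":
    print(port_snippet("for i in xrange(3):\n    print(i)"))
```

Each rule of this kind has to be written and maintained by someone who knows both the source and target conventions, which is the cost the abstract attributes to rule-based systems.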
Taxonomy
Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling
