Unsupervised Translation of Programming Languages

Marie-Anne Lachaux; Baptiste Roziere; Lowik Chanussot; Guillaume Lample

arXiv:2006.03511 · cs.CL · September 23, 2020 · 62 cites

TL;DR

This paper introduces an unsupervised neural transcompiler that translates code between programming languages using only monolingual data, outperforming rule-based systems and requiring no language expertise.

Contribution

It presents a novel unsupervised neural approach for transcompilation that leverages monolingual source code, enabling accurate translation without parallel datasets or language-specific knowledge.

Findings

01 Outperforms rule-based commercial transcompilers

02 Achieves high accuracy when translating between C++, Java, and Python

03 Requires only monolingual source code for training

Abstract

A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language's conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their…
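As a rough illustration of the handcrafted rewrite rules the abstract describes, the sketch below walks a Python abstract syntax tree and emits C++ expression text. It is not the paper's method, which is neural and unsupervised. It shows only the rule-based baseline style, on a tiny expression subset. The names `BINOP_RULES`, `to_cpp`, and `translate` are hypothetical and chosen for this example.

```python
import ast

# Hypothetical rule table: Python binary-operator node type -> C++ operator token.
BINOP_RULES = {
    ast.Add: "+",
    ast.Sub: "-",
    ast.Mult: "*",
}


def to_cpp(node: ast.AST) -> str:
    """Translate a small subset of Python expressions into C++ source text."""
    if isinstance(node, ast.Expression):
        return to_cpp(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        left, right = to_cpp(node.left), to_cpp(node.right)
        # Rule with no direct operator equivalent: ** becomes a library call.
        if isinstance(node.op, ast.Pow):
            return f"std::pow({left}, {right})"
        op = BINOP_RULES.get(type(node.op))
        if op is not None:
            # Always parenthesize so precedence never has to be reasoned about.
            return f"({left} {op} {right})"
    # Anything without a handwritten rule is rejected.
    raise NotImplementedError(f"no rewrite rule for {type(node).__name__}")


def translate(expr: str) -> str:
    """Parse a Python expression and apply the rewrite rules to its AST."""
    return to_cpp(ast.parse(expr, mode="eval"))


if __name__ == "__main__":
    print(translate("a + b * 2"))
    print(translate("x ** 2 - y"))
```

Even this toy version hints at the drawbacks listed in the abstract: every construct needs its own rule, unsupported input is simply rejected, and the blanket parenthesization yields output that is correct but less readable than idiomatic C++.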

Code & Models

Datasets

gabeorlanski/bc-transcoder · dataset · 38 dl

Videos

TransCoder: Unsupervised Translation of Programming Languages (Paper Explained) · youtube

A brief history of the Transformer architecture in NLP · youtube

Unsupervised Translation of Programming Languages · slideslive

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling