On the Evaluation of Neural Code Translation: Taxonomy and Benchmark

Mingsheng Jiao; Tingrui Yu; Xuan Li; Guanjie Qiu; Xiaodong Gu; Beijun Shen

arXiv:2308.08961 · cs.SE · August 21, 2023 · 1 citation

TL;DR

This paper proposes a taxonomy and new benchmark for neural code translation, revealing current models' strengths and weaknesses across different translation complexities and providing a more comprehensive evaluation framework.

Contribution

It introduces a taxonomy categorizing code translation tasks by complexity and develops G-TransEval, a benchmark for evaluating models on complex, knowledge-dependent translation types.

Findings

01. State-of-the-art models excel at simple translations but struggle with complex, knowledge-dependent tasks.

02. Existing benchmarks are biased towards trivial translation tasks.

03. G-TransEval offers a more comprehensive and fine-grained evaluation of code translation models.
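
To make the contrast between a single holistic score and a fine-grained evaluation concrete, here is a minimal sketch. The category labels, the sample outcomes, and the function names are all hypothetical and chosen only for illustration; they are not results or code from the paper or from G-TransEval.

```python
from collections import defaultdict

# Hypothetical per-sample outcomes as (category, passed) pairs.
# These values are invented for illustration, not measured results.
results = [
    ("simple", True), ("simple", True), ("simple", True),
    ("simple", True), ("simple", True), ("simple", False),
    ("knowledge-dependent", True), ("knowledge-dependent", False),
    ("knowledge-dependent", False), ("knowledge-dependent", False),
]


def holistic_accuracy(results):
    """One score over all samples, ignoring the translation category."""
    return sum(passed for _, passed in results) / len(results)


def accuracy_by_category(results):
    """Accuracy computed separately for each translation category."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        correct[category] += int(passed)
    return {category: correct[category] / totals[category] for category in totals}


overall = holistic_accuracy(results)
per_category = accuracy_by_category(results)

print(f"holistic: {overall:.2f}")
for category, accuracy in sorted(per_category.items()):
    print(f"{category}: {accuracy:.2f}")
```

In this made-up data the holistic score sits between the two per-category scores, so on its own it would hide how much weaker the model is on the harder category.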

Abstract

In recent years, neural code translation has gained increasing attention. While most research focuses on improving model architectures and training processes, we notice that the evaluation process and benchmarks for code translation models are severely limited: they primarily treat source code as natural language and provide a holistic accuracy score while disregarding the full spectrum of model capabilities across different translation types and complexity. In this paper, we present a comprehensive investigation of four state-of-the-art models and analyze in depth the advantages and limitations of three existing benchmarks. Based on the empirical results, we develop a taxonomy that categorizes code translation tasks into four primary types according to their complexity and knowledge dependence: token level (type 1), syntactic level (type 2), library level (type 3), and algorithm…
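
As a rough illustration of the first three type names in the abstract, the sketch below pairs each with a small Java-to-Python snippet. The language pair, the snippets, and the names `TranslationType`, `Example`, and `examples_of` are our own assumptions and are not taken from the paper or from G-TransEval. The fourth type is cut off in the abstract text, so it is left out rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum


class TranslationType(Enum):
    # Names follow the abstract; the fourth type is truncated there,
    # so it is deliberately omitted from this sketch.
    TOKEN_LEVEL = 1
    SYNTACTIC_LEVEL = 2
    LIBRARY_LEVEL = 3


@dataclass(frozen=True)
class Example:
    kind: TranslationType
    java: str
    python: str


# Hypothetical Java -> Python pairs suggesting what each level might
# involve. They are our own reading, not benchmark samples.
EXAMPLES = [
    Example(TranslationType.TOKEN_LEVEL,
            "int total = a + b;",
            "total = a + b"),
    Example(TranslationType.SYNTACTIC_LEVEL,
            "for (int i = 0; i < n; i++) { total += i; }",
            "for i in range(n):\n    total += i"),
    Example(TranslationType.LIBRARY_LEVEL,
            'String s = String.join(",", parts);',
            's = ",".join(parts)'),
]


def examples_of(kind):
    """Return the illustrative pairs tagged with the given type."""
    return [ex for ex in EXAMPLES if ex.kind is kind]


for ex in EXAMPLES:
    print(f"type {ex.kind.value} ({ex.kind.name.lower()})")
    print(f"  Java:   {ex.java}")
    print(f"  Python: {ex.python!r}")
```

The intent is only to show increasing knowledge dependence: the first pair changes little beyond dropping a type keyword, the second restructures a loop, and the third requires knowing the equivalent library call in the target language.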

Code & Models

Repositories

polyeval/g-transeval
none · Official

Taxonomy

Topics: Software Engineering Research · Machine Learning in Materials Science · Topic Modeling