On the Evaluation of Neural Code Translation: Taxonomy and Benchmark
Mingsheng Jiao, Tingrui Yu, Xuan Li, Guanjie Qiu, Xiaodong Gu, Beijun Shen

TL;DR
This paper proposes a taxonomy and new benchmark for neural code translation, revealing current models' strengths and weaknesses across different translation complexities and providing a more comprehensive evaluation framework.
Contribution
It introduces a taxonomy categorizing code translation tasks by complexity and develops G-TransEval, a benchmark for evaluating models on complex, knowledge-dependent translation types.
Findings
State-of-the-art models excel at simple translations but struggle with complex, knowledge-dependent tasks.
Existing benchmarks are biased towards trivial translation tasks.
G-TransEval offers a more comprehensive and fine-grained evaluation of code translation models.
Abstract
In recent years, neural code translation has gained increasing attention. While most of the research focuses on improving model architectures and training processes, we notice that the evaluation process and benchmark for code translation models are severely limited: they primarily treat source code as natural language and provide a holistic accuracy score while disregarding the full spectrum of model capabilities across different translation types and complexity. In this paper, we present a comprehensive investigation of four state-of-the-art models and analyze in depth the advantages and limitations of three existing benchmarks. Based on the empirical results, we develop a taxonomy that categorizes code translation tasks into four primary types according to their complexity and knowledge dependence: token level (type 1), syntactic level (type 2), library level (type 3), and algorithm…
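The abstract contrasts a single holistic accuracy score with a breakdown across translation types. A minimal Python sketch of that contrast follows. It is not the paper's evaluation code: the function names, the `(translation_type, passed)` record shape, and the sample outcomes are all hypothetical, chosen only to show how one aggregate number can hide weak performance on the higher-numbered types.

```python
from collections import defaultdict


def overall_accuracy(results):
    """Holistic score: fraction of all samples that passed (assumes non-empty input)."""
    results = list(results)
    return sum(int(passed) for _, passed in results) / len(results)


def accuracy_by_type(results):
    """Fine-grained score: pass rate computed separately for each translation type."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for translation_type, passed in results:
        totals[translation_type] += 1
        passes[translation_type] += int(passed)
    return {t: passes[t] / totals[t] for t in sorted(totals)}


# Hypothetical outcomes, invented for illustration only (not data from the paper).
# Each record is (taxonomy type number 1-4, whether the translation was judged correct).
sample = [
    (1, True), (1, True), (1, True), (1, True),
    (2, True), (2, False),
    (3, False),
    (4, False),
]

print(overall_accuracy(sample))
print(accuracy_by_type(sample))
```

On this made-up sample the holistic score is 0.625, while the per-type view shows every type 3 and type 4 item failing. A benchmark dominated by type 1 items would inflate the single number in the same way.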
