TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation

Zhihao Gong; Zeyu Sun; Dong Huang; Qingyuan Liang; Jie M. Zhang; Dan Hao

arXiv:2508.11468 · cs.SE · March 20, 2026

Open Access

TL;DR

This paper introduces TRACE, a benchmark for evaluating execution efficiency in LLM-based code translation, revealing that correctness does not imply efficiency and that inefficiencies are widespread across models and languages.

Contribution

The paper presents TRACE, the first benchmark explicitly designed to assess efficiency in LLM-translated code, and provides a comprehensive evaluation of 28 models highlighting efficiency issues.

Findings

01

Correctness does not reliably indicate efficiency.

02

23.5% of correct translations are inefficient.

03

Inference-time prompt strategies only modestly improve efficiency.

Abstract

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of execution efficiency remains overlooked. We present TRACE, the first benchmark to explicitly assess efficiency in LLM-translated code. TRACE includes 1,000 efficiency-critical tasks across C++, Java, and Python, each augmented with stress tests that reveal efficiency degradations often overlooked by small-scale tests. Using TRACE, we conduct an extensive evaluation of 28 representative LLMs and highlight several key insights: 1) Correctness is not a reliable proxy for efficiency: the correctness leader Claude-4-think achieves only mid-level time efficiency, outperformed by smaller open-source LLMs such as Qwen2.5-Coder-14B-Instruct. 2) Inefficiency is both prevalent and patterned: 23.5% of…
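The abstract's central point, that a translation can pass small-scale tests yet degrade badly under stress tests, can be illustrated with a minimal Python sketch. The two deduplication functions, the `compare` helper, and the input sizes are hypothetical examples chosen for illustration; they are not taken from TRACE, and this excerpt does not describe TRACE's actual tasks, stress-test generation, or efficiency metrics.

```python
import time
from typing import Callable, List, Tuple


def dedupe_quadratic(items: List[int]) -> List[int]:
    """Hypothetical 'translated' version: correct output, but `x not in seen`
    scans a list, so the whole function is O(n^2)."""
    seen: List[int] = []
    out: List[int] = []
    for x in items:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out


def dedupe_linear(items: List[int]) -> List[int]:
    """Hypothetical reference version: same output (first-occurrence order),
    but set lookups are O(1) on average, so the whole function is O(n)."""
    seen = set()
    out: List[int] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def compare(candidate: Callable[[List[int]], List[int]],
            reference: Callable[[List[int]], List[int]],
            data: List[int]) -> Tuple[bool, float]:
    """Return (outputs match, candidate time / reference time) for one input."""
    t0 = time.perf_counter()
    cand_out = candidate(data)
    t1 = time.perf_counter()
    ref_out = reference(data)
    t2 = time.perf_counter()
    # Guard against a reference run too fast for the timer to register.
    ratio = (t1 - t0) / max(t2 - t1, 1e-9)
    return cand_out == ref_out, ratio


if __name__ == "__main__":
    small = [3, 1, 3, 2, 1]        # small functional test: both versions pass
    stress = list(range(5_000))    # stress test: all-distinct values force full list scans
    for name, data in [("small", small), ("stress", stress)]:
        ok, ratio = compare(dedupe_quadratic, dedupe_linear, data)
        print(f"{name}: correct={ok}, slowdown={ratio:.1f}x")
```

Both versions agree on every input, so a correctness-only check cannot tell them apart; the gap only becomes visible once the input is large enough. Wall-clock ratios are noisy, so a practical harness would likely repeat runs and aggregate.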

Taxonomy

Topics: Natural Language Processing Techniques · Software Engineering Research · Software System Performance and Reliability