Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?
Soumit Kanti Saha, Fazle Rabbi, Song Wang, Jinqiu Yang

TL;DR
This paper investigates the effectiveness of using natural language specifications as an intermediate step in code translation with large language models, finding limited overall benefits but some improvements with specific language pairs.
Contribution
It provides a comprehensive evaluation of NL-specification as an intermediate representation in code translation across multiple datasets and language pairs, highlighting its conditional effectiveness.
Findings
NL-specification alone does not improve performance.
Combining NL-specification with source code benefits certain language pairs.
No consistent overall improvement was observed across datasets and language pairs.
Abstract
Large Language Models (LLMs) are increasingly being applied across various domains, including code-related tasks such as code translation. Previous studies have explored using LLMs for translating code between different programming languages. Since LLMs are more effective with natural language, using natural language as an intermediate representation in code translation tasks is an intuitively appealing approach. However, whether this benefit is general or highly context-dependent remains unclear. In this work, we investigate using NL-specification as an intermediate representation for code translation. We evaluate our method using three datasets, five popular programming languages, and 29 language pair permutations. Our results show that using NL-specification alone does not lead to performance improvements. However, when combined with source code, it provides gains in certain language…
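The two configurations described in the abstract (NL-specification alone, and NL-specification combined with source code) can be sketched as a two-step pipeline. This is a minimal illustration, not the authors' implementation: the `LLM` callable, the function names, and the prompt wording are all hypothetical, and the paper's actual prompts and models are not given in this summary.

```python
from typing import Callable, Optional

# Hypothetical interface: any function mapping a prompt string to a completion.
# Swap in a real model client here; none is assumed by this sketch.
LLM = Callable[[str], str]


def build_spec_prompt(source_code: str, source_lang: str) -> str:
    """Prompt asking the model for an NL-specification (wording is an assumption)."""
    return (
        f"Describe in plain English what the following {source_lang} program does, "
        "including its inputs, outputs, and edge cases.\n\n"
        f"{source_code}"
    )


def build_translation_prompt(
    target_lang: str,
    spec: str,
    source_lang: Optional[str] = None,
    source_code: Optional[str] = None,
) -> str:
    """Prompt for the target-language program, optionally including the source."""
    parts = [f"Write a {target_lang} program that satisfies this specification:", spec]
    if source_code is not None:
        # "Specification plus source code" configuration.
        parts.append(f"Reference implementation in {source_lang}:")
        parts.append(source_code)
    return "\n\n".join(parts)


def translate_via_spec(
    llm: LLM,
    source_code: str,
    source_lang: str,
    target_lang: str,
    include_source: bool = False,
) -> str:
    """Step 1: code -> NL-specification. Step 2: specification (+ code) -> target code."""
    spec = llm(build_spec_prompt(source_code, source_lang))
    prompt = build_translation_prompt(
        target_lang,
        spec,
        source_lang,
        source_code if include_source else None,
    )
    return llm(prompt)
```

With `include_source=False` the second call sees only the specification, so any detail the specification omits is lost to the translator; setting `include_source=True` keeps the original program in context alongside it.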
