Algorithm-Based Pipeline for Reliable and Intent-Preserving Code Translation with LLMs
Shahriar Rumi Dipto, Saikat Mondal, and Chanchal K. Roy

TL;DR
This paper introduces an algorithm-based pipeline with a language-neutral intermediate step that significantly improves the accuracy and reliability of code translation between Python and Java using LLMs.
Contribution
The study presents a novel structured planning approach that enhances code translation fidelity and reliability over direct translation methods, validated through extensive empirical evaluation.
Findings
Accuracy increased from 67.7% to 78.5%.
Complete elimination of lexical and token errors.
Significant reduction in runtime and structural failures.
Abstract
Code translation, the automatic conversion of programs between languages, is a growing use case for Large Language Models (LLMs). However, direct one-shot translation often fails to preserve program intent, leading to errors in control flow, type handling, and I/O behavior. We propose an algorithm-based pipeline that introduces a language-neutral intermediate specification to capture these details before code generation. This study empirically evaluates the extent to which structured planning can improve translation accuracy and reliability relative to direct translation. We conduct an automated paired experiment, comparing direct and algorithm-based translation between Python and Java, using five widely used LLMs on the Avatar and CodeNet datasets. For each combination of model, dataset, approach, and direction, we compile and execute the translated program and run the provided tests. We record…
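The contrast between the two approaches described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable type, both function names, and the prompt wording are all assumptions, and the paper's actual prompts and specification format are not given here.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a model reply.
LLM = Callable[[str], str]


def direct_translate(llm: LLM, source: str, src_lang: str, tgt_lang: str) -> str:
    """One-shot baseline: ask the model for target code in a single call."""
    prompt = (
        f"Translate the following {src_lang} program to {tgt_lang}. "
        f"Return only code.\n\n{source}"
    )
    return llm(prompt)


def algorithm_based_translate(llm: LLM, source: str, src_lang: str, tgt_lang: str) -> str:
    """Two-stage sketch: language-neutral specification first, then code."""
    # Stage 1: capture intent (control flow, types, I/O) without target syntax.
    # The prompt text is an assumption made for illustration.
    spec_prompt = (
        f"Describe the following {src_lang} program as a language-neutral "
        f"algorithm. State the control flow, data types, and input/output "
        f"behavior explicitly. Do not write code.\n\n{source}"
    )
    spec = llm(spec_prompt)

    # Stage 2: generate target code from the specification alone.
    code_prompt = (
        f"Implement the following algorithm in {tgt_lang}. "
        f"Return only code.\n\n{spec}"
    )
    return llm(code_prompt)
```

Any client can be passed as `llm` as long as it accepts a prompt string and returns the reply text. In a paired setup, both functions would be run on the same source program and each output compiled and tested independently.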
Taxonomy
Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling
