REMODEL-LLM: Transforming C code to Java using LLMs
Aryan Gupta, Y. Raghu Reddy

TL;DR
This study evaluates 19 small LLMs on translating C code to Java. It finds large performance gaps: only a few models produce even partially correct code, and those still fail on complex C features.
Contribution
It introduces a hybrid pipeline using ASTs and rule-based prompting to assess the translation capabilities of small, quantized LLMs for C to Java conversion.
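To make the idea of rule-based prompting over decomposed units concrete, here is a minimal sketch of how one constrained prompt per C unit might be assembled. This summary does not give the paper's actual rules, prompt wording, or AST tooling, so the class `PromptSketch`, the method `buildPrompt`, and the contents of `RULES` are all hypothetical and chosen only for illustration.

```java
import java.util.List;

public class PromptSketch {
    // Hypothetical rules: the paper's real rule set is not shown in this summary.
    static final List<String> RULES = List.of(
        "Output Java code only, with no explanation.",
        "Translate only the C unit shown; do not invent other methods.");

    // Builds one constrained prompt for a single decomposed C unit
    // (for example, one function taken from the AST). Assumed shape only.
    static String buildPrompt(String unitKind, String cSource) {
        StringBuilder sb = new StringBuilder();
        sb.append("Translate the following C ").append(unitKind).append(" to Java.\n");
        sb.append("Rules:\n");
        for (int i = 0; i < RULES.size(); i++) {
            sb.append(i + 1).append(". ").append(RULES.get(i)).append('\n');
        }
        sb.append("C source:\n").append(cSource).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildPrompt("function", "int add(int a, int b) { return a + b; }"));
    }
}
```

Prompting per unit rather than per file keeps each request small, which is one plausible reason to pair AST decomposition with small models.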
Findings
Most models failed to generate basic Java boilerplate.
A few models produced runnable code but with semantic errors.
Top models struggled with complex C features like function pointers.
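To illustrate why function pointers are a hard case, the sketch below shows one conventional hand translation: a C function pointer parameter becomes a Java functional interface. This is not taken from the paper's test suite. The C snippets in the comments and the names `FunctionPointerSketch`, `add`, `mul`, and `apply` are assumptions made for the example.

```java
import java.util.function.IntBinaryOperator;

public class FunctionPointerSketch {
    // C: int add(int a, int b) { return a + b; }
    static int add(int a, int b) { return a + b; }

    // C: int mul(int a, int b) { return a * b; }
    static int mul(int a, int b) { return a * b; }

    // C: int apply(int (*op)(int, int), int a, int b) { return op(a, b); }
    // Java has no function pointers, so the parameter becomes a functional interface.
    static int apply(IntBinaryOperator op, int a, int b) {
        return op.applyAsInt(a, b);
    }

    public static void main(String[] args) {
        // Method references stand in for passing the address of a C function.
        System.out.println(apply(FunctionPointerSketch::add, 2, 3)); // prints 5
        System.out.println(apply(FunctionPointerSketch::mul, 2, 3)); // prints 6
    }
}
```

The translation changes the type of the parameter, not just its syntax, so a model has to pick a suitable interface and rewrite every call site consistently.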
Abstract
The automated translation of C code to Java code is a notoriously difficult task, fraught with challenges stemming from fundamental paradigm shifts (procedural vs. object-oriented), memory models (manual pointers vs. garbage collection), and incompatible data types. This paper investigates the efficacy of 19 small, quantized LLMs (under 20 billion parameters) for the C-to-Java translation task. We use a novel, hybrid pipeline that leverages Abstract Syntax Trees (ASTs) for semantic decomposition and employs a highly constrained, rule-based prompting strategy. The results are stark: a clear multi-tiered performance divide emerged. The vast majority of models (Tier 3, e.g., llama3.1, gemma3, starcoder2) failed 100% of the tests, proving incapable of generating even basic, runnable Java boilerplate. A small middle tier (Tier 2, e.g., mistral-nemo and mistral) produced runnable code but…
Taxonomy
Topics: Logic, programming, and type systems · Software Engineering Research · Natural Language Processing Techniques
