Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models
HoHyun Sim, Hyeonjoong Cho, Yeonghyeon Go, Sadegh AlMahdi Kazemi Zarkouei, Zhoulai Fu, Ali Shokri, Binoy Ravindran

TL;DR
This paper introduces LAC2R, a systematic search-based method that uses Monte Carlo Tree Search (MCTS) to refine LLM-driven C-to-Rust translation, achieving high safety and correctness on real-world benchmarks.
Contribution
It presents a novel MCTS-guided refinement approach for C-to-Rust translation that effectively explores multiple translation trajectories and intermediate steps.
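To make the idea of tree search over refinement trajectories concrete, here is a minimal, generic UCT-style MCTS sketch. It is not LAC2R's actual algorithm: the state (an integer count of remaining unsafe constructs), the `propose` and `evaluate` helpers, and the exploration constant are all hypothetical stand-ins for LLM-proposed rewrites and a compile/test/safety score.

```python
import math

class Node:
    """One candidate translation state in the search tree."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct_score(child, parent_visits, c=1.4):
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select(node):
    """Descend to a leaf, picking the best-scoring child at each level."""
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
    return node

def propose(state):
    """Hypothetical stand-in for LLM refinements: one candidate removes
    an unsafe construct, the other leaves the count unchanged."""
    return [state - 1, state] if state > 0 else []

def evaluate(state):
    """Hypothetical reward in (0, 1]: fewer unsafe constructs is better."""
    return 1.0 / (1.0 + state)

def search(root_state, propose, evaluate, iterations=50):
    root = Node(root_state)
    best_state, best_reward = root_state, evaluate(root_state)
    for _ in range(iterations):
        leaf = select(root)
        # Expand the root, or any leaf that has already been evaluated once.
        if leaf.visits > 0 or leaf is root:
            for s in propose(leaf.state):
                leaf.children.append(Node(s, parent=leaf))
            if leaf.children:
                leaf = leaf.children[0]
        reward = evaluate(leaf.state)
        if reward > best_reward:
            best_state, best_reward = leaf.state, reward
        # Backpropagate the reward along the path to the root.
        node = leaf
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_state, best_reward
```

Calling `search(3, propose, evaluate)` explores several refinement paths from a state with three unsafe constructs and returns the best state seen along with its reward. In a real pipeline, `propose` would query a model and `evaluate` would compile and test the candidate, so both would be far more expensive and noisy than in this toy.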
Findings
LAC2R achieves the highest safety ratio and project correctness on small benchmarks.
LAC2R outperforms existing methods in safety and correctness metrics.
The approach effectively handles large-scale real-world C code.
Abstract
The C programming language has been foundational in building system-level software. However, its manual memory management model frequently leads to memory safety issues. In response, Rust has emerged as a memory-safe alternative. Moreover, automating C-to-Rust translation with the rapidly advancing generative capabilities of LLMs is attracting growing interest as a way to handle large volumes of legacy C code. Leveraging LLMs for C-to-Rust translation introduces distinct challenges, unlike the math or commonsense QA domains where LLMs have predominantly been applied. First, the scarcity of parallel C-to-Rust datasets hinders the retrieval of suitable code translation exemplars for in-context learning. Second, unlike math or commonsense QA problems, the intermediate steps required for C-to-Rust translation are not well-defined. Third, it remains unclear how to organize and cascade these…
