LLM Translation of Compiler Intermediate Representation

Andrea Valenzuela Ramirez; Cristian Gutierrez-Gomez; Marta Barroso; Dario Garcia-Gasulla; Sara Royuela

arXiv:2605.08247·cs.PL·May 12, 2026

TL;DR

This paper introduces IRIS-14B, a large language model trained to translate GCC's GIMPLE IR to LLVM IR, significantly improving cross-toolchain interoperability in compiler workflows.

Contribution

IRIS-14B is presented as the first large-scale LLM trained specifically for IR-to-IR translation; it outperforms existing models and enables cross-toolchain integration.

Findings

01. IRIS-14B outperforms state-of-the-art models by up to 44 percentage points.

02. The model is trained on paired IRs from real-world C code and competitive programming problems.

03. The model supports hybrid neuro-symbolic compiler architectures for cross-toolchain workflows.
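
To make the notion of "paired IRs" concrete, the sketch below builds the compiler commands that would emit a GIMPLE dump and an LLVM IR file from the same C source. It is an illustration only, not the paper's data pipeline: the helper name `paired_ir_commands`, the output directory, and the `-O0` optimization level are all assumptions. The flags themselves (`-fdump-tree-gimple` for GCC, `-S -emit-llvm` for Clang) are standard.

```python
from pathlib import PurePosixPath


def paired_ir_commands(c_file: str, out_dir: str = "ir_pairs") -> dict[str, list[str]]:
    """Build (but do not run) the commands for one GIMPLE / LLVM IR pair.

    Hypothetical helper: names, paths, and -O0 are illustrative assumptions.
    """
    src = PurePosixPath(c_file)
    out = PurePosixPath(out_dir)
    return {
        # GCC writes the GIMPLE dump to a side file whose name ends in ".gimple".
        "gimple": [
            "gcc", "-O0", "-c", str(src),
            "-fdump-tree-gimple",
            "-o", str(out / f"{src.stem}.o"),
        ],
        # Clang emits textual LLVM IR (.ll) instead of an object file.
        "llvm": [
            "clang", "-O0", "-S", "-emit-llvm", str(src),
            "-o", str(out / f"{src.stem}.ll"),
        ],
    }


if __name__ == "__main__":
    for name, cmd in paired_ir_commands("example.c").items():
        print(name, " ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a machine with both toolchains installed, after creating the output directory; the resulting dump and `.ll` file form one source/target pair.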

Abstract

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter…
