Large Language Models for Multilingual Code Intelligence: A Survey

Chao Jiang, Dugang Liu, Cheng Wen, Zhiwu Xu, Hua Zheng, Muhammad Sadiq, Jawwad Ahmed Shamsi, Shengchao Qin, and Zhong Ming

arXiv:2604.25960 · cs.SE · April 30, 2026

TL;DR

This survey reviews the use of large language models in multilingual code intelligence, emphasizing challenges in less-resourced languages and the importance of semantic preservation across languages.

Contribution

The survey provides a comprehensive overview of methods, benchmarks, and challenges in multilingual code generation and translation with large language models.

Findings

01. Current models perform well on high-resource languages such as Python but are weaker on languages such as Rust and OCaml.

02. Multilingual code translation aims to preserve semantics across languages.

03. Challenges include bias toward high-resource languages and ensuring trustworthy generalization.

Abstract

Large language models have transformed AI-assisted software engineering, but current research remains biased toward high-resource languages such as Python, with weaker performance in languages like Rust and OCaml. Since real-world systems are inherently polyglot, robust multilingual code intelligence is crucial. This survey focuses on two key tasks: multilingual code generation from shared natural-language requirements, and multilingual code translation that preserves semantics across languages. It reviews representative methods, benchmarks, and evaluation metrics, and highlights challenges and opportunities for trustworthy cross-language generalization.
