Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation

Qiming Zhu; Jialun Cao; Xuanang Chen; Weili Zhang; Yaojie Lu; Hongyu Lin; Xianpei Han; Le Sun; Shing-Chi Cheung

arXiv:2506.03535·cs.SE·April 21, 2026

TL;DR

This paper investigates the effectiveness of retrieval-augmented code generation across multiple programming languages, revealing challenges and factors influencing cross-lingual knowledge transfer.

Contribution

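The paper constructs a dataset covering 13 programming languages with nearly 14K instances and uses it to systematically study cross-lingual knowledge transfer in retrieval-augmented code generation (RACG), providing insights for designing better multilingual RACG systems.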

Findings

01. Knowledge transfer across programming languages in RACG is non-trivial, even with direct injection.

02. Transfer is unequal across languages: its effectiveness depends on the linguistic affinity of the language pair and the diversity of the LLM's pretraining corpus.

03. RACG shows limited reliance on natural language cues when using code-specific retrieval.

Abstract

Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) has largely focused on single-language settings, leaving their cross-lingual effectiveness underexplored. Multilingual RACG systems are increasingly important for migrating and reusing code across programming languages (PLs), a common yet challenging task in modern software development. To systematically study cross-lingual code knowledge transfer in RACG, we construct a dataset covering 13 PLs with nearly 14K instances. Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial, even with direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on the linguistic affinity of the PL pair and the diversity of the LLM's pretraining corpus. (3) RACG shows limited reliance on natural language information embedded in code when…

Code & Models

Repositories

icip-cas/Cross-Lingual-RACG (GitHub)
