CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation
Zhanhang Xiong, Dongxia Wang, Yuekang Li, Xinyuan An, Wenhai Wang

TL;DR
CrossPL is a new benchmark for evaluating large language models' ability to generate code that enables interoperability across multiple programming languages, highlighting current models' limitations in this complex task.
Contribution
We introduce CrossPL, the first comprehensive benchmark for assessing LLMs' performance on cross-programming-language (CPL) code generation involving inter-process communication (IPC) mechanisms.
Findings
State-of-the-art LLMs perform poorly on cross-programming-language (CPL) tasks.
Existing models struggle to generate code that interoperates across multiple languages.
The benchmark reveals significant gaps in current LLM capabilities.
Abstract
As large language models (LLMs) become increasingly embedded in software engineering workflows, a critical capability remains underexplored: generating correct code that enables cross-programming-language (CPL) interoperability. This skill is essential for building complex systems that integrate components written in multiple languages via mechanisms like inter-process communication (IPC). To bridge this gap, we present CrossPL, the first benchmark designed to systematically evaluate LLMs' ability to generate CPL-interoperating code. CrossPL comprises 1,982 tasks centered on IPC, covering six widely used programming languages and seven representative CPL techniques. We construct this benchmark by (i) analyzing 19,169 multi-language GitHub repositories using 156 hand-crafted finite state machines (FSMs), and (ii) developing an LLM-based pipeline that automatically extracts CPL code…
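The abstract mentions hand-crafted FSMs for locating IPC-related code in repositories, but does not show one. As a rough illustration only, the sketch below shows how a small FSM could flag a TCP server pattern (create socket, bind, listen, accept) in a source file. The state names, regular expressions, and the function `detects_tcp_server` are all hypothetical and are not taken from the paper; the authors' FSMs may be structured quite differently.

```python
import re

# Hypothetical transition table: (current state, pattern, next state).
# These patterns are illustrative assumptions, not the paper's FSMs.
TRANSITIONS = [
    ("start", re.compile(r"\bsocket\s*\("), "created"),
    ("created", re.compile(r"\.bind\s*\("), "bound"),
    ("bound", re.compile(r"\.listen\s*\("), "listening"),
    ("listening", re.compile(r"\.accept\s*\("), "accepted"),
]
ACCEPTING_STATE = "accepted"


def detects_tcp_server(source: str) -> bool:
    """Return True if the calls socket -> bind -> listen -> accept
    appear in that order, taking at most one transition per line."""
    state = "start"
    for line in source.splitlines():
        for current, pattern, next_state in TRANSITIONS:
            if state == current and pattern.search(line):
                state = next_state
                break  # at most one transition per source line
        if state == ACCEPTING_STATE:
            return True
    return False


# Two toy inputs: a server that completes the pattern, and a client that does not.
SERVER = """
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 9000))
s.listen(1)
conn, addr = s.accept()
"""

CLIENT = """
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 9000))
"""

print(detects_tcp_server(SERVER))
print(detects_tcp_server(CLIENT))
```

A line-oriented regex FSM like this is deliberately simplistic: it ignores comments, string literals, and calls split across lines, so it should be read as a sketch of the idea rather than a usable detector.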
