M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng

arXiv:2410.21157 · cs.CL · October 29, 2024

Open Access

TL;DR

This paper introduces M2RC-EVAL, a comprehensive multilingual benchmark with fine-grained annotations for repository-level code completion, and M2RC-INSTRUCT, a dataset to enhance code LLMs' abilities across 18 languages.

Contribution

The paper presents the first massively multilingual benchmark with detailed annotations for code completion, addressing limitations of previous benchmarks focused on fewer languages.

Findings

01. M2RC-EVAL effectively evaluates multilingual code LLMs.

02. M2RC-INSTRUCT improves code completion performance across languages.

03. Benchmark results highlight strengths and gaps in current models.

Abstract

Repository-level code completion has drawn great attention in software engineering, and several benchmark datasets have been introduced. However, existing repository-level code completion benchmarks usually cover only a limited number of languages (<5), so they cannot evaluate the general code intelligence of existing code Large Language Models (LLMs) across different languages. In addition, existing benchmarks usually report only overall scores averaged over languages, which ignores fine-grained abilities in different completion scenarios. Therefore, to facilitate research on code LLMs in multilingual scenarios, we propose a massively multilingual repository-level code completion benchmark covering 18 programming languages (called M2RC-EVAL), and two types of fine-grained annotations (i.e., bucket-level and semantic-level) on different completion scenarios are…
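
To illustrate why the abstract argues for fine-grained annotations rather than a single averaged score, the following is a minimal sketch. The record layout, the bucket labels, and the scores are all hypothetical toy values chosen for illustration; they are not taken from M2RC-EVAL and do not reflect how the benchmark defines its buckets or metrics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (language, bucket label, exact-match flag 0 or 1).
# These values are invented for illustration, not results from the paper.
records = [
    ("python", "bucket_1", 1),
    ("python", "bucket_2", 0),
    ("rust", "bucket_1", 1),
    ("rust", "bucket_2", 0),
]


def average_by(rows, key_index):
    """Group rows by the field at key_index and average the score field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: mean(scores) for key, scores in groups.items()}


overall = mean(row[2] for row in records)   # one number for everything
per_language = average_by(records, 0)       # identical across languages here
per_bucket = average_by(records, 1)         # exposes the gap between buckets

print(overall, per_language, per_bucket)
```

In this toy data the overall score and the per-language scores are all the same, while the per-bucket breakdown shows that every failure falls in one bucket. That is the kind of difference an averaged score would hide.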

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Focus