RepoMasterEval: Evaluating Code Completion via Real-World Repositories

Qinyun Wu; Chao Peng; Pengfei Gao; Ruida Hu; Haoyu Gan; Bo Jiang; Jinhe Tang; Zhiwen Deng; Zhanming Guan; Cuiyun Gao; Xia Liu; Ping Yang

arXiv:2408.03519 · cs.SE · November 3, 2025

Open Access

TL;DR

RepoMasterEval is a new benchmark for evaluating code completion models using real-world repository data, emphasizing practical scenarios and test effectiveness to better reflect real development environments.

Contribution

It introduces a novel benchmark constructed from real repositories, incorporating mutation testing and manual test case creation to improve evaluation accuracy.
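As a rough illustration of how mutation testing can flag a weak test suite, the sketch below computes a mutation score: the fraction of deliberately broken variants (mutants) that a suite detects. It is a minimal sketch, not the authors' tooling. The `clamp` function, the hand-written mutants, and both test suites are hypothetical examples.

```python
from typing import Callable, Sequence


def mutation_score(mutants: Sequence[Callable], test_suite: Callable[[Callable], bool]) -> float:
    """Return the fraction of mutants the suite kills (i.e. the suite fails on them)."""
    if not mutants:
        raise ValueError("at least one mutant is required")
    killed = sum(1 for mutant in mutants if not test_suite(mutant))
    return killed / len(mutants)


def clamp(x, lo, hi):
    """Hypothetical function under test: restrict x to the range [lo, hi]."""
    return max(lo, min(x, hi))


# Hand-written mutants (hypothetical); a real tool would generate these automatically.
MUTANTS = [
    lambda x, lo, hi: max(lo, max(x, hi)),      # inner min replaced by max
    lambda x, lo, hi: min(lo, min(x, hi)),      # outer max replaced by min
    lambda x, lo, hi: max(lo, min(x, hi + 1)),  # off-by-one on the upper bound
]


def weak_suite(f) -> bool:
    """Checks only an in-range value, so boundary bugs can slip through."""
    return f(5, 0, 10) == 5


def strong_suite(f) -> bool:
    """Adds the boundary cases that the weak suite is missing."""
    return f(5, 0, 10) == 5 and f(-3, 0, 10) == 0 and f(42, 0, 10) == 10


if __name__ == "__main__":
    print("weak suite:  ", mutation_score(MUTANTS, weak_suite))
    print("strong suite:", mutation_score(MUTANTS, strong_suite))
```

A suite with a low score, like `weak_suite` here, is the kind of suite that would be a candidate for additional manually written test cases.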

Findings

01

Test augmentation significantly improves evaluation accuracy.

02

RepoMasterEval reports performance variance in real-world scenarios.

03

The benchmark correlates well with practical model performance.

Abstract

With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus mostly on code completion at the function and class level, prompting the model with text descriptions. In real development, however, such descriptive prompts are commonly unavailable, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make existing evaluation benchmarks align poorly with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models constructed from real-world repositories. Each benchmark datum is generated by masking a code snippet (ground truth) from one source code file with existing test suites. To improve test accuracy of…
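The masking step described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the benchmark's actual pipeline. The `CompletionDatum` type, the helper names, the line-based masking, and the toy `clamp` source and tests are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CompletionDatum:
    prefix: str        # code before the masked span
    ground_truth: str  # the masked snippet the model must reproduce
    suffix: str        # code after the masked span


def mask_lines(source: str, start: int, end: int) -> CompletionDatum:
    """Mask lines [start, end) of source (0-indexed) to form one datum."""
    lines = source.splitlines(keepends=True)
    if not 0 <= start < end <= len(lines):
        raise ValueError("invalid line range")
    return CompletionDatum(
        prefix="".join(lines[:start]),
        ground_truth="".join(lines[start:end]),
        suffix="".join(lines[end:]),
    )


def splice(datum: CompletionDatum, completion: str) -> str:
    """Put a candidate completion back into the masked position."""
    return datum.prefix + completion + datum.suffix


def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Run the tests against the spliced file.

    Assumption: exec() is used only to keep the sketch short; a real
    harness would run untrusted model output in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        exec(test_source, namespace)
    except Exception:
        return False
    return True


# Hypothetical source file and its existing test suite.
SOURCE = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo:\n"
    "        return lo\n"
    "    if x > hi:\n"
    "        return hi\n"
    "    return x\n"
)
TESTS = (
    "assert clamp(5, 0, 10) == 5\n"
    "assert clamp(-3, 0, 10) == 0\n"
    "assert clamp(42, 0, 10) == 10\n"
)

# Mask the upper-bound check in the middle of the function body.
datum = mask_lines(SOURCE, 3, 5)
```

Here a completion is judged by whether the spliced file passes the existing tests, so a completion that differs textually from the ground truth can still count as correct.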

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Scientific Computing and Data Management

Methods: Focus · ALIGN