ExecRepoBench: Multi-level Executable Code Completion Evaluation
Jian Yang, Jiajun Zhang, Jiaxi Yang, Ke Jin, Lei Zhang, Qiyao Peng, Ken Deng, Yibo Miao, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

TL;DR
This paper introduces ExecRepoBench, a new multi-level benchmark for evaluating code completion in complex, multi-file Python projects, and demonstrates a fine-tuned open-source model that outperforms prior baselines.
Contribution
It presents a novel multi-level, repository-level benchmark and a grammar-based code completion methodology, along with a fine-tuned open-source model, improving real-world code completion performance.
Findings
Qwen2.5-Coder-Instruct-C outperforms prior baselines across languages.
ExecRepoBench provides 1.2K real-world Python samples.
The framework enables more realistic code completion evaluation.
Abstract
Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench, and the instruction corpora Repo-Instruct, aimed at improving the functionality of open-source large language models (LLMs) in real-world coding scenarios that involve complex interdependencies across multiple files. ExecRepoBench includes 1.2K samples from active Python repositories. In addition, we present a multi-level grammar-based completion methodology…
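The abstract is truncated before it describes the methodology, so the following is only a minimal sketch of what grammar-based masking with executable checking could look like, not the authors' implementation. It assumes the masked unit is a single function body located with Python's standard `ast` module, and that a completion counts as correct if the reassembled file runs a test snippet without raising. The names `mask_function_body`, `passes_tests`, and `SOURCE` are hypothetical and chosen for illustration.

```python
import ast

# Hypothetical toy "repository file" used only for illustration.
SOURCE = (
    "import math\n"
    "\n"
    "def area(r):\n"
    "    return math.pi * r * r\n"
    "\n"
    "def double_area(r):\n"
    "    return 2 * area(r)\n"
)


def mask_function_body(source: str, func_name: str):
    """Split source into (prefix, middle, suffix) around one function body.

    Assumes the body starts on the line after the `def` line
    (one-line functions such as `def f(): return 1` are not handled).
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            start = node.body[0].lineno - 1  # 0-based index of first body line
            end = node.end_lineno            # 0-based exclusive end of the body
            return (
                "".join(lines[:start]),
                "".join(lines[start:end]),
                "".join(lines[end:]),
            )
    raise ValueError(f"function {func_name!r} not found")


def passes_tests(prefix: str, completion: str, suffix: str, test_code: str) -> bool:
    """Reassemble the file with a candidate completion and execute the tests."""
    program = prefix + completion + suffix + "\n" + test_code
    try:
        # Fresh namespace; a real harness would sandbox and time-limit this.
        exec(program, {})
    except Exception:
        return False
    return True


prefix, middle, suffix = mask_function_body(SOURCE, "area")
test_code = "assert abs(double_area(1) - 2 * math.pi) < 1e-9"
reference_ok = passes_tests(prefix, middle, suffix, test_code)
wrong_ok = passes_tests(prefix, "    return 0\n", suffix, test_code)
```

In this sketch the reference body passes the test and a wrong completion such as `return 0` fails it, which is the kind of signal an execution-based metric relies on instead of string similarity. Masking at other granularities (expressions or statements) would follow the same pattern with different `ast` node types.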
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
