ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code

Jia Feng; Jiachen Liu; Cuiyun Gao; Chun Yong Chong; Chaozheng Wang; Shan Gao; Xin Xia

arXiv:2409.10280·cs.SE·September 17, 2024

TL;DR

ComplexCodeEval is a benchmark that evaluates large code models (LCMs) on four development tasks (code generation, code completion, API recommendation, and test case generation) using 3,897 Java and 7,184 Python samples drawn from high-star GitHub repositories.

Contribution

The paper introduces ComplexCodeEval, a new benchmark with diverse tasks and datasets, addressing limitations of prior narrow evaluation methods for large code models.

Findings

01 Providing context improves model performance.

02 Data leakage can lead to overestimated performance.

03 More accurate evaluation is needed to measure real progress.

Abstract

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which do not reflect the diverse challenges developers face in real-world contexts. To address this, we introduce ComplexCodeEval, a benchmark designed to assess large code models (LCMs) on a range of development tasks, including code generation, code completion, API recommendation, and test case generation. It includes 3,897 Java samples and 7,184 Python samples from high-star GitHub repositories, each annotated with function signatures, docstrings, and API references to simulate real development environments. Our experiments across ten LCMs reveal that context improves performance and that data leakage can lead to overestimation, highlighting the need for more accurate evaluations.
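To make the annotation idea concrete, here is a minimal sketch of how one sample and a context-augmented prompt might look. The `Sample` class, its field names, and `build_prompt` are hypothetical illustrations based only on the annotations listed in the abstract (function signature, docstring, API references); they are not the benchmark's actual schema or evaluation code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical record; field names are illustrative, not the benchmark's schema."""
    function_signature: str
    docstring: str
    api_references: List[str] = field(default_factory=list)


def build_prompt(sample: Sample, with_context: bool) -> str:
    """Assemble a code-generation prompt, optionally listing API references as context."""
    lines = []
    if with_context and sample.api_references:
        lines.append("# APIs available in this project:")
        lines.extend(f"#   {api}" for api in sample.api_references)
    lines.append(sample.function_signature)
    lines.append(f'    """{sample.docstring}"""')
    return "\n".join(lines)


# Made-up example sample, not taken from the dataset.
sample = Sample(
    function_signature="def load_config(path: str) -> dict:",
    docstring="Read a JSON config file and return it as a dictionary.",
    api_references=["json.load", "pathlib.Path.open"],
)

bare_prompt = build_prompt(sample, with_context=False)
rich_prompt = build_prompt(sample, with_context=True)
```

Comparing a model's output on `bare_prompt` and `rich_prompt` is one simple way to probe the kind of context effect the abstract reports, under the assumption that API references are the context being added.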

Code & Models

Repositories

ComplexCodeEval/ComplexCodeEval (official)
