ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen; Chaoxiang Xie; Yuling Shi; Wenhao Zeng; Yongpan Wang; Hongyu Zhang; Xiaodong Gu

arXiv:2604.26923 · cs.SE · April 30, 2026

TL;DR

ClassEval-Pro is a benchmark of 300 class-level tasks spanning 11 domains that evaluates how well large language models generate complete, internally structured classes from specifications. Even the best evaluated model reaches only 45.6% Pass@1.

Contribution

The paper introduces a scalable, automatically constructed, cross-domain benchmark for class-level code generation, with every task validated by an LLM Judge Ensemble, and uses it to expose the limitations of current models.

Findings

01 Best model achieves 45.6% Pass@1 on class-level tasks.

02 Structured approaches improve weaker models by up to 9.4%.

03 Logic and dependency errors are the main failure modes.
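
The Pass@1 figure above can be made concrete with a small sketch. This page does not state how the paper samples generations, so the code below assumes the widely used unbiased pass@k estimator (n samples per task, c of which pass every test); the function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts in the usage lines are hypothetical, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of generated samples for the task
    c: number of those samples that pass all tests
    k: sampling budget being scored (k = 1 gives Pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for four tasks, one sample each (illustrative only).
    toy_results = [(1, 1), (1, 0), (1, 0), (1, 1)]
    print(f"Pass@1 = {benchmark_pass_at_k(toy_results, k=1):.3f}")
```

With one sample per task, the estimate reduces to the fraction of tasks whose single generated class passes its whole test suite, which is the usual reading of a Pass@1 score.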

Abstract

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes (compositional code creation, i.e., building a complete, internally structured class from a specification) remains underserved. Current evaluations are either confined to isolated functions or rely on manually curated class-level tasks that are expensive to scale and increasingly susceptible to data contamination. We introduce ClassEval-Pro, a benchmark of 300 class-level tasks spanning 11 domains, constructed through an automated three-stage pipeline that combines complexity enhancement, cross-domain class composition, and integration of real-world GitHub code contributed after January 2025. Every task is validated by an LLM Judge Ensemble and must pass test suites with over 90% line coverage. We evaluate five…
