JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
Jialun Cao, Zhiyong Chen, Jiarong Wu, Shing-chi Cheung, and Chang Xu

TL;DR
JavaBench is a Java benchmark designed to evaluate large language models' ability to generate object-oriented code. It addresses gaps in existing benchmarks by focusing on project-level code and advanced OOP features.
Contribution
We introduce JavaBench, a new Java benchmark that evaluates LLMs on project-level OOP features, filling gaps in the programming-language coverage and code granularity of existing benchmarks.
Findings
LLMs lag behind undergraduate students in Java project completion.
Using method signatures as prompt context appears to strike a good balance for project-level code generation.
JavaBench is publicly available for research use.
Abstract
Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, programming languages are imbalanced: 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, code granularity is imbalanced: function-/statement-level benchmarks account for over 83.3% of benchmarks, only a handful extend to the class or project level, and all of those are limited to Python. Third, advanced features are lacking: existing benchmarks primarily assess basic coding skills while overlooking advanced Object-Oriented Programming (OOP) features (i.e., encapsulation, inheritance, and polymorphism). To fill these gaps, we propose JavaBench, a project-level Java benchmark that exercises OOP features. It comprises four Java projects with 389 methods in 106 Java classes. The…
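For readers less familiar with the three OOP features the abstract names, the following is a minimal illustrative sketch. It is not taken from JavaBench or its four projects; every class and method name here (`OopSketch`, `Shape`, `Rectangle`, `Circle`, `totalArea`) is hypothetical and chosen only for illustration.

```java
import java.util.List;

public class OopSketch {

    // Encapsulation: the name is private and exposed only through a getter.
    abstract static class Shape {
        private final String name;

        protected Shape(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        // Each concrete subclass supplies its own area computation.
        public abstract double area();
    }

    // Inheritance: Rectangle reuses Shape's name handling via super(...).
    static class Rectangle extends Shape {
        private final double width;
        private final double height;

        Rectangle(double width, double height) {
            super("rectangle");
            this.width = width;
            this.height = height;
        }

        @Override
        public double area() {
            return width * height;
        }
    }

    // Inheritance again, with a different area formula.
    static class Circle extends Shape {
        private final double radius;

        Circle(double radius) {
            super("circle");
            this.radius = radius;
        }

        @Override
        public double area() {
            return Math.PI * radius * radius;
        }
    }

    // Polymorphism: area() is dispatched on each element's runtime type,
    // so this method never needs to know which subclasses exist.
    static double totalArea(List<Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Rectangle(2.0, 3.0), new Circle(1.0));
        for (Shape shape : shapes) {
            System.out.println(shape.getName() + ": " + shape.area());
        }
        System.out.println("total: " + totalArea(shapes));
    }
}
```

A function-level benchmark would typically ask for a single method such as `area()` in isolation. A project-level task, by contrast, would require code that stays consistent with the surrounding class hierarchy, which is the kind of dependency this sketch is meant to suggest.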
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Model-Driven Software Engineering Techniques
