JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models

Jialun Cao; Zhiyong Chen; Jiarong Wu; Shing-chi Cheung; Chang Xu

arXiv:2406.12902 · cs.LG · October 14, 2024

TL;DR

JavaBench is a project-level Java benchmark for evaluating how well large language models generate object-oriented code. It targets gaps in existing benchmarks: little Java coverage, few class- or project-level tasks, and little attention to OOP features (encapsulation, inheritance, and polymorphism).

Contribution

We introduce JavaBench, a Java benchmark that evaluates LLMs on project-level tasks exercising OOP features, addressing the programming-language and code-granularity imbalances in existing benchmarks.

Findings

01. LLMs lag behind undergraduate students in project-level Java programming.

02. Using method signatures as prompt context may strike a good balance for project-level code generation.

03. JavaBench is publicly available for research use.

Abstract

Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.3% of benchmarks. Only a mere handful extends to class-/project-levels, and all are limited to Python. Third, lacking advanced features. Existing benchmarks primarily assess basic coding skills, while overlooking advanced Object-Oriented Programming (OOP) features (i.e., encapsulation, inheritance, and polymorphism). To fill these gaps, we propose JavaBench, a project-level Java benchmark that exercises OOP features. It comprises four Java projects with 389 methods in 106 Java classes. The…
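
The three OOP features the abstract names can be illustrated with a minimal Java sketch. This is a hypothetical example written for this summary, not code from the JavaBench projects; the class names (`Shape`, `Rectangle`, `Square`, `OopSketch`) and the `totalArea` helper are assumptions chosen only for illustration.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; none of these classes come from JavaBench.
abstract class Shape {
    // Encapsulation: the field is private and exposed only through a getter.
    private final String name;

    protected Shape(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    // Each concrete subclass supplies its own area computation.
    public abstract double area();

    public String describe() {
        // Locale.ROOT keeps the decimal separator stable across machines.
        return getName() + " with area " + String.format(Locale.ROOT, "%.2f", area());
    }
}

// Inheritance: Rectangle reuses Shape's name handling and describe().
class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        super("Rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

// A second level of inheritance that overrides one behavior.
class Square extends Rectangle {
    Square(double side) {
        super(side, side);
    }

    @Override
    public String getName() {
        return "Square";
    }
}

public class OopSketch {
    // Polymorphism: the loop calls area() through the Shape type,
    // and the runtime class of each element decides which method runs.
    static double totalArea(List<Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Rectangle(2, 3), new Square(2));
        for (Shape shape : shapes) {
            System.out.println(shape.describe());
        }
        System.out.println("Total area: " + totalArea(shapes));
    }
}
```

A project-level task in this style would presumably require a model to complete method bodies such as `area()` or `describe()` consistently across several interdependent classes, which is what distinguishes it from a single-function benchmark.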

Code & Models

Repositories

java-bench/javabench (Official)

Taxonomy

Topics: Scientific Computing and Data Management · Software Engineering Research · Model-Driven Software Engineering Techniques