CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Yiqing Xie; Alex Xie; Divyanshu Sheth; Pengfei Liu; Daniel Fried; Carolyn Rose

arXiv:2404.00566·cs.SE·October 4, 2024·1 cites

TL;DR

CodeBenchGen is a framework that creates scalable, execution-based code generation benchmarks from real-world code, enabling more comprehensive evaluation of code generation systems across diverse scenarios.

Contribution

It introduces a novel method to generate execution-based benchmarks from natural code sources using large language models, expanding evaluation capabilities.

Findings

1. Created the Exec-CSN dataset: 1,931 examples involving 293 libraries, converted from code in 367 GitHub repositories.

2. 81.3% of examples are solvable by humans, indicating practical relevance.

3. Conducted code generation experiments demonstrating the framework's utility.

Abstract

To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is easily executable or has human-written tests. To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks from naturally occurring code sources. Specifically, we leverage a large language model (LLM) to sandbox arbitrary pieces of code into evaluation examples, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries converted from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the…
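As a rough illustration of what "execution-based evaluation" means in the abstract, the sketch below runs a candidate solution together with its test cases in a separate Python process and counts the example as passed only if the process exits cleanly. This is not the paper's pipeline: the function name `passes_tests`, the toy `add` task, and the timeout value are all hypothetical, and a plain subprocess gives far weaker isolation than a real sandbox would.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate_src: str, test_src: str, timeout_s: float = 10.0) -> bool:
    """Return True if the tests exit with code 0 when run after the candidate code.

    Hypothetical helper for illustration; not part of CodeBenchGen.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Write candidate and tests into one script inside a throwaway directory.
        script = Path(tmp) / "example.py"
        script.write_text(candidate_src + "\n\n" + test_src + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # Treat non-terminating candidates as failures.
            return False
        # A failed assertion or any uncaught exception yields a non-zero exit code.
        return result.returncode == 0


# Toy example: one set of tests, one correct and one incorrect candidate.
TESTS = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
GOOD = "def add(a, b):\n    return a + b"
BAD = "def add(a, b):\n    return a - b"
```

Calling `passes_tests(GOOD, TESTS)` returns `True`, while `passes_tests(BAD, TESTS)` returns `False` because the first assertion fails. Averaging such pass/fail outcomes over many examples is one common way to score a code generation system.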

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques