HumanEval on Latest GPT Models -- 2024

Daniel Li; Lincoln Murr

arXiv:2402.14852 · cs.CL · February 26, 2024 · 2 cites

TL;DR

This paper evaluates the latest GPT-4 models on program synthesis tasks using the HumanEval benchmark, demonstrating significant improvements in zero-shot Python code generation and multi-step problem solving.

Contribution

It introduces a new benchmark with multi-step prompts for GPT models, showing enhanced program synthesis performance over previous single-turn approaches.

Findings

01. GPT-4 achieves zero-shot performance on HumanEval that is competitive with previous state-of-the-art solutions.

02. Multi-step prompts significantly improve code generation accuracy over single-turn inputs.

03. Open-source code and datasets facilitate further research.

Abstract

In 2023, we are using the latest models of GPT-4 to advance program synthesis. Large language models have significantly improved the state of the art for this purpose. To make these advancements more accessible, we have created a repository that connects these models to HumanEval. This dataset was initially developed to be used with a language model called CODEGEN on natural and programming language data. The utility of these trained models is showcased by demonstrating their competitive performance in zero-shot Python code generation on HumanEval tasks compared to previous state-of-the-art solutions. Additionally, this gives way to developing more multi-step paradigm synthesis. This benchmark features 160 diverse problem sets factorized into multistep prompts that our analysis shows significantly improve program synthesis over single-turn inputs. All code is open source at…
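For readers unfamiliar with how HumanEval-style scoring works, the following is a minimal sketch of a functional-correctness check, not the authors' harness. It assumes the standard HumanEval record layout (`prompt`, `test`, `entry_point`, where `test` defines a `check(candidate)` function). The toy problem, the `passes` helper, and the two hand-written completions are hypothetical and stand in for real dataset entries and model output.

```python
# Toy problem in the HumanEval record layout (prompt / test / entry_point).
# The problem itself is made up for illustration; it is not from the dataset.
TOY_PROBLEM = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}


def passes(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's check().

    Hypothetical helper: a zero-shot completion is appended to the prompt,
    the resulting program is executed, and the unit tests decide pass/fail.
    """
    program = problem["prompt"] + completion + "\n\n" + problem["test"]
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code. Model output should only be
        # executed inside a sandbox; this sketch skips that for brevity.
        exec(program, namespace)
        namespace["check"](namespace[problem["entry_point"]])
    except Exception:
        # Any failure (syntax error, assertion error, crash) counts as a miss.
        return False
    return True


# Hand-written stand-ins for model completions (function bodies only).
good = "    return a + b\n"
bad = "    return a - b\n"

print(passes(TOY_PROBLEM, good))
print(passes(TOY_PROBLEM, bad))
```

A completion counts as correct only if every assertion in `check` holds, so the score reflects behavior rather than textual similarity to a reference solution.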

Code & Models

Repositories

daniel442li/gpt-human-eval (Official)

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Label Smoothing · Adam · Softmax