PECC: Problem Extraction and Coding Challenges

Patrick Haller; Jonas Golde; Alan Akbik

arXiv:2404.18766·cs.AI·April 30, 2024


TL;DR

PECC is a new benchmark that tests large language models' ability to interpret narrative problems, extract requirements, and generate code, revealing current limitations especially on math-based challenges.

Contribution

We introduce PECC, a benchmark derived from Advent of Code (AoC) and Project Euler, to evaluate LLMs' problem understanding and code generation in narrative contexts.

Findings

01

GPT-3.5-Turbo passes 50% of the AoC challenges

02

Performance drops to 8% on the Project Euler math problems

03

Models struggle with natural language prompts and complex problems
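The findings above are stated as pass rates. AoC and Project Euler problems each have a single known answer, so a pass rate of this kind can be computed by running every generated program and checking its printed answer. The sketch below is a hypothetical harness for illustration only, not the authors' evaluation code from the hallerpatrick/pecc repository: the names `Problem`, `run_candidate`, `passes` and `pass_rate`, the 10-second timeout, and the rule "the last non-empty printed line must exactly match the expected answer" are all assumptions.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record: a problem identifier and its single known answer.
    name: str
    expected_answer: str


def run_candidate(code: str, timeout_s: float = 10.0) -> str | None:
    """Run generated Python code in a fresh interpreter and return its stdout.

    Returns None if the program exits with an error or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def passes(code: str, problem: Problem) -> bool:
    # Assumed rule: the last non-empty printed line must equal the expected answer.
    out = run_candidate(code)
    if out is None:
        return False
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == problem.expected_answer.strip()


def pass_rate(solutions: dict[str, str], problems: list[Problem]) -> float:
    """Fraction of problems whose generated solution prints the expected answer.

    A problem with no generated solution counts as failed.
    """
    if not problems:
        return 0.0
    solved = sum(passes(solutions.get(p.name, ""), p) for p in problems)
    return solved / len(problems)


if __name__ == "__main__":
    # Project Euler problem 1: sum of the multiples of 3 or 5 below 1000.
    problems = [Problem("euler-1", "233168")]
    solutions = {
        "euler-1": "print(sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0))"
    }
    print(pass_rate(solutions, problems))  # 1.0
```

Running each candidate in a separate interpreter with a timeout keeps a crashing or non-terminating solution from stopping the whole evaluation; it is not a security sandbox, so untrusted model output would need stronger isolation in practice.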

Abstract

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving and reasoning. Existing benchmarks evaluate tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark derived from Advent Of Code (AoC) challenges and Project Euler, including 2396 problems. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show varying model performance between narrative and…


Code & Models

Repositories

hallerpatrick/pecc (official)

Datasets

PatrickHaller/pecc (dataset)


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques

Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Attention Dropout · Multi-Head Attention · Cosine Annealing · Dropout