Can Language Models Solve Olympiad Programming?

Quan Shi; Michael Tang; Karthik Narasimhan; Shunyu Yao

arXiv:2404.10952·cs.CL·April 18, 2024·1 cites

TL;DR

This paper introduces the USACO benchmark, 307 problems from the USA Computing Olympiad, to evaluate language models on olympiad programming. GPT-4 reaches only 8.7% pass@1 with zero-shot prompting; advanced inference methods and targeted hints improve on this, but the benchmark remains far from solved.

Contribution

The paper provides the first comprehensive benchmark for LM performance on olympiad programming problems, with unit tests, reference code, and official analyses for each problem, along with new inference techniques and insights from human-in-the-loop studies.

Findings

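01. GPT-4 achieves only 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting.

02. The best inference method, which combines self-reflection with retrieval over episodic knowledge, improves accuracy to 20.2%.

03. Targeted hints enable the model to solve most problems that were previously unsolvable.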

Abstract

Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning and puzzle solving in addition to generating efficient code. However, they have been understudied as a domain for evaluating language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, reference code, and official analyses for each problem. These resources enable us to construct and test a range of LM inference methods for competitive programming for the first time. We find that GPT-4 achieves only an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and our best inference method improves it to 20.2% using a combination of self-reflection and retrieval over episodic knowledge. However, this is far from solving the benchmark. To better understand the remaining challenges, we…
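
The pass@1 figures above are per-problem success rates averaged over the benchmark. As a minimal sketch, assuming the standard unbiased pass@k estimator from the code-generation literature, and assuming a sampled solution counts as correct only when it passes every unit test for its problem, the metric could be computed as follows. The function names and the (n, c) input format are illustrative and are not taken from the paper or its repository.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of solutions sampled for the problem
    c: number of those samples that passed all unit tests (assumed criterion)
    k: budget of attempts being scored
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, any k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must contain at least one problem")
    return sum(scores) / len(scores)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled solutions for a problem that are correct.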

Code & Models

Repositories

princeton-nlp/USACO (official)

Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling

Methods: Attention Is All You Need · Dropout · Adam · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings · Dense Connections