Can Language Models Solve Olympiad Programming?
Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao

TL;DR
This paper introduces the USACO benchmark, 307 problems from the USA Computing Olympiad, to evaluate language models on Olympiad programming. It shows that current models solve only a small fraction of the problems, and that inference-time methods and targeted hints improve accuracy.
Contribution
The paper provides a benchmark of Olympiad programming problems with unit tests, reference code, and official analyses for each problem, along with new inference techniques and insights from human-in-the-loop studies.
Findings
GPT-4 achieves 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting.
The best inference method, which combines self-reflection with retrieval over episodic knowledge, improves accuracy to 20.2%.
Targeted hints enable solving most previously unsolvable problems.
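The findings above are reported as pass@1. As a minimal sketch of how such a number can be computed, the code below uses the unbiased pass@k estimator that is common in code-generation evaluation: for a problem with n sampled solutions of which c pass all unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. Whether the paper uses exactly this estimator is an assumption here, and the function names and per-problem counts are hypothetical, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    # Probability that a random k-subset of the n samples has no correct one,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts for four problems; not data from the paper.
example_results = [(10, 0), (10, 1), (10, 5), (10, 10)]
score = benchmark_pass_at_k(example_results, k=1)
print(f"pass@1 = {score:.3f}")
```

For k = 1 the estimator reduces to c / n per problem, so pass@1 is the average fraction of sampled solutions that pass every unit test.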
Abstract
Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning, puzzle solving, and the generation of efficient code. However, they have been understudied as a domain for evaluating language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, reference code, and official analyses for each problem. These resources enable us to construct and test a range of LM inference methods for competitive programming for the first time. We find GPT-4 only achieves an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and our best inference method improves it to 20.2% using a combination of self-reflection and retrieval over episodic knowledge. However, this is far from solving the benchmark. To better understand the remaining challenges, we…
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
Methods: Attention Is All You Need · Dropout · Adam · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings · Dense Connections
