Competitive Programming with Large Reasoning Models

OpenAI: Ahmed El-Kishky; Alexander Wei; Andre Saraiva; Borys Minaiev; Daniel Selsam; David Dohan; Francis Song; Hunter Lightman; Ignasi Clavera; Jakub Pachocki; Jerry Tworek; Lorenz Kuhn; Lukasz Kaiser; Mark Chen; Max Schwarzer; Mostafa Rohaninejad; Nat McAleese; o3 contributors; Oleg Mürk; Rhythm Garg; Rui Shu; Szymon Sidor; Vineet Kosaraju; Wenda Zhou

arXiv:2502.06807 · cs.LG · February 20, 2025 · 5 cites

TL;DR

This paper shows that large language models trained with reinforcement learning can outperform a domain-specific system in competitive programming: o3 reaches gold-medal level at IOI 2024 without hand-crafted test-time strategies.

Contribution

The paper shows that scaling general-purpose models trained with reinforcement learning surpasses domain-specific, hand-engineered inference strategies on competitive programming tasks.

Findings

01 The o3 model achieves gold at IOI 2024 without hand-crafted, domain-specific strategies.

02 o3 attains a Codeforces rating comparable to that of top human competitors.

03 Scaling reinforcement learning is a robust approach for reasoning tasks.

Abstract

We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model…

Code & Models

Datasets

open-r1/ioi
dataset · 118 dl

Videos

OpenAI: The Age of AI Is Here! · YouTube

Taxonomy

Topics: Multi-Criteria Decision Making · Economic theories and models