ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

Shiyi Xu; Yiwen Hu; Yingqian Min; Zhipeng Chen; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2506.04894·cs.CL·June 6, 2025

TL;DR

ICPC-Eval is a new benchmark for evaluating large language models' reasoning abilities in competitive programming, featuring realistic problems, robust evaluation tools, and a novel metric to better assess iterative problem-solving skills.

Contribution

The paper introduces ICPC-Eval, a comprehensive benchmark with realistic contest problems, a local evaluation toolkit, and a new metric for assessing reasoning in LLMs, addressing limitations of existing benchmarks.

Findings

01 Top-tier models rely on multi-turn feedback for reasoning.

02 Models still lag behind human performance in complex coding tasks.

03 ICPC-Eval reveals the challenges in evaluating reasoning abilities.

Abstract

With the significant progress of large reasoning models in complex coding and reasoning tasks, existing benchmarks, like LiveCodeBench and CodeElo, are insufficient to evaluate the coding capabilities of large language models (LLMs) in real competition environments. Moreover, current evaluation metrics such as Pass@K fail to capture the reflective abilities of reasoning models. To address these challenges, we propose ICPC-Eval, a top-level competitive coding benchmark designed to probe the frontiers of LLM reasoning. ICPC-Eval includes 118 carefully curated problems from 11 recent ICPC contests held in various regions of the world, offering three key contributions: 1) A challenging realistic ICPC competition scenario, featuring a problem type and difficulty distribution consistent with actual contests. 2) A robust test case generation method and a corresponding local…
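For readers unfamiliar with Pass@K, the sketch below shows the unbiased estimator commonly used in code-generation evaluation: draw n independent samples per problem, count the c that pass all tests, and estimate the chance that at least one of k samples is correct. The function name `pass_at_k` and its argument names are illustrative assumptions, and the abstract does not say which Pass@K variant the authors compare against. Because every sample is scored independently, no attempt can use feedback from an earlier one, which is consistent with the abstract's point that the metric does not capture reflection.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@K from n independent samples, c of which are correct.

    Uses the standard unbiased form 1 - C(n - c, k) / C(n, k).
    Names and signature are illustrative, not taken from the paper.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset has a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples for one problem, 3 of them correct.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

A benchmark-level score would typically average this value over all problems.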

Code & Models

Repositories

RUCAIBox/Slow_Thinking_with_LLMs
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing