OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

Yaoming Zhu; Junxin Wang; Yiyang Li; Lin Qiu; ZongYu Wang; Jun Xu; Xuezhi Cao; Yuhuai Wei; Mingshi Wang; Xunliang Cai; Rong Ma

arXiv:2506.10481 · cs.AI · June 13, 2025

TL;DR

OIBench is a challenging olympiad-level informatics benchmark of 250 original problems for evaluating the algorithmic reasoning of models. It exposes both the strengths and the remaining gaps of current AI systems on complex algorithmic tasks.

Contribution

The paper introduces OIBench, a private, contamination-resistant olympiad-level dataset for benchmarking algorithmic reasoning, along with Time/Space Completion Curves for finer-grained efficiency analysis and direct human-model comparisons based on evaluations of high-level participants.

Findings

01. Current SOTA models outperform most humans in correctness and efficiency.

02. Open-source models lag behind closed-source counterparts.

03. Models are still suboptimal compared to canonical solutions.

Abstract

As models become increasingly sophisticated, conventional algorithm benchmarks are increasingly saturated, underscoring the need for more challenging benchmarks to guide future improvements in algorithmic reasoning. This paper introduces OIBench, a high-quality, private, and challenging olympiad-level informatics dataset comprising 250 carefully curated original problems. We detail the construction methodology of the benchmark, ensuring a comprehensive assessment across various programming paradigms and complexities, and we demonstrate its contamination-resistant properties via experiments. We propose Time/Space Completion Curves for finer-grained efficiency analysis and enable direct human-model comparisons through high-level participant evaluations. Our experiments reveal that while open-source models lag behind closed-source counterparts, current SOTA models already outperform most…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Advanced Graph Neural Networks