Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Yafu Li; Runzhe Zhan; Haoran Zhang; Shunkai Zhang; Yizhuo Li; Zhilin Wang; Jiacheng Chen; Futing Wang; Xuyang Hu; Yuchen Fan; Bangjie Xu; Yucheng Su; Xinmiao Han; Chenxi Li; Haodi Lei; Yufeng Zhao; Zejin Lin; Qianjia Cheng; Tong Zhu; Xiaoye Qu; Ganqu Cui; Peng Ye; Yun Luo; Zhouchen Lin; Yu Qiao; Bowen Zhou; Ning Ding; Yu Cheng

arXiv:2605.13301 · cs.AI · May 14, 2026

TL;DR

This paper presents a unified scaling approach for turning reasoning models into olympiad-level problem solvers, reaching gold-medal-level performance on IMO and IPhO problems.

Contribution

The paper introduces a simple, scalable recipe that combines curriculum-based supervised fine-tuning, two-stage reinforcement learning, and test-time scaling to strengthen reasoning models on complex mathematical and scientific problems.

Findings

01 Achieved gold-medal-level performance on IMO 2025 and IPhO 2024/2025.

02 Supported stable reasoning over trajectories longer than 100K tokens.

03 Demonstrated strong generalization to scientific reasoning beyond math and physics.

Abstract

Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model,…
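The curriculum step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define "reverse-perplexity", so this example assumes it means presenting SFT trajectories from highest to lowest perplexity under the backbone; the function names, the `descending` flag, and the toy log-probabilities are all hypothetical and are not taken from the paper or its repository. Only the perplexity formula itself (the exponential of the mean negative log-likelihood per token) is standard.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_curriculum(
    samples: Sequence[Tuple[str, Sequence[float]]],
    descending: bool = True,
) -> List[Tuple[str, float]]:
    """Score each (name, token_logprobs) sample and sort by perplexity.

    ASSUMPTION: descending=True (hardest-looking samples first) is one
    possible reading of "reverse-perplexity"; flip the flag for the
    opposite ordering.
    """
    scored = [(name, perplexity(logprobs)) for name, logprobs in samples]
    return sorted(scored, key=lambda item: item[1], reverse=descending)


# Toy data: per-token log-probabilities a backbone might assign (made up).
toy_samples = [
    ("easy", [-0.1, -0.2, -0.1]),
    ("hard", [-2.0, -1.5, -2.5]),
    ("medium", [-0.7, -0.9, -0.8]),
]

order = [name for name, _ in perplexity_curriculum(toy_samples)]
```

With the toy data, `order` lists the trajectories from the one the backbone finds least predictable to the one it finds most predictable. In a real pipeline the log-probabilities would come from scoring each trajectory with the backbone model before fine-tuning.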

Code & Models

Repositories

simplified-reasoning/SU-01 (GitHub)
