Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Songyang Gao; Yuzhe Gu; Zijian Wu; Lingkai Kong; Wenwei Zhang; Zhongrui Cai; Fan Zheng; Tianyou Ma; Junhao Shen; Haiteng Zhao; Duanyang Zhang; Huilun Zhang; Kuikun Liu; Chengqi Lyu; Yanhui Duan; Chiyu Chen; Ningsheng Ma; Jianfei Gao; Han Lyu; Dahua Lin; Kai Chen

arXiv:2512.10739 · cs.CL · December 15, 2025

Open Access

TL;DR

This paper introduces Intern-S1-MO, a hierarchical multi-round reasoning agent with a lemma-based memory system and an RL training framework, enabling it to solve ultra-hard IMO-level math problems beyond previous LRM capabilities.

Contribution

The paper presents a novel multi-agent, multi-round reasoning framework with a lemma memory system and an RL training method, significantly improving LRM performance on complex mathematical problems.

Findings

01 Achieved near-medal scores on IMO2025 non-geometry problems.

02 Surpassed existing LRMs on HMMT2025, AIME2025, and CNMO2025.

03 Reached gold-medal level on CMO2025, with human expert validation.

Abstract

Large Reasoning Models (LRMs) have expanded the mathematical reasoning frontier through Chain-of-Thought (CoT) techniques and Reinforcement Learning with Verifiable Rewards (RLVR), capable of solving AIME-level problems. However, the performance of LRMs is heavily dependent on the extended reasoning context length. For solving ultra-hard problems like those in the International Mathematical Olympiad (IMO), the required reasoning complexity surpasses the space that an LRM can explore in a single round. Previous works attempt to extend the reasoning context of LRMs but remain prompt-based and built upon proprietary models, lacking systematic structures and training pipelines. Therefore, this paper introduces Intern-S1-MO, a long-horizon math agent that conducts multi-round hierarchical reasoning, composed of an LRM-based multi-agent system including reasoning, summary, and verification.…

Taxonomy

Topics: Mathematics Education and Teaching Techniques · Constraint Satisfaction and Optimization · Intelligent Tutoring Systems and Adaptive Learning