Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline

Yichen Huang; Lin F. Yang

arXiv:2507.15855 · cs.AI · October 1, 2025

TL;DR

The paper introduces a model-agnostic verification-and-refinement pipeline, built from carefully designed prompts, with which Gemini 2.5 Pro, Grok-4, or GPT-5 each solve 5 of the 6 IMO 2025 problems, far above their baseline accuracies.

Contribution

The paper presents a prompt-based, model-agnostic pipeline that adds verification and refinement around an existing model and raises accuracy on Olympiad-level problems well above each tested model's baseline.

Findings

1. With any of the three models (Gemini 2.5 Pro, Grok-4, or GPT-5), the pipeline solved 5 of the 6 IMO 2025 problems, reported as approximately 85.7% accuracy.

2. Baseline accuracies without the pipeline were much lower: 31.6% (Gemini 2.5 Pro), 21.4% (Grok-4), and 38.1% (GPT-5).

3. The pipeline relies on prompting existing models and does not require retraining them.

Abstract

The International Mathematical Olympiad (IMO) is widely regarded as the world championship of high-school mathematics. IMO problems are renowned for their difficulty and novelty, demanding deep insight, creativity, and rigor. Although large language models perform well on many mathematical benchmarks, they often struggle with Olympiad-level problems. Using carefully designed prompts, we construct a model-agnostic, verification-and-refinement pipeline. We demonstrate its effectiveness on the recent IMO 2025, avoiding data contamination for models released before the competition. Equipped with any of the three leading models -- Gemini 2.5 Pro, Grok-4, or GPT-5 -- our pipeline correctly solved 5 out of the 6 problems ($\approx$ 85.7% accuracy). This is in sharp contrast to their baseline accuracies: 31.6% (Gemini 2.5 Pro), 21.4% (Grok-4), and 38.1% (GPT-5), obtained by selecting the best of…
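
The abstract names a verification-and-refinement pipeline but does not spell out its steps here. As a rough illustration only, a generic generate-verify-refine loop might look like the sketch below. Everything in it is an assumption made for illustration and not the authors' implementation: the names `verify_and_refine`, `Verdict`, `generate`, `verify`, `refine`, and `max_rounds` are hypothetical, and the toy functions stand in for prompts sent to a language model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool  # True if the verifier found no issues
    feedback: str   # the verifier's report; empty when accepted


def verify_and_refine(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a solution the verifier accepts, or None if the budget runs out.

    Hypothetical sketch: the three callables would wrap model prompts.
    They are passed in, so the loop itself is model-agnostic.
    """
    solution = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, solution)
        if verdict.accepted:
            return solution
        # Feed the verifier's report back and ask for a corrected solution.
        solution = refine(problem, solution, verdict.feedback)
    # Check the last refinement once more before giving up.
    return solution if verify(problem, solution).accepted else None


# Toy stand-ins (assumptions, not real model calls) so the sketch runs.
def toy_generate(problem: str) -> str:
    return "2 + 2 = 5"


def toy_verify(problem: str, solution: str) -> Verdict:
    ok = solution.endswith("= 4")
    return Verdict(ok, "" if ok else "the sum is wrong")


def toy_refine(problem: str, solution: str, feedback: str) -> str:
    return "2 + 2 = 4"


result = verify_and_refine("Compute 2 + 2.", toy_generate, toy_verify, toy_refine)
print(result)
```

Passing the model calls in as plain functions is one way to keep such a loop independent of the underlying model; the actual prompts and acceptance criteria used in the paper may differ.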
