QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar

TL;DR
QED-Nano is a small, open 4-billion-parameter model post-trained with a three-stage recipe to write Olympiad-level math proofs, approaching the performance of proprietary models.
Contribution
The paper introduces a three-stage post-training pipeline that brings a small, open 4B model to competitive performance on Olympiad-level proof problems.
Findings
QED-Nano surpasses larger open models like Nomos-1 and GPT-OSS-120B.
QED-Nano approaches the performance of proprietary models such as Gemini 3 Pro.
The training pipeline combines supervised fine-tuning (distilling from DeepSeek-Math-V2), reinforcement learning with rubric-based rewards, and reasoning cache expansion.
Abstract
Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to imbue good proof-writing styles by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3)…
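To make the idea of a rubric-based reward in stage (2) concrete, here is a minimal sketch of how per-item rubric scores could be turned into a scalar RL reward. The abstract excerpt does not specify the rubric format or the grader, so everything below is an illustrative assumption and not the authors' implementation: the names `RubricItem` and `rubric_reward`, the point values, and the normalization to [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RubricItem:
    """One grading criterion (hypothetical structure, not from the paper)."""
    description: str  # what the grader checks for in the proof
    points: float     # maximum points this criterion can contribute


def rubric_reward(rubric: Sequence[RubricItem], awarded: Sequence[float]) -> float:
    """Return awarded points divided by total available points, in [0, 1].

    `awarded[i]` is the score a grader gave for `rubric[i]`; each score is
    clamped to the range [0, rubric[i].points] before summing.
    """
    if len(rubric) != len(awarded):
        raise ValueError("need exactly one awarded score per rubric item")
    total = sum(item.points for item in rubric)
    if total <= 0:
        raise ValueError("rubric must have positive total points")
    earned = sum(
        min(max(score, 0.0), item.points)
        for item, score in zip(rubric, awarded)
    )
    return earned / total


# Hypothetical 7-point rubric for a single proof problem.
example_rubric = [
    RubricItem("States and proves the key lemma", 2.0),
    RubricItem("Completes the main induction argument", 3.0),
    RubricItem("Handles edge cases and concludes", 2.0),
]
# A grader awards full, partial, and zero credit respectively.
example_reward = rubric_reward(example_rubric, [2.0, 1.5, 0.0])
```

In this sketch the reward is a plain points ratio, so partial progress on a proof earns partial credit instead of an all-or-nothing signal; how the paper actually scores and aggregates rubric items may differ.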
