MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
Shaoxiong Zhan, Yanlin Lai, Ziyu Lu, Dahua Lin, Ziqing Yang, Fei Tan

TL;DR
MathSmith is a framework that synthesizes extremely challenging mathematical problems from scratch using reinforcement learning, significantly improving the reasoning capabilities of large language models across diverse benchmarks.
Contribution
It introduces a novel problem synthesis method that constructs high-difficulty mathematical problems from scratch, independent of existing human-written problems, enhancing data diversity and scalability for training LLMs.
Findings
Outperforms existing baselines on five mathematical reasoning benchmarks.
Effectively generates problems with increased reasoning complexity.
Demonstrates strong scalability and transferability of synthetic data.
Abstract
Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies that act as soft constraints during rationale generation. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated…
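The pipeline described in the abstract (sample concepts, prompt with a difficulty strategy, score the output with a reward that combines several criteria) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy concept entries, the prompt wording, the strategy text, the hard gate on structural validity, the majority-vote definition of answer consistency, and the equal weights are all hypothetical choices made here. The abstract does not list the nine strategies or the exact reward formula.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptPair:
    """A concept name and its explanation (stand-in for one PlanetMath-style entry)."""
    concept: str
    explanation: str


def sample_concepts(pool, k, seed=None):
    """Draw k distinct concept-explanation pairs uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


def build_prompt(pairs, strategy):
    """Assemble a hypothetical synthesis prompt.

    The strategy is phrased as a suggestion rather than a requirement,
    which is one way to express a *soft* constraint.
    """
    lines = ["Write one new, self-contained, difficult problem combining these concepts:"]
    for p in pairs:
        lines.append(f"- {p.concept}: {p.explanation}")
    lines.append(f"If it fits naturally, consider this strategy: {strategy}")
    return "\n".join(lines)


def answer_consistency(answers):
    """Fraction of sampled answers that agree with the most common one (0.0 if none)."""
    if not answers:
        return 0.0
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def composite_reward(structurally_valid, complexity, answers,
                     w_complexity=0.5, w_consistency=0.5):
    """Hypothetical reward: 0.0 for malformed output, otherwise a weighted sum
    of a complexity score clipped to [0, 1] and answer consistency."""
    if not structurally_valid:
        return 0.0
    complexity = min(max(complexity, 0.0), 1.0)
    return w_complexity * complexity + w_consistency * answer_consistency(answers)


if __name__ == "__main__":
    # Toy concept pool; real entries would come from PlanetMath.
    pool = [
        ConceptPair("Euler's totient function", "counts integers from 1 to n that are coprime to n"),
        ConceptPair("Pell equation", "x^2 - n*y^2 = 1 with n a positive non-square integer"),
        ConceptPair("Catalan numbers", "count, e.g., balanced parenthesizations"),
    ]
    pairs = sample_concepts(pool, k=2, seed=0)
    print(build_prompt(pairs, "hide an intermediate quantity the solver must derive"))
    # Three of four sampled answers agree, so consistency is 0.75.
    print(composite_reward(True, 0.8, ["12", "12", "7", "12"]))
```

Gating on validity means a malformed problem earns nothing regardless of how complex it looks; a softer alternative would add validity as a third weighted term.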
Code & Models
- 🤗 Jasaxion/MathSmith-hc-Qwen3-8B · model · 3 downloads
- 🤗 Jasaxion/MathSmith-HC-Problem-Synthesizer-Qwen3-8B · model · 27 downloads · ♡ 1
- 🤗 Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B · model · 34 downloads · ♡ 1
- 🤗 Jasaxion/MathSmith-DS-Qwen-7B-LongCoT · model · 4 downloads · ♡ 1
- 🤗 Jasaxion/MathSmith-Qwen3-8B-LongCoT · model · 4 downloads
- 🤗 Jasaxion/MathSmith-HC-Qwen3-1_7B-ShortCoT · model
- 🤗 Jasaxion/MathSmith-HC-Qwen3-14B-ShortCoT · model · 4 downloads
- 🤗 Jasaxion/MathSmith-HC-Qwen3-32B-ShortCoT · model · 8 downloads
- Jasaxion/MathSmith-Hard-Problems · dataset · 31 downloads
- Jasaxion/MathSmith-HC-Solution-Generation-LongCoT-Qwen3-30B-A3B · dataset · 21 downloads
- Jasaxion/MathSmith-HC-Solution-Generation-ShortCoT-Qwen3-30B-A3B · dataset · 9 downloads
- Jasaxion/MathSmith-HC-Problems · dataset · 6 downloads
- Jasaxion/MathSmith-Self-Improvement-VarientSet · dataset · 18 downloads
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Reinforcement Learning in Robotics
