MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

Shaoxiong Zhan; Yanlin Lai; Ziyu Lu; Dahua Lin; Ziqing Yang; Fei Tan

arXiv:2508.05592·cs.CL·March 10, 2026

TL;DR

MathSmith is a framework that synthesizes extremely challenging mathematical problems from scratch using reinforcement learning, significantly improving the reasoning capabilities of large language models across diverse benchmarks.

Contribution

MathSmith introduces a problem synthesis method that constructs high-difficulty mathematical problems from scratch rather than by modifying existing ones, enhancing data diversity and scalability for training LLMs.

Findings

01 Outperforms existing baselines on five mathematical reasoning benchmarks.

02 Effectively generates problems with increased reasoning complexity.

03 Demonstrates strong scalability and transferability of synthetic data.

Abstract

Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies that act as soft constraints during rationale generation. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated…
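The two mechanisms named in the abstract, random sampling of concept-explanation pairs and a reward that combines structural validity, reasoning complexity, and answer consistency, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Concept`, `sample_concepts`, and `combined_reward`, the equal weights, the token cap, the use of trace length as a stand-in for complexity, and the rule that a structurally invalid problem earns zero reward are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Concept:
    """A concept-explanation pair (hypothetical container, not from the paper)."""
    name: str
    explanation: str


def sample_concepts(pool, k, seed=None):
    """Draw k distinct concept-explanation pairs uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


def combined_reward(structure_ok, trace_tokens, answers,
                    max_tokens=8192, weights=(1.0, 1.0, 1.0)):
    """Toy scalar reward in [0, 1] over three signals.

    Assumptions of this sketch:
    - a structurally invalid problem gets zero reward;
    - complexity is approximated by reasoning-trace length, capped at max_tokens;
    - consistency is the share of sampled answers that match the most common one.
    """
    if not structure_ok:
        return 0.0
    complexity = min(trace_tokens, max_tokens) / max_tokens
    if answers:
        top_count = max(answers.count(a) for a in set(answers))
        consistency = top_count / len(answers)
    else:
        consistency = 0.0
    w_s, w_c, w_a = weights
    return (w_s * 1.0 + w_c * complexity + w_a * consistency) / (w_s + w_c + w_a)


if __name__ == "__main__":
    pool = [
        Concept("group", "A set with an associative, invertible operation."),
        Concept("lattice", "A poset in which any two elements have a meet and a join."),
        Concept("residue", "A coefficient in the Laurent expansion of a function."),
    ]
    picked = sample_concepts(pool, k=2, seed=0)
    print([c.name for c in picked])
    print(combined_reward(True, trace_tokens=4096, answers=["7", "7", "12"]))
```

A weighted average keeps the three signals on a common scale, which makes the hypothetical weights easy to tune; a real training setup would replace each term with the paper's own reward definitions.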

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Reinforcement Learning in Robotics