Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai; Jiazhan Feng; Yee Whye Teh; Ning Miao

arXiv:2605.06660·cs.LG·May 8, 2026

TL;DR

This paper presents VHG, a verifier-enhanced three-party self-play framework (setter, solver, and independent verifier) that generates valid, challenging, and novel mathematical problems for LLM training, and reports that it outperforms existing problem generation methods.

Contribution

Introduction of VHG, a verifier-backed self-play framework that improves the quality and difficulty of generated problems for mathematical reasoning tasks.

Findings

01. VHG significantly outperforms baseline methods in problem quality.

02. Two verifier variants effectively ensure problem validity and difficulty.

03. Experimental results demonstrate the framework's superiority on integral and reasoning tasks.

Abstract

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants: a Hard symbolic verifier and a Soft…
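To make the reward structure described in the abstract concrete, the following is a minimal sketch of a setter reward that depends on both validity and difficulty. The exact formula is not given in the text above, so everything here is an assumption for illustration: the names `Problem` and `setter_reward`, the choice to gate the reward to zero for invalid problems, and the use of the solver's failure rate as the difficulty signal are all hypothetical and may differ from what VHG actually does.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Problem:
    """Hypothetical container for a setter-proposed problem."""
    statement: str
    reference_answer: str


def setter_reward(
    problem: Problem,
    verifier: Callable[[Problem], bool],
    solver_attempts: Sequence[str],
) -> float:
    """Illustrative setter reward (an assumption, not the paper's formula).

    Validity gates the reward: a problem the verifier rejects earns 0,
    so the setter cannot profit from unsolvable or ill-posed problems.
    For valid problems, difficulty is the fraction of solver attempts
    that fail to match the reference answer.
    """
    if not verifier(problem):
        return 0.0
    if not solver_attempts:
        raise ValueError("need at least one solver attempt")
    correct = sum(
        attempt.strip() == problem.reference_answer.strip()
        for attempt in solver_attempts
    )
    return 1.0 - correct / len(solver_attempts)


if __name__ == "__main__":
    problem = Problem("Compute 2 + 2.", "4")
    # Invalid problem: reward is zero regardless of solver behaviour.
    print(setter_reward(problem, lambda _: False, ["4", "5"]))
    # Valid problem, solver correct on 1 of 4 attempts.
    print(setter_reward(problem, lambda _: True, ["4", "5", "5", "5"]))
```

Gating on validity is one simple way to address the reward hacking the abstract attributes to naive self-play: without the verifier term, a setter could maximize the solver's failure rate by emitting invalid problems.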
