SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning

Xiong Jun Wu; Zhenduo Zhang; ZuJie Wen; Zhiqiang Zhang; Wang Ren; Lei Shi; Cai Chen; Deng Zhao; Qing Wang; Xudong Han; Chengfu Tang; Dingnan Jin; Qing Cui; Jun Zhou

arXiv:2505.14147 · cs.AI · May 27, 2025

TL;DR

SHARP is a comprehensive method for generating high-quality, verifiable reasoning problems to enhance large reasoning models' training, significantly improving their complex reasoning abilities in STEM domains.

Contribution

We introduce SHARP, a novel framework combining strategic principles and a structured process to synthesize challenging, verifiable STEM reasoning problems for reinforcement learning.

Findings

01 SHARP-generated problems improve reasoning accuracy on benchmarks.

02 Models trained with SHARP data outperform existing methods.

03 Enhanced reasoning performance approaches expert levels.

Abstract

Training large reasoning models (LRMs) with reinforcement learning in STEM domains is hindered by the scarcity of high-quality, diverse, and verifiable problem sets. Existing synthesis methods, such as Chain-of-Thought prompting, often generate oversimplified or uncheckable data, limiting model advancement on complex tasks. To address these challenges, we introduce SHARP, a unified approach to Synthesizing High-quality Aligned Reasoning Problems for LRM reinforcement learning with verifiable rewards (RLVR). SHARP encompasses a strategic set of self-alignment principles -- targeting graduate and Olympiad-level difficulty, rigorous logical consistency, and unambiguous, verifiable answers -- and a structured three-phase framework (Alignment, Instantiation, Inference) that ensures thematic diversity and fine-grained control over problem generation. We implement SHARP by leveraging a…
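The abstract's requirement of "unambiguous, verifiable answers" is what makes a rule-based reward possible in RLVR. The sketch below illustrates that general idea only; it is not SHARP's verifier, which the text above does not describe. The `\boxed{...}` answer convention, the function names, and the exact-match comparison rule are all assumptions made for illustration.

```python
import re
from fractions import Fraction
from typing import Optional, Union


def extract_final_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a response, if any.

    Assumption: the model is asked to wrap its final answer in \\boxed{}.
    Nested braces are not handled in this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> Union[Fraction, str]:
    """Parse as an exact rational when possible, else a canonical string."""
    text = answer.replace(" ", "").lower()
    try:
        return Fraction(text)  # accepts "3", "0.5", "1/2"
    except (ValueError, ZeroDivisionError):
        return text


def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0  # no checkable answer, so no reward
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

A reward of this kind can only be computed when each problem has a single answer that can be checked mechanically, which is why ambiguous or open-ended synthetic problems are unsuitable as RLVR training data.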

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Data Classification

Methods: Sparse Evolutionary Training