Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

Shuyang Jiang; Yuhao Wang; Ya Zhang; Yanfeng Wang; Yu Wang

arXiv:2601.04731 · cs.AI · May 11, 2026

TL;DR

Miner uses the policy's intrinsic uncertainty as a self-supervised reward to make reinforcement learning for large reasoning models more data-efficient, achieving state-of-the-art results without external supervision.

Contribution

Introduces a simple, effective method that turns the policy's intrinsic uncertainty into a reward signal, combined with token-level focal credit assignment and adaptive advantage calibration.

Findings

01 Achieves an absolute gain of up to 4.58 in Pass@1 over previous methods.

02 Outperforms other exploration-focused algorithms on six reasoning benchmarks.

03 Demonstrates that exploiting latent uncertainty is key to scalable RL in reasoning models.

Abstract

Current critic-free RL methods for large reasoning models suffer from severe inefficiency when training on positive homogeneous prompts (where all rollouts are correct): the advantage estimates are zero, so those rollouts are wasted. We introduce a radically simple yet powerful solution to Mine intrinsic mastery (Miner), which repurposes the policy's intrinsic uncertainty as a self-supervised reward signal, with no external supervision, auxiliary models, or additional inference cost. Our method pioneers two key innovations: (1) a token-level focal credit assignment mechanism that dynamically amplifies gradients on critical uncertain tokens while suppressing overconfident ones, and (2) adaptive advantage calibration to seamlessly integrate intrinsic and verifiable rewards. Evaluated across six reasoning benchmarks on Qwen3-4B and Qwen3-8B base models, Miner…
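
To make the zero-advantage problem concrete, here is a minimal sketch. It assumes a GRPO-style group-normalized advantage (reward minus group mean, divided by group standard deviation), which is one common critic-free estimator and not necessarily the exact one used in the paper. The function name `group_advantages` and the `eps` stabilizer are illustrative assumptions.

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumption: a GRPO-style estimator; the paper's exact form may differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the rollout group
    return [(r - mean) / (std + eps) for r in rewards]

# Mixed group: some rollouts correct (1.0), some wrong (0.0).
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])

# Positive homogeneous group: every rollout is correct.
all_correct = group_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)        # positive for correct rollouts, negative for wrong ones
print(all_correct)  # every advantage is zero, so no gradient signal
```

Under this assumption, a prompt whose rollouts are all correct contributes nothing to the policy gradient, which is the waste the abstract describes and the gap an intrinsic reward is meant to fill.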

Code & Models

Repositories

pixas/Miner (GitHub)
