Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Li Qing, Ke Tang

TL;DR
This paper introduces HIVE, a novel prompt selection framework that improves data efficiency in RL training of large language models by focusing on high-utility prompts at the learning edge, reducing computational costs.
Contribution
The paper proposes HIVE, a dual-stage prompt selection method that leverages historical rewards and prompt entropy to enhance RL training efficiency for large reasoning models.
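The summary does not say how the two stages are combined, so the following is only an illustrative sketch of one possible reading, not the authors' implementation. It assumes stage one keeps prompts whose historical mean reward falls in an intermediate band, and stage two keeps those whose current entropy estimate is high. The function name `select_prompts`, the thresholds `low`, `high`, and `min_entropy`, and the way entropy is supplied are all hypothetical.

```python
def select_prompts(history, entropy, low=0.2, high=0.8, min_entropy=0.5):
    """Hypothetical two-stage filter: history first, then an entropy check.

    history: dict mapping prompt id -> list of past rewards in [0, 1]
    entropy: dict mapping prompt id -> current entropy estimate (assumed given)
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Stage 1 (history-informed): keep prompts of intermediate difficulty.
    # Prompts with no recorded history are kept, since nothing is known yet.
    candidates = []
    for prompt_id, rewards in history.items():
        if not rewards:
            candidates.append(prompt_id)
            continue
        mean_reward = sum(rewards) / len(rewards)
        if low <= mean_reward <= high:
            candidates.append(prompt_id)

    # Stage 2 (online check): keep only prompts with high current uncertainty.
    return [p for p in candidates if entropy[p] >= min_entropy]


history = {
    "a": [1, 1, 1, 1],  # always solved: too easy
    "b": [1, 0, 1, 0],  # intermediate difficulty
    "c": [0, 0, 0, 0],  # never solved: too hard
    "d": [1, 0, 0, 1],  # intermediate difficulty, but low entropy below
}
entropy = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.1}
selected = select_prompts(history, entropy)
```

In this toy example only prompt `"b"` passes both filters, which matches the idea of spending rollouts where difficulty is intermediate and uncertainty is high.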
Findings
HIVE significantly reduces rollout costs without performance loss.
Prompt utility concentrates at the learning edge, shifting during training.
HIVE outperforms baseline methods across multiple benchmarks.
Abstract
Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the "learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE…
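The claim that many prompts provide negligible gradients can be illustrated with the group-relative advantage commonly used in GRPO, where each rollout's reward is normalized by the mean and standard deviation of its group. The sketch below is a minimal illustration under that assumption; the function name `group_relative_advantages` and the `eps` stabilizer are choices made here, not details taken from the paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Every rollout succeeds: all advantages are zero, so the prompt
# contributes no learning signal despite costing four rollouts.
solved = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

# Mixed outcomes: advantages are nonzero and of opposite sign.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

A prompt the model always solves (or always fails) yields identical rewards and therefore zero advantages, which is why filtering such prompts before the rollout phase can save compute.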
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
