Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Yun Qu; Qi Wang; Yixiu Mao; Heming Zou; Yuhang Jiang; Weijie Liu; Clive Bai; Kai Yang; Yangkun Chen; Saiyong Yang; Xiangyang Ji

arXiv:2602.01970 · cs.AI · May 18, 2026

TL;DR

This paper proposes GPS, a lightweight, generalizable predictive model for efficient prompt selection in reinforcement learning of large language models, improving training and inference efficiency.

Contribution

Introduction of GPS, a Bayesian inference-based prompt selection method that generalizes across prompts and incorporates diversity, reducing computational costs in RL training of language models.

Findings

01. GPS significantly improves training efficiency over baselines.

02. GPS enhances final reasoning performance.

03. GPS reduces test-time computational costs.

Abstract

Reinforcement learning enhances the reasoning capabilities of large language models but often involves high computational costs due to rollout-intensive optimization. Online prompt selection presents a plausible solution by prioritizing informative prompts to improve training efficiency. However, current methods either depend on costly, exact evaluations or construct prompt-specific predictive models lacking generalization across prompts. This study introduces Generalizable Predictive Prompt Selection (GPS), which performs Bayesian inference towards prompt difficulty using a lightweight generative model trained on the shared optimization history. Intermediate-difficulty prioritization and history-anchored diversity are incorporated into the batch acquisition principle to select informative prompt batches. The small predictive model also generalizes at test-time for efficient…
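The batch acquisition idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the summary gives neither the acquisition rule nor the predictive model, so everything below is an assumption made for illustration. In particular, `predicted_success` stands in for the predictive model's posterior mean success rate for a prompt, the 0.5 target encodes "intermediate difficulty", and cosine similarity to previously selected prompts is one possible reading of "history-anchored diversity". The names `difficulty_score`, `select_batch`, and `diversity_weight` are hypothetical.

```python
import math


def difficulty_score(p_success, target=0.5):
    """Return a score in [0, 1] that peaks when the predicted success
    rate equals `target` (assumed here to mean intermediate difficulty)."""
    return 1.0 - abs(p_success - target) / max(target, 1.0 - target)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def select_batch(candidates, history, batch_size, diversity_weight=0.5):
    """Greedily pick up to `batch_size` prompt ids.

    candidates: list of (prompt_id, predicted_success, embedding) tuples.
    history: embeddings of prompts selected in earlier steps (the anchors).
    Each pick maximizes difficulty score minus a penalty for similarity
    to the closest anchor; picked prompts become anchors themselves.
    """
    anchors = list(history)
    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < batch_size:
        def utility(candidate):
            _, p_success, embedding = candidate
            redundancy = max(
                (cosine(embedding, a) for a in anchors), default=0.0
            )
            return difficulty_score(p_success) - diversity_weight * redundancy

        best = max(remaining, key=utility)
        chosen.append(best[0])
        anchors.append(best[2])
        remaining.remove(best)
    return chosen


# Toy data: two near-50% prompts, one very easy, one very hard.
candidates = [
    ("easy", 0.95, [1.0, 0.0]),
    ("mid_a", 0.50, [1.0, 0.0]),
    ("mid_b", 0.55, [0.0, 1.0]),
    ("hard", 0.05, [0.0, 1.0]),
]
history = [[1.0, 0.0]]  # a similar prompt to "mid_a" was already used
batch = select_batch(candidates, history, batch_size=2)
```

In this toy setup both intermediate-difficulty prompts are selected, but `mid_b` is picked first because `mid_a` overlaps with the history anchor. Setting `diversity_weight=0` reduces the rule to pure intermediate-difficulty ranking.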

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics