Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
Yun Qu, Qi Wang, Yixiu Mao, Vincent Tao Hu, Björn Ommer, Xiangyang Ji

TL;DR
This paper introduces MoPPS, a Bayesian framework for online prompt difficulty prediction that accelerates reinforcement learning finetuning of reasoning models by reducing costly LLM interactions.
Contribution
It presents MoPPS, a novel Bayesian risk-predictive method for online prompt difficulty estimation that improves training efficiency by reducing the need for exhaustive LLM-based prompt evaluation.
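The TL;DR and contribution describe MoPPS only at a high level. As a hedged illustration of what Bayesian online difficulty prediction can look like, the sketch below models each prompt's success rate under the current policy with a Beta-Bernoulli posterior, updates it from binary rollout rewards, and picks a training batch by Thompson sampling toward a target success rate of 0.5. The Beta-Bernoulli model, the decay factor, the 0.5 target, and the names `BetaPosterior` and `select_prompts` are assumptions made for this example, not confirmed details of the paper's method.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Hypothetical Beta(alpha, beta) belief over one prompt's success rate."""

    alpha: float = 1.0  # Beta(1, 1) is a uniform prior
    beta: float = 1.0

    def update(self, successes: int, trials: int, decay: float = 1.0) -> None:
        """Fold in binary rollout rewards; decay < 1 (assumed) forgets stale evidence."""
        # Shrink old pseudo-counts toward the Beta(1, 1) prior, since the policy drifts.
        self.alpha = decay * self.alpha + (1.0 - decay) + successes
        self.beta = decay * self.beta + (1.0 - decay) + (trials - successes)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def select_prompts(posteriors, batch_size, target=0.5, rng=None):
    """Thompson-sampling selection (assumed design): no LLM calls are made here."""
    rng = rng or random.Random()
    # Draw one plausible success rate per prompt from its posterior.
    draws = {pid: rng.betavariate(p.alpha, p.beta) for pid, p in posteriors.items()}
    # Keep the prompts whose sampled success rate is closest to the target difficulty.
    return sorted(draws, key=lambda pid: abs(draws[pid] - target))[:batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    true_rates = {"easy": 0.95, "medium": 0.5, "hard": 0.05}  # made-up, for illustration
    posteriors = {pid: BetaPosterior() for pid in true_rates}
    for _ in range(20):
        for pid in select_prompts(posteriors, batch_size=2, rng=rng):
            # Stand-in for 8 policy rollouts scored by a binary verifier.
            successes = sum(rng.random() < true_rates[pid] for _ in range(8))
            posteriors[pid].update(successes, trials=8, decay=0.9)
    for pid, post in posteriors.items():
        print(pid, round(post.mean(), 2))
```

Because selection uses only posterior draws, choosing a batch costs no LLM inference; rollouts are generated only for the prompts actually trained on, and their rewards feed back into the posterior.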
Findings
MoPPS accurately predicts prompt difficulty across tasks.
It significantly reduces the number of LLM rollouts needed.
Training acceleration is achieved without sacrificing performance.
Abstract
Recent advances have demonstrated the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Appropriate online prompt selection methods reduce the number of iterations by prioritizing informative prompts during training, but the pipeline's reliance on exhaustive prompt evaluation and subset selection still incurs substantial computational overhead from frequent LLM inference calls. Distinguished from these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt…
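The overhead argument in the abstract can be made concrete with simple counting. The sketch below assumes that every candidate in an evaluate-then-select pool receives the same number of rollouts, that the selected prompts' rollouts are reused for the policy update, and that a predictive selector needs no generation to choose its batch. The function names, pool size, batch size, and rollout count are hypothetical illustration values, not figures from the paper.

```python
def evaluate_then_select_rollouts(pool_size: int, rollouts_per_prompt: int) -> int:
    # Every candidate prompt is rolled out before the training subset is chosen;
    # the chosen prompts' rollouts are assumed to be reused for the update.
    return pool_size * rollouts_per_prompt


def predict_then_select_rollouts(batch_size: int, rollouts_per_prompt: int) -> int:
    # Difficulty is predicted without generation, so only the selected batch is rolled out.
    return batch_size * rollouts_per_prompt


batch_size, k = 256, 8
pool_size = 4 * batch_size  # hypothetical oversampling factor of 4
print(evaluate_then_select_rollouts(pool_size, k))  # 8192
print(predict_then_select_rollouts(batch_size, k))  # 2048
```

Under these assumptions the per-step generation cost scales with the candidate pool in the first scheme and with the training batch in the second, which is the gap a predictive selector aims to close.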
Taxonomy
Topics: Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning · Bayesian Modeling and Causal Inference
