Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng; Yang Zhou; Brian R. Bartoldson; Bhavya Kailkhura; Fan Lai; Jiawei Zhao; Beidi Chen

arXiv:2506.02177·cs.AI·June 4, 2025

TL;DR

This paper introduces GRESO, an efficient reinforcement learning method for LLM reasoning that uses reward dynamics to skip uninformative prompts before rollout, speeding up rollout by up to 2.4x and total training by up to 2.0x while maintaining accuracy.

Contribution

The paper proposes GRESO, a lightweight online filtering algorithm that predicts and skips uninformative prompts, improving RL training efficiency for LLM reasoning tasks.

Findings

01 GRESO achieves up to 2.4x speedup in rollout time.

02 GRESO speeds up total training time by up to 2.0x.

03 GRESO maintains accuracy while delivering these speedups.

Abstract

Reinforcement learning, such as PPO and GRPO, has powered recent breakthroughs in LLM reasoning. Scaling rollout to sample more prompts enables models to selectively use higher-quality data for training, which can stabilize RL training and improve model performance. However, this comes at the cost of significant computational overhead. In this paper, we show that a substantial portion of this overhead can be avoided by skipping uninformative prompts before rollout. Our analysis of reward dynamics reveals a strong temporal consistency in prompt value: prompts that are uninformative in one epoch of training are likely to remain uninformative in future epochs. Based on these insights, we propose GRESO (GRPO with Efficient Selective Rollout), an online, lightweight pre-rollout filtering algorithm that predicts and skips uninformative prompts using reward training dynamics. By evaluating…
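The pre-rollout filtering idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's actual rule: a prompt is treated as uninformative when all of its rollouts receive the same reward (with group-normalized advantages, as in GRPO, identical rewards give every rollout zero advantage), and the hypothetical `SelectiveRolloutFilter` class, its `base_skip_prob` and `max_skip_prob` parameters, and the streak-based skip probability are invented names and choices.

```python
import random
from collections import defaultdict


def is_uninformative(rewards):
    """Return True when every rollout in the group received the same reward.

    Assumption for this sketch: with group-normalized advantages, identical
    rewards yield zero advantage for every rollout, so the prompt carries
    no training signal in that epoch.
    """
    return len(set(rewards)) <= 1


class SelectiveRolloutFilter:
    """Hypothetical pre-rollout filter driven by per-prompt reward history."""

    def __init__(self, base_skip_prob=0.5, max_skip_prob=0.95, seed=0):
        self.base_skip_prob = base_skip_prob
        self.max_skip_prob = max_skip_prob  # kept below 1 so prompts get revisited
        self.streak = defaultdict(int)      # prompt_id -> consecutive uninformative epochs
        self.rng = random.Random(seed)

    def skip_probability(self, prompt_id):
        """Skip probability grows with the uninformative streak, up to a cap."""
        k = self.streak[prompt_id]
        if k == 0:
            return 0.0
        return min(self.max_skip_prob, 1.0 - (1.0 - self.base_skip_prob) ** k)

    def should_skip(self, prompt_id):
        """Decide, before any rollout is generated, whether to skip this prompt."""
        return self.rng.random() < self.skip_probability(prompt_id)

    def update(self, prompt_id, rewards):
        """Record the rollout rewards observed for a prompt that was not skipped."""
        if is_uninformative(rewards):
            self.streak[prompt_id] += 1
        else:
            self.streak[prompt_id] = 0
```

In a training loop, `should_skip` would be called for each candidate prompt before generating rollouts, and `update` would be called with the group's rewards for every prompt that was actually rolled out. Skipping probabilistically rather than deterministically is a design choice of this sketch: a prompt with a long uninformative streak can still be re-sampled occasionally, so it is picked up again if the model's behavior on it changes.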

Code & Models

Models

🤗 Mirnegg/r1_qwen_1_5b_limo_sft_cleaned_ep-3 (model)

Videos

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Law · Multi-Agent Systems and Negotiation