Prompt Replay: Speeding Up GRPO with On-Policy Reuse of High-Signal Prompts

Andrei Baroian; Rutger Berger

arXiv:2603.21177·cs.LG·March 24, 2026

Open Access

TL;DR

Prompt Replay is an online data selection method for GRPO that reuses prompts (not trajectories) to accelerate learning, especially on difficult datasets. It prioritizes medium-difficulty prompts, those with a pass rate close to 0.5, to maximize the learning signal.

Contribution

The paper introduces Prompt Replay, a prompt reuse strategy for GRPO that aims to improve training efficiency by selectively reusing prompts according to their pass rate, which serves as the measure of difficulty.

Findings

01. Reduces the share of zero-variance prompts and increases the advantage magnitude.

02. Accelerates initial accuracy gains across multiple models.

03. Plateaus and converges to accuracy similar to the baseline methods.
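
The connection between pass rate, zero-variance prompts, and advantage can be illustrated with a small sketch. It assumes binary (0/1) verifiable rewards and a common GRPO-style group normalization, (reward - group mean) / (group std + eps); the function names, the use of the population standard deviation, and the eps value are illustrative assumptions, not details taken from the paper.

```python
import math


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r_i - mean) / (std + eps).

    Assumption: population standard deviation and a small eps; real
    implementations may differ in both choices.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def mean_abs_advantage(rewards):
    """Average magnitude of the advantages within one rollout group."""
    advantages = grpo_advantages(rewards)
    return sum(abs(a) for a in advantages) / len(advantages)


def binary_group(n_correct, group_size):
    """Hypothetical group of 0/1 rewards with n_correct successes."""
    return [1.0] * n_correct + [0.0] * (group_size - n_correct)


if __name__ == "__main__":
    group_size = 8
    for n_correct in range(group_size + 1):
        rewards = binary_group(n_correct, group_size)
        print(f"pass rate {n_correct / group_size:.3f}: "
              f"mean |A| = {mean_abs_advantage(rewards):.3f}")
```

Under these assumptions, a group with pass rate p has mean absolute advantage 2·sqrt(p(1 - p)) (ignoring eps), which is zero at p = 0 or p = 1 (a zero-variance prompt) and largest at p = 0.5.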

Abstract

Reinforcement learning with verifiable rewards (RLVR) plays a crucial role in expanding the capacities of LLM reasoning, but GRPO-style training is dominated by expensive rollouts and wastes compute on unusable prompts. We propose Prompt Replay, an overhead-free online data selection method for GRPO that reuses prompts only (not trajectories), to preserve on-policy optimization. After each step, we insert prompts with medium difficulty into a buffer, and prioritize prompts closer to a pass rate of 0.5 (half of the answers correct, half wrong) to maximize the advantage and thus the learning signal. Training batches are formed by mixing reused prompts with fresh samples, with cooldown steps and a maximum reuse count controlling aggressiveness vs. risk of overfitting. Across multiple model families (Llama-3.2-3B, Qwen3-8B) and training datasets (Dolci, Polaris), evaluated using average accuracy on six…
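
The buffer mechanics described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and function names, the pass-rate band used for "medium difficulty" (0.25 to 0.75), the cooldown rule, and the replay fraction are all hypothetical choices made for the example. Only prompts are stored, so fresh rollouts would still be generated from the current policy each time a prompt is reused.

```python
from dataclasses import dataclass


@dataclass
class BufferEntry:
    prompt: str
    pass_rate: float      # fraction of correct rollouts when last seen
    last_used_step: int   # training step at which the prompt was last used
    reuse_count: int = 0  # how many times it has been replayed so far


class PromptReplayBuffer:
    """Hypothetical prompt-only replay buffer (no trajectories stored)."""

    def __init__(self, low=0.25, high=0.75, cooldown_steps=2, max_reuse=3):
        # low/high define the assumed "medium difficulty" band.
        self.low, self.high = low, high
        self.cooldown_steps = cooldown_steps
        self.max_reuse = max_reuse
        self.entries = {}  # prompt -> BufferEntry

    def update(self, prompt, pass_rate, step):
        """After a step, keep the prompt only if its pass rate is in band."""
        old = self.entries.get(prompt)
        reuse_count = old.reuse_count if old else 0
        if self.low <= pass_rate <= self.high and reuse_count < self.max_reuse:
            self.entries[prompt] = BufferEntry(prompt, pass_rate, step, reuse_count)
        else:
            self.entries.pop(prompt, None)

    def sample(self, k, step):
        """Return up to k eligible prompts, closest to pass rate 0.5 first."""
        eligible = [
            e for e in self.entries.values()
            if step - e.last_used_step >= self.cooldown_steps
            and e.reuse_count < self.max_reuse
        ]
        eligible.sort(key=lambda e: abs(e.pass_rate - 0.5))
        chosen = eligible[:k]
        for e in chosen:
            e.reuse_count += 1
            e.last_used_step = step
        return [e.prompt for e in chosen]


def build_batch(buffer, fresh_prompts, batch_size, replay_fraction, step):
    """Mix replayed prompts with fresh ones to fill one training batch."""
    n_replay = int(batch_size * replay_fraction)
    replayed = buffer.sample(n_replay, step)
    n_fresh = batch_size - len(replayed)
    return replayed + list(fresh_prompts[:n_fresh])
```

In this sketch, a larger `replay_fraction`, a shorter `cooldown_steps`, or a higher `max_reuse` makes reuse more aggressive, which mirrors the trade-off against overfitting that the abstract mentions.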

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications