CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling

Zekai Qu; Yinxu Pan; Ao Sun; Chaojun Xiao; Xu Han

arXiv:2511.05589·cs.LG·November 11, 2025

TL;DR

CoPRIS introduces a concurrency-controlled partial rollout method with importance sampling for reinforcement learning in large language models, significantly improving training efficiency while maintaining performance.

Contribution

The paper proposes an asynchronous RL framework with importance sampling correction that reduces training time and GPU idle time in LLM reinforcement learning.

Findings

01 Achieves up to 1.94x faster training
02 Maintains comparable or better performance
03 Effective in mathematical reasoning benchmarks

Abstract

Reinforcement learning (RL) post-training has become a trending paradigm for enhancing the capabilities of large language models (LLMs). Most existing RL systems for LLMs operate in a fully synchronous manner, where training must wait for the rollout of an entire batch to complete. This design leads to severe inefficiencies, as extremely long trajectories can stall the entire rollout process and leave many GPUs idle. To address this issue, we propose Concurrency-Controlled Partial Rollout with Importance Sampling (CoPRIS), which mitigates long-tail inefficiencies by maintaining a fixed number of concurrent rollouts, early-terminating once sufficient samples are collected, and reusing unfinished trajectories in subsequent rollouts. To mitigate the impact of off-policy trajectories, we introduce Cross-stage Importance Sampling Correction, which concatenates buffered log probabilities…

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications